Blocking Crawls From Cloudflare's Browser Crawl Endpoint
Earlier this week, Cloudflare announced the introduction of their Browser Crawl Endpoint.
This allows Cloudflare users to crawl an entire website by making a _single_ API call to the Browser rendering service.
Although the Browser Rendering service honours robots.txt, Cloudflare don't define a specific User-Agent that the service will check for, apparently instead expecting website operators to disallow **all** user agents if they want to keep Cloudflare out.
However, they have also documented that the service includes Cloudflare specific request headers, allowing requests to be blocked by checking for those.
This post details how to achieve that on BunnyCDN, Nginx and OpenResty.
* * *
### The Headers
The relevant header _names_ are documented here. However, unhelpfully, Cloudflare have not provided example/expected values, so I had to go digging.
`cf-brapi-request-id` contains a unique request ID, so although you can check for its existence, relying on the value being in a consistent format may be unwise.
`Signature-agent` is a little bit more useful. The automatic request headers documentation indicates that the value will point to a path under `https://web-bot-auth.cloudflare-browser-rendering-085.workers.dev/`. It is, however, unclear whether this will always be the case (the inclusion of a number suggests that it may not).
* * *
### BunnyCDN
BunnyCDN allows the creation of edge rules which can match against request headers.
Although they don't provide an explicit way to test for the existence of a header, their glob support allows us to achieve the same effect:
```
Action: Block Request
Conditions: Match Any

  Request Header
    Header Name: Signature-agent
    Value: https://web-bot-auth.cloudflare-browser-rendering*

  Request Header
    Header Name: cf-brapi-request-id
    Value: *
```
* * *
### Nginx
Requests can also be blocked in Nginx:
```nginx
if ($http_signature_agent ~ "^https://web-bot-auth\.cloudflare-browser-rendering") {
    return 403;
}

if ($http_cf_brapi_request_id) {
    return 403;
}
```
Note: although _if is evil_, using `return` inside `if` is one of the constructs that's considered 100% safe.
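If you'd rather keep the matching logic out of `if` entirely, the checks can instead be expressed as `map` blocks (in the `http` context) that each set a flag, leaving only a simple `return` guard in the `server`/`location` block. This is a sketch; the variable names are my own:

```nginx
# Flag requests whose Signature-agent points at the browser-rendering worker
map $http_signature_agent $block_cf_renderer {
    default                                                0;
    "~^https://web-bot-auth\.cloudflare-browser-rendering" 1;
}

# Flag requests carrying cf-brapi-request-id with any value
map $http_cf_brapi_request_id $block_cf_brapi {
    default 1;   # header present, any value
    ""      0;   # header absent
}
```

Then, inside the relevant `server` or `location` block:

```nginx
if ($block_cf_renderer) { return 403; }
if ($block_cf_brapi)    { return 403; }
```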
* * *
### OpenResty
If you're using OpenResty you can still use the Nginx config above, but you can also achieve the same in Lua:
```lua
local h = ngx.req.get_headers()

if h["cf-brapi-request-id"] then
    return ngx.exit(403)
end

-- block if Signature-agent points at the browser-rendering worker
-- (plain-text prefix match rather than a glob comparison)
if h["signature-agent"] and string.find(h["signature-agent"], "https://web-bot-auth.cloudflare-browser-rendering", 1, true) == 1 then
    return ngx.exit(403)
end
```
This snippet can easily be included in a `header_filter_by_lua` block, with custom response headers added for debugging purposes:
```nginx
header_filter_by_lua '
    local h = ngx.req.get_headers()

    if h["cf-brapi-request-id"] then
        ngx.header["x-reason"] = "Foxtrot Oscar my old buddy"
        return ngx.exit(403)
    end

    if h["signature-agent"] and string.find(h["signature-agent"], "https://web-bot-auth.cloudflare-browser-rendering", 1, true) == 1 then
        ngx.header["x-reason"] = "Sign this..."
        return ngx.exit(403)
    end
';
```
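One thing worth noting: header filters run late, after Nginx already has a response to send, so by that point the request may already have hit your backend. If the goal is to reject these requests before any upstream work happens, the same Lua can run in the access phase instead. A sketch, assuming a standard OpenResty setup:

```nginx
location / {
    # Runs before the request is passed upstream, unlike header filters
    access_by_lua_block {
        local h = ngx.req.get_headers()

        if h["cf-brapi-request-id"] then
            return ngx.exit(403)
        end

        if h["signature-agent"] and string.find(h["signature-agent"], "https://web-bot-auth.cloudflare-browser-rendering", 1, true) == 1 then
            return ngx.exit(403)
        end
    }

    # normal handling (e.g. proxy_pass) continues for unblocked requests
}
```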
* * *
### Conclusion
I already have more than enough unwanted traffic hitting my servers without Cloudflare giving others an off-the-shelf ability to one-shot my services.
To give Cloudflare their dues, though, they have at least documented how to block their browser rendering service. It could _perhaps_ have been more clearly documented, but the information is at least there.
Still, it would have been nice if they could have defined a _specific_ user-agent to be added to `robots.txt` rather than expecting people to check headers on every request.
Author: Ben Tasker
www.bentasker.co.uk/posts/documentation/gene...
#bots #bunnycdn #cloudflare #nginx #openresty