author	Xe Iaso <me@xeiaso.net>	2025-03-17 19:33:07 -0400
committer	Xe Iaso <me@xeiaso.net>	2025-03-17 19:33:07 -0400
commit	9923878c5c8b68df7f132efd28f76ce5478a1f1a (patch)
tree	c18dfc413495c09886b0d622a275f142f3e9c333 /docs
download	anubis-9923878c5c8b68df7f132efd28f76ce5478a1f1a.tar.xz
	anubis-9923878c5c8b68df7f132efd28f76ce5478a1f1a.zip
initial import from /x/ monorepo
Signed-off-by: Xe Iaso <me@xeiaso.net>
Diffstat (limited to 'docs')
-rw-r--r--	docs/policies.md	77
1 file changed, 77 insertions, 0 deletions
diff --git a/docs/policies.md b/docs/policies.md
new file mode 100644
index 0000000..1e1b911
--- /dev/null
+++ b/docs/policies.md
@@ -0,0 +1,77 @@
+# Policies
+
+Out of the box, Anubis is pretty heavy-handed. It will aggressively challenge everything that might be a browser (usually indicated by having `Mozilla` in its user agent). However, some bots are smart enough to get past the challenge. Some things that look like bots may actually be fine (e.g. RSS readers). Some resources need to be visible no matter what. Some resources and remote hosts are fine to begin with.
+
+Bot policies let you customize the rules that Anubis uses to allow, deny, or challenge incoming requests. Currently, rules can match on the following request attributes:
+
+- Request path
+- User agent string
+
+Here's an example rule that denies [Amazonbot](https://developer.amazon.com/en/amazonbot):
+
+```json
+{
+  "name": "amazonbot",
+  "user_agent_regex": "Amazonbot",
+  "action": "DENY"
+}
+```
+
+When this rule is evaluated, Anubis will check the `User-Agent` string of the request. If it contains `Amazonbot`, Anubis will send an error page to the user saying that access is denied, but in a way that makes scrapers think they have correctly loaded the webpage.
+
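+Rules can also match on the request path. Here's a hypothetical rule in the same format (the name and path are illustrative, not defaults) that always lets a site's RSS feed through, so feed readers are never challenged:
+
+```json
+{
+  "name": "rss-feed",
+  "path_regex": "^/feed\\.xml$",
+  "action": "ALLOW"
+}
+```
+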
+Right now the only kinds of policies you can write are bot policies. Other forms of policies will be added in the future.
+
+Here is a minimal policy file that will protect against most scraper bots:
+
+```json
+{
+  "bots": [
+    {
+      "name": "well-known",
+      "path_regex": "^/\\.well-known/.*$",
+      "action": "ALLOW"
+    },
+    {
+      "name": "favicon",
+      "path_regex": "^/favicon\\.ico$",
+      "action": "ALLOW"
+    },
+    {
+      "name": "robots-txt",
+      "path_regex": "^/robots\\.txt$",
+      "action": "ALLOW"
+    },
+    {
+      "name": "generic-browser",
+      "user_agent_regex": "Mozilla",
+      "action": "CHALLENGE"
+    }
+  ]
+}
+```
+
+This allows requests to [`/.well-known`](https://en.wikipedia.org/wiki/Well-known_URI), `/favicon.ico`, and `/robots.txt`, and challenges any request that has the word `Mozilla` in its User-Agent string. The [default policy file](../botPolicies.json) is a bit more comprehensive, but this should be more than enough for most users.
+
+If no rules match the request, it is allowed through.
+
+## Writing your own rules
+
+There are three actions that can be returned from a rule:
+
+| Action | Effects |
+| :---------- | :-------------------------------------------------------------------------------- |
+| `ALLOW` | Bypass all further checks and send the request to the backend. |
+| `DENY` | Deny the request and send back an error message that scrapers think is a success. |
+| `CHALLENGE` | Show a challenge page and/or validate that clients have passed a challenge. |
+
+Name your rules in lowercase kebab-case. Rule names are exposed in Prometheus metrics.
+
+If your service needs this information for risk calculation, Anubis exposes the rule that each request matched using a few headers:
+
+| Header | Explanation | Example |
+| :---------------- | :--------------------------------------------------- | :--------------- |
+| `X-Anubis-Rule` | The name of the rule that was matched | `bot/lightpanda` |
+| `X-Anubis-Action` | The action that Anubis took in response to that rule | `CHALLENGE` |
+| `X-Anubis-Status` | The status and how strict Anubis was in its checks | `PASS-FULL` |
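+
+For example, a backend behind Anubis could read these headers when scoring risk. Here's a minimal sketch in Go; the handler, response body, and port are made up for illustration, not part of Anubis:
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+)
+
+// handler reads the headers documented above from a request
+// forwarded by Anubis and echoes them back.
+func handler(w http.ResponseWriter, r *http.Request) {
+	rule := r.Header.Get("X-Anubis-Rule")     // e.g. "bot/lightpanda"
+	action := r.Header.Get("X-Anubis-Action") // e.g. "CHALLENGE"
+	status := r.Header.Get("X-Anubis-Status") // e.g. "PASS-FULL"
+	fmt.Fprintf(w, "rule=%s action=%s status=%s\n", rule, action, status)
+}
+
+func main() {
+	http.HandleFunc("/", handler)
+	http.ListenAndServe(":8080", nil)
+}
+```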
+
+Policy rules are matched using [Go's standard library regular expressions package](https://pkg.go.dev/regexp). You can mess around with the syntax at [regex101.com](https://regex101.com); make sure to select the Golang flavor.
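+
+If you want to check how a pattern behaves under Go's regexp semantics, here's a minimal sketch (the pattern and user agent string are just examples):
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+)
+
+func main() {
+	// The same pattern as the Amazonbot rule above. Note that Go regexp
+	// matching is unanchored: the pattern matches anywhere in the string
+	// unless you anchor it with ^ and $.
+	re := regexp.MustCompile(`Amazonbot`)
+
+	ua := "Mozilla/5.0 (compatible; Amazonbot/0.1)"
+	fmt.Println(re.MatchString(ua)) // true
+}
+```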