From d6d879133e67aa967d849a0b73ddde25ddd4bb54 Mon Sep 17 00:00:00 2001
From: Remilia Da Costa Faro
Date: Fri, 21 Mar 2025 20:39:34 +0100
Subject: Allow filtering by remote addresses (#52)

* Added the possibility to define rules for remote addresses
* Added change in changelog
* Added check for X-Real-Ip and X-Forwarded-For when checking for remote address filtering
* cmd/anubis: refine IP filtering logic
* Optimize the configuration so that the IP trie is created once at application start instead of dynamically being created every request.
* Document the changes in the changelog and docs site.
* Allow pure IP range filtering.
* Allow user agent based IP range filtering.
* Allow path based IP range filtering.
* Create --debug-x-real-ip-default flag for testing Anubis locally without a HTTP load balancer.

---------

Co-authored-by: Xe Iaso
---
 docs/docs/CHANGELOG.md      | 20 ++++++++++++++++++++
 docs/docs/admin/policies.md | 29 +++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

(limited to 'docs')

diff --git a/docs/docs/CHANGELOG.md b/docs/docs/CHANGELOG.md
index d88931b..d54cfff 100644
--- a/docs/docs/CHANGELOG.md
+++ b/docs/docs/CHANGELOG.md
@@ -40,6 +40,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [KagiBot](https://kagi.com/bot) is allowed through the filter [#44](https://github.com/TecharoHQ/anubis/pull/44)
 - Fixed hang when navigator.hardwareConcurrency is undefined
 - Support Unix domain sockets [#45](https://github.com/TecharoHQ/anubis/pull/45)
+- Allow filtering by remote addresses:
+
+  ```json
+  {
+    "name": "qwantbot",
+    "user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
+    "action": "ALLOW",
+    "remote_addresses": ["91.242.162.0/24"]
+  }
+  ```
+
+  This also works at an IP range level:
+
+  ```json
+  {
+    "name": "internal-network",
+    "action": "ALLOW",
+    "remote_addresses": ["100.64.0.0/10"]
+  }
+  ```
 
 ## 1.13.0
 
diff --git a/docs/docs/admin/policies.md b/docs/docs/admin/policies.md
index 481a455..abd6139 100644
--- a/docs/docs/admin/policies.md
+++ b/docs/docs/admin/policies.md
@@ -68,6 +68,8 @@ There are three actions that can be returned from a rule:
 
 Name your rules in lower case using kebab-case. Rule names will be exposed in Prometheus metrics.
 
+### Challenge configuration
+
 Rules can also have their own challenge settings. These are customized using the `"challenge"` key. For example, here is a rule that makes challenges artificially hard for connections with the substring "bot" in their user agent:
 
 ```json
@@ -91,6 +93,33 @@ Challenges can be configured with these settings:
 | `report_as` | `4` | What difficulty the UI should report to the user. Useful for messing with industrial-scale scraping efforts. |
 | `algorithm` | `"fast"` | The algorithm used on the client to run proof-of-work calculations. This must be set to `"fast"` or `"slow"`. See [Proof-of-Work Algorithm Selection](./algorithm-selection) for more details. |
 
+### Remote IP based filtering
+
+The `remote_addresses` field of a Bot rule allows you to set the IP range that this ruleset applies to.
+
+For example, you can allow a search engine to connect if and only if its IP address matches the ones they published:
+
+```json
+{
+  "name": "qwantbot",
+  "user_agent_regex": "\\+https\\:\\/\\/help\\.qwant\\.com/bot/",
+  "action": "ALLOW",
+  "remote_addresses": ["91.242.162.0/24"]
+}
+```
+
+This also works at an IP range level without any other checks:
+
+```json
+{
+  "name": "internal-network",
+  "action": "ALLOW",
+  "remote_addresses": ["100.64.0.0/10"]
+}
+```
+
+## Risk calculation for downstream services
+
 In case your service needs it for risk calculation reasons, Anubis exposes information about the rules that any requests match using a few headers:
 
 | Header | Explanation | Example |
-- 
cgit v1.2.3
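
Reviewer note: the documented `remote_addresses` semantics (a request matches a rule when its address falls inside one of the listed CIDR ranges) can be illustrated with a minimal Go sketch. This is not the code from this commit. The `rule` type, `newRule`, and `matches` are hypothetical names, and the sketch uses a linear scan over `net/netip` prefixes rather than the IP trie mentioned in the commit message. What it does share with the commit message is that the ranges are parsed once at load time, not on every request.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rule is a hypothetical, simplified stand-in for a bot rule.
// It is NOT the type used by Anubis.
type rule struct {
	name     string
	prefixes []netip.Prefix // parsed once, when the configuration is loaded
}

// newRule parses the CIDR strings up front so nothing is parsed per request.
func newRule(name string, cidrs []string) (rule, error) {
	r := rule{name: name}
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return rule{}, fmt.Errorf("rule %q: invalid CIDR %q: %w", name, c, err)
		}
		// Masked zeroes any host bits, e.g. 10.1.2.3/8 becomes 10.0.0.0/8.
		r.prefixes = append(r.prefixes, p.Masked())
	}
	return r, nil
}

// matches reports whether addr is a valid IP inside any of the rule's ranges.
// Assumption: addr has already been extracted from the connection or from a
// trusted proxy header.
func (r rule) matches(addr string) bool {
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	// Unmap turns an IPv4-mapped IPv6 address into plain IPv4 so that it can
	// match an IPv4 prefix.
	ip = ip.Unmap()
	for _, p := range r.prefixes {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	internal, err := newRule("internal-network", []string{"100.64.0.0/10"})
	if err != nil {
		panic(err)
	}
	fmt.Println(internal.matches("100.100.1.2"))  // inside 100.64.0.0/10
	fmt.Println(internal.matches("91.242.162.7")) // outside the range
}
```

A linear scan is adequate for a handful of ranges; a trie, as the commit message describes, becomes worthwhile when a policy lists many prefixes.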