Block abusive IPs based on 404 error rate using HAProxy

locked gate - designed by: Freepik

I’ve been using HAProxy for a very long time. In most cases, I work on setups where the HAProxy configuration falls into the fire-and-forget category, with occasional tweaks to ACLs, redirections, and throttling. Now and then, though, I end up solving a problem with HAProxy in a, let’s call it, creative way. 😀

The other day, I wanted to deny access to visitors (well, bots 😮‍💨) who were blindly hitting various URLs hoping to find things such as exposed log files, database dumps, software vulnerabilities, etc. The bots kept changing their user-agent header, and they used user-agents associated with popular web browsers, so there was no way of blocking them based on that. They were also making a lot of requests to valid URLs, probably to make themselves less obvious while doing the bad stuff. Their request rate wasn’t alarmingly high, so they stayed under the configured throttling thresholds most of the time. Still, it was annoying to find tens of thousands of their requests when analyzing access logs.

In the end, what distinguished these bots from legitimate website visitors was that they occasionally produced a higher-than-normal rate of requests ending in a 404 (not found) response status. That gave me an idea.


Configure 404 error request tracking

So, the main idea is to identify abusive visitors by their IP address (i.e. the ones that exceed the allowed 404 error rate) and block each such IP for 20 minutes. I wanted to track the 404 error rate per IP, but only for page requests. I decided to exclude 404 errors for static resources from the equation to minimize the chance of blocking legitimate users who open a page with missing static resources, such as images. Thankfully, HAProxy comes with a powerful request rate tracking ability out of the box, as part of the stick table feature. This feature has been available for years, but just to clarify, the example in this post was tested with HAProxy 2.8.

So, first we need to define what qualifies as a static resource. In most cases, we can simply use file extension to identify those. We’ll also define a few trusted user-agents that we’ll exclude from being blocked (e.g. Googlebot, Bingbot, etc.).

frontend ft_example...
...
    acl static_file path_end .css .js .jpg .jpeg .gif .ico .png .bmp .webp .csv .ttf .woff .svg .svgz
    acl excluded_user_agent hdr_reg(user-agent) -i (yahoo|yandex|kagi|(google|bing)bot)

Secondly, we need to configure a reasonable 404 error rate threshold. The way we go about that is to define a sliding time window for the built-in gpc0_rate counter. In this example, I’m interested in identifying sudden spikes in the 404 error rate, so my threshold is 5 requests within a 10-second sliding window. You don’t have to, but I prefer declaring stick tables in separate backends, because that allows me to use multiple stick tables within the same HAProxy frontend.

backend bk_404_tracking
    # stick table for tracking 404 error rate
    stick-table type ip size 100k expire 20m store gpc0,gpc1,gpc0_rate(10s)

What’s happening here is that we declared a HAProxy backend that will receive no traffic whatsoever. Its only purpose is to hold a stick table that can store up to 100k keys of type ip (i.e. IPv4 addresses). HAProxy treats the k suffix as a factor of 1024, so that’s 102,400 entries. The expiration TTL for each key is 20 minutes. Bear in mind that the TTL is reset to its full value every time the entry is created or refreshed, for example when a general-purpose counter is updated.

Speaking of general-purpose counters, we’ve declared that we want to track 3 of them. gpc0_rate(10s) is the 10-second sliding window mentioned earlier, gpc0 is an integer counter that we’ll increment on every 404 error, and gpc1 is the integer counter that we’ll increment if a visitor exceeds the allowed 404 error rate. By default, every visitor that triggers a 404 error will have gpc1=0 for as long as they don’t exceed the error rate threshold.

Oh, right. The 20-minute key expiration TTL mentioned earlier effectively defines how long the IP stays blocked once the allowed error rate limit is exceeded, counted from the last time its entry was refreshed.

Let’s focus on the tracking logic for a moment.

frontend ft_example...
...
    # checks if the 404 error rate limit was exceeded (it's true if gpc0_rate counter value >= 5)
    acl 404_rate_exceeded sc0_gpc0_rate(bk_404_tracking) ge 5

    # checks if the IP is blocked (it's true if gpc1 counter value > 0)
    acl 404_blocked_ip sc0_get_gpc1(bk_404_tracking) gt 0

    # this tracks all IPs that meet the conditions, but the counters aren't incremented at this phase
    http-request track-sc0 hdr(X-Forwarded-For) table bk_404_tracking if !static_file !excluded_user_agent

    # increment gpc0 counter if response status was 404 (effectively triggers request tracking)
    http-response sc-inc-gpc0(0) if { status 404 }

    # increment gpc1 counter if the 404 request rate was breached
    http-response sc-inc-gpc1(0) if 404_rate_exceeded

    # return 403 error page if the IP is on the block list (i.e. gpc1 counter value > 0)
    http-request deny if 404_blocked_ip

Most of the logic is explained with inline comments, but let’s analyze things from top to bottom. First, we declare ACLs that simply define an acceptable 404 error rate threshold and what qualifies as a blocked IP address. It’s important to understand that lines beginning with the acl keyword are only ACL declarations. ACLs are not evaluated at this point. They will be evaluated later, when they are referenced by http-response and http-request rules.

Equally important is to understand that http-request track-sc0 declares that we want to track the IP address stored in the X-Forwarded-For request header, provided that both the static_file and excluded_user_agent ACLs resolve to false. We can’t add the { status 404 } ACL to the list of conditions, because the response status isn’t known yet while the request is being processed. While this directive adds almost everyone’s IP to the stick table, none of the counters are updated at this point, so no one is blocked.

Counters are updated when we increment the gpc0 counter (line 13, the http-response sc-inc-gpc0 rule), but only if the inline ACL { status 404 } resolves to true (i.e. if the HTTP response code was 404). If the request didn’t satisfy the tracking conditions earlier, the visitor’s IP wasn’t added to the stick table, so there’s nothing to increment.

You probably noticed that we didn’t have to do anything regarding the gpc0_rate counter. HAProxy updates that one automatically, since it’s tied to gpc0 under the hood.

On line 16 (the http-response sc-inc-gpc1 rule) we increment the gpc1 counter if the visitor exceeded the allowed error rate. Once that happens, the 404_blocked_ip ACL resolves to true when it is evaluated on line 19 (the http-request deny rule) for that visitor’s subsequent requests. There, we instruct HAProxy to return the 403 (forbidden) error page.
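To see the rules in action in a test environment, something like the following sketch can help. It assumes curl is installed and that HAProxy accepts an X-Forwarded-For header sent directly by the client, which matches the tracking rule above but may not hold if another proxy sits in front. The probe_404 helper, the /no-such-page-N paths, and the 203.0.113.50 address are made up for illustration.

```shell
#!/bin/sh
# probe_404 BASE_URL [COUNT]
# Requests COUNT pages that presumably don't exist and prints the HTTP
# status code of each response, one per line.
probe_404() {
    base_url=$1
    count=${2:-8}
    i=1
    while [ "$i" -le "$count" ]; do
        # -s: silent, -o /dev/null: discard the body,
        # -w: print only the status code.
        # The X-Forwarded-For value is a documentation-range address
        # standing in for the client IP (an assumption of this sketch).
        curl -s -o /dev/null -w '%{http_code}\n' \
             -H 'X-Forwarded-For: 203.0.113.50' \
             "$base_url/no-such-page-$i"
        i=$((i + 1))
    done
}

# Example (hypothetical host name):
#   probe_404 http://haproxy.test.example 8
```

With the thresholds from this post, I’d expect the first few responses to be 404 and later ones to switch to 403, as long as all requests land within the 10-second window. The exact switch-over point may vary, so treat it as a smoke test and not as a precise measurement.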

Once blocked, the IP will remain blocked until the stick table key expires. As mentioned earlier, we set that to 20 minutes. However, there are 3 scenarios in which the IP can be unblocked earlier:

  1. HAProxy service was restarted. Stick tables can’t be preserved across service restarts, unless you have multiple HAProxy instances connected as peers, but that’s a topic for another day.
  2. The stick table entry limit was reached. When that happens, HAProxy removes some of the oldest entries to make room for new ones. If you find yourself in a situation like that, you will want to increase the stick-table size parameter. According to the official documentation, when calculating maximum memory usage for a stick table with the ip key type, each entry will take roughly 50 bytes of memory.
  3. The whole stick table or a specific entry was cleared using the runtime API. This can be achieved by calling the clear table [table_name] command.
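As a rough sanity check on the sizing advice in the second point, the arithmetic can be sketched in shell. The 50-bytes-per-entry figure is the one quoted above; stored counters may add to it, so treat the result as a ballpark. The stick_table_bytes helper is a made-up name for this sketch.

```shell
#!/bin/sh
# stick_table_bytes ENTRIES [BYTES_PER_ENTRY]
# Prints a ballpark memory estimate: entries * bytes per entry.
stick_table_bytes() {
    entries=$1
    per_entry=${2:-50}   # ~50 bytes per ip-keyed entry, as quoted above
    echo $((entries * per_entry))
}

# "size 100k" means 100 * 1024 = 102400 entries
stick_table_bytes $((100 * 1024))   # prints 5120000, i.e. roughly 5 MB
```

In other words, even doubling or tripling the table size from this example should stay cheap in terms of memory.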

Manipulate HAProxy stick table entries

If you want to inspect the entries tracked in a stick table, along with their counter values, you can communicate with HAProxy’s runtime API via a unix socket. To do that, you need to expose the socket with admin-level privileges:

global
    stats socket /var/lib/haproxy/haproxy.sock level admin

Afterward, you can issue instructions using echo and socat commands. Here are a few examples:

# show full table contents
echo "show table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock

# empty the whole table
echo "clear table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock

# remove a single IP from stick table
echo "clear table bk_404_tracking key xxx.xxx.xxx.xxx" | socat stdio /var/lib/haproxy/haproxy.sock

# show all IPs that exceeded allowed 404 error rate limit
echo "show table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock | grep -E "gpc1=[1-9]"
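
If you’d like slightly friendlier output than raw grep matches, the show table output can be piped through a small awk filter. This is a minimal sketch that assumes each entry is printed as a line of space-separated name=value fields (key=, exp=, gpc1=, and so on) and that exp is the remaining lifetime in milliseconds. The blocked_ips helper is a made-up name.

```shell
#!/bin/sh
# Reads "show table" output on stdin and prints "<ip> <seconds until expiry>"
# for every entry whose gpc1 counter is greater than zero.
blocked_ips() {
    awk '
    /key=/ {
        ip = ""; ttl_ms = 0; gpc1 = 0
        for (i = 1; i <= NF; i++) {
            # split each field into name and value; skip anything else
            n = split($i, kv, "=")
            if (n != 2) continue
            if (kv[1] == "key")  ip = kv[2]
            if (kv[1] == "exp")  ttl_ms = kv[2]
            if (kv[1] == "gpc1") gpc1 = kv[2]
        }
        if (gpc1 + 0 > 0) printf "%s %d\n", ip, ttl_ms / 1000
    }'
}

# Example:
#   echo "show table bk_404_tracking" \
#     | socat stdio /var/lib/haproxy/haproxy.sock | blocked_ips
```

Because the filter looks fields up by name and not by position, it should keep working if your HAProxy version prints additional fields per entry.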

Test before using in production

It’s always a good idea to test things before using them in a production environment. Even if you don’t have a test environment, there’s a way to safely try out 404 error rate thresholds in production without blocking anyone’s access. This can be achieved by routing requests that would have been blocked to a separate HAProxy backend, which has the same servers as the original one. Expanding our example, that would look something like this:

frontend ft_example...
...
    # commented out to disable request blocking
    # return 403 error page if the IP is on the block list (i.e. gpc1 counter value > 0)
    # http-request deny if 404_blocked_ip

    # instead of blocking, route would-be-blocked requests to a different HAProxy backend
    use_backend bk_copy_of_original_backend if 404_blocked_ip

That way, you can identify requests that would have been blocked, along with the visitor IP addresses, by searching for the bk_copy_of_original_backend backend name in access logs. Alternatively, instead of messing around with backends, you can also inject a cookie or some other request header that would allow you to identify requests/IPs that would have been blocked.
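
For the access log approach, a quick per-client tally could look like the sketch below. It assumes HAProxy’s default HTTP log layout, where the client ip:port field sits three fields before the backend/server field. The would_block_by_client helper and the log path in the example are made up. Note that if HAProxy sits behind another proxy, this column shows that proxy’s address, so you would need to log the X-Forwarded-For value instead.

```shell
#!/bin/sh
# Reads HAProxy access log lines on stdin and prints "<count> <client ip>"
# for requests routed to bk_copy_of_original_backend, busiest clients first.
would_block_by_client() {
    awk -v be="bk_copy_of_original_backend/" '
    {
        for (j = 4; j <= NF; j++) {
            # the backend field looks like "backend_name/server_name"
            if (index($j, be) == 1) {
                client = $(j - 3)            # "ip:port" of the client
                sub(/:[0-9]+$/, "", client)  # drop the port
                hits[client]++
                break
            }
        }
    }
    END { for (c in hits) print hits[c], c }' | sort -rn
}

# Example (hypothetical log path):
#   would_block_by_client < /var/log/haproxy.log
```

A few days of this kind of output should give a decent picture of whether the chosen threshold would catch only the bots or some regular visitors too.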

There you have it. One more way of using HAProxy for things that it was or wasn’t designed to do! 😀