I’ve been using HAProxy for a very long time. In most cases, I work on stuff where HAProxy configuration fits into the fire-and-forget category, with occasional tweaks to ACLs, redirections, and throttling. But now and then I end up solving a problem with HAProxy in a, let’s call it, creative way. 😀
The other day, I wanted to deny access to visitors (well, bots 😮‍💨) who were blindly hitting various URLs hoping to find things such as exposed log files, database dumps, software vulnerabilities, etc. The bots were constantly changing their user-agent header, and they were using user-agents associated with popular web browsers, so there was no way of blocking them based on that. They were also making a lot of requests to valid URLs, probably to make themselves less obvious while doing the bad stuff. Their request rate wasn’t alarmingly high, so they stayed under the configured throttling thresholds most of the time. Still, it was annoying to find tens of thousands of their requests when analyzing the access logs.
In the end, what distinguished these bots from legitimate website visitors was that they occasionally produced a higher-than-normal rate of HTTP requests that ended with a 404 (not found) response status. That gave me an idea.
Configure 404 error request tracking
So, the main idea is to identify abusive visitors by their IP address (i.e. the ones that exceed the allowed 404 error rate) and block the IP for 20 minutes. I was interested in tracking the 404 error rate per IP, but only for page requests. I decided to exclude 404 errors for static resources from the equation to minimize the chance of blocking legitimate users who accessed a page with missing static resources, such as images. Thankfully, HAProxy comes with a powerful request rate tracking ability out of the box, as part of the stick table feature. This feature has been available for years, but just to clarify, the example in this post was tested with HAProxy 2.8.
So, first we need to define what qualifies as a static resource. In most cases, we can simply use the file extension to identify those. We’ll also define a few trusted user-agents that we’ll exclude from being blocked (e.g. Googlebot, Bingbot, etc.).
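A minimal sketch of what these declarations might look like inside a frontend. The ACL names static_file and excluded_user_agent are the ones referenced later in this post; the extension list and the user-agent substrings are assumptions that you should adapt to your own site.

```haproxy
# Inside the frontend section.

# Static resources, identified by file extension (the list is an assumption).
acl static_file path_end -i .css .js .png .jpg .jpeg .gif .svg .ico .woff .woff2

# Trusted crawlers that should never be blocked (substrings are assumptions).
acl excluded_user_agent hdr_sub(User-Agent) -i googlebot bingbot
```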
Secondly, we need to configure a reasonable 404 error rate threshold. The way we go about that is to define a sliding time window for the gpc0_rate built-in counter. In this example, I’m interested in identifying sudden spikes in the 404 error rate, so my threshold is 5 requests within the 10-second sliding window. You don’t have to, but I prefer declaring stick tables in separate backends, because that allows me to use multiple stick tables within the same HAProxy frontend.
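As a rough sketch, a dedicated stick table backend with these settings could be declared as follows. The backend name bk_404_tracking matches the one used in the runtime API examples later in this post; the exact layout of the declaration is an assumption.

```haproxy
# Backend that receives no traffic; it only hosts the stick table.
backend bk_404_tracking
    # Up to 100000 IP keys, each expiring after 20 minutes.
    # Stored data: the gpc0 and gpc1 counters, plus the gpc0 rate
    # measured over a 10-second sliding window.
    stick-table type ip size 100000 expire 20m store gpc0,gpc0_rate(10s),gpc1
```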
What’s happening here is that we declared a HAProxy backend that will receive no traffic whatsoever. Its only purpose is to store a stick table that can hold up to 100000 keys of type ip (i.e. IPv4 address). The expiration TTL for each key is 20 minutes. Bear in mind that the TTL is reset to its default value every time the key is added or a general-purpose counter is updated.
Speaking of general-purpose counters, we’ve declared that we want to track 3 of them. gpc0_rate(10s) is the 10-second sliding window mentioned earlier, gpc0 is an integer counter that we’ll increment on every 404 error, and gpc1 is the integer counter that we’ll increment if a visitor exceeds the allowed 404 error rate. By default, every visitor that triggers a 404 error will have gpc1=0 for as long as they don’t exceed the error rate threshold.
Oh, right. The 20-minute key expiration TTL mentioned earlier effectively defines the total time the IP will be blocked once the allowed error rate limit is exceeded.
Let’s focus on the tracking logic for a moment.
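A minimal sketch of the frontend rules discussed in this section, under a few assumptions: the ACL name 404_rate_exceeded and the gt 5 comparison are mine, while 404_blocked_ip, static_file, excluded_user_agent and bk_404_tracking come from the text. Line numbers mentioned in the prose refer to the original listing, so they will not match this sketch.

```haproxy
# Inside the frontend section, after the static_file and
# excluded_user_agent ACL declarations.

# True when the tracked IP produced more than 5 404 errors within the
# 10-second window (assumed reading of the threshold).
acl 404_rate_exceeded sc0_gpc0_rate(bk_404_tracking) gt 5
# True when the tracked IP has been flagged as abusive.
acl 404_blocked_ip sc0_get_gpc1(bk_404_tracking) gt 0

# Track the client IP taken from X-Forwarded-For, for page requests only.
http-request track-sc0 req.hdr_ip(X-Forwarded-For) table bk_404_tracking if !static_file !excluded_user_agent

# Count every 404 response; gpc0_rate follows gpc0 automatically.
http-response sc-inc-gpc0(0) if { status 404 }

# Flag the IP once it exceeds the allowed 404 error rate.
http-response sc-inc-gpc1(0) if 404_rate_exceeded

# Reject flagged IPs with a 403.
http-request deny deny_status 403 if 404_blocked_ip
```

Since the IP is only tracked for requests that pass the track-sc0 conditions, requests for static files and from excluded user-agents never match 404_blocked_ip in this sketch.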
Most of the logic is explained with inline comments, but let’s analyze things from top to bottom. First, we declare ACLs that simply define an acceptable 404 error rate threshold and what qualifies as a blocked IP address. It’s important to understand that lines that begin with the acl keyword are only ACL declarations. ACLs are not evaluated at this point. They will be evaluated later, when they are referenced by http-response and http-request rules.
Equally important is to understand that http-request track-sc0 declares that we want to track the IP address stored in the X-Forwarded-For request header if the static_file and excluded_user_agent ACLs resolve to false. We can’t add the { status 404 } ACL to the list of conditions, because the HTTP request hasn’t been processed yet, so there is no response status to check. While this directive adds almost everyone’s IP to the stick table, none of the counters are updated at this point, so no one is blocked.
Counters are updated when we increment the gpc0 counter (line 13), but only if the inline ACL status 404 resolves to true (i.e. if the HTTP response code was 404). If the request didn’t satisfy all conditions earlier, then the visitor’s IP wasn’t added to the stick table, so there’s nothing to increment.
You probably noticed that we didn’t have to do anything regarding the gpc0_rate counter. HAProxy updates that one for us automatically, since it’s tied to gpc0 under the hood.
On line 16 we’re incrementing the gpc1 counter if the visitor exceeded the allowed error rate. If that is the case, the ACL 404_blocked_ip resolves to true when it is evaluated on line 19. There, we’re instructing HAProxy to return the 403 (forbidden) error page.
Once blocked, the IP will remain blocked until the stick table key expires. As mentioned earlier, we set that to 20 minutes. However, there are 3 scenarios in which the IP can be unblocked earlier:
- HAProxy service was restarted. Stick tables can’t be preserved across service restarts, unless you have multiple HAProxy instances connected as peers, but that’s a topic for another day.
- The stick table entry limit was reached. When the table entry limit is reached, HAProxy removes some of the oldest entries to make room for new ones. If you find yourself in a situation like that, you will want to increase the stick-table size parameter. According to the official documentation, when calculating maximum memory usage for a stick table with the ip key type, each entry will take roughly 50 bytes of memory.
- The complete stick table or a specific entry was cleared using the runtime API. This can be achieved by calling the clear table [table_name] command.
Manipulate HAProxy stick table entries
If you want to inspect entries tracked in a stick table, and their counter values, you can communicate with HAProxy’s runtime API via a unix socket. To do that, you need to expose the socket with admin-level privileges:
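A minimal sketch of such a declaration in the global section. The socket path is the one used by the socat examples in this section; the mode 660 permission is an assumption, so pick whatever fits your setup.

```haproxy
global
    # Expose the runtime API on a unix socket with admin-level privileges.
    # "mode 660" is an assumption; the path matches the socat examples.
    stats socket /var/lib/haproxy/haproxy.sock mode 660 level admin
```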
Afterward, you can issue instructions using echo and socat commands. Here are a few examples:
# show full table contents
echo "show table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock
# empty the whole table
echo "clear table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock
# remove a single IP from stick table
echo "clear table bk_404_tracking key xxx.xxx.xxx.xxx" | socat stdio /var/lib/haproxy/haproxy.sock
# show all IPs that exceeded allowed 404 error rate limit
echo "show table bk_404_tracking" | socat stdio /var/lib/haproxy/haproxy.sock | grep -E "gpc1=[1-9]"
Test before using in production
It’s recommended to always test things before using them in a production environment. Even if you don’t have a test environment, there’s a way to safely test 404 error rate limit thresholds in production without blocking anyone’s access. This can be achieved by routing requests to a separate HAProxy backend, which has the same servers as the original one. Expanding our example, that would look something like this:
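A rough sketch of that idea, assuming the 404_blocked_ip ACL is already declared in the frontend. The backend name bk_original_backend, the server name and the server address are hypothetical placeholders; only bk_copy_of_original_backend comes from this post.

```haproxy
# In the frontend: route flagged requests instead of denying them.
# http-request deny deny_status 403 if 404_blocked_ip
use_backend bk_copy_of_original_backend if 404_blocked_ip
default_backend bk_original_backend

# Hypothetical original backend.
backend bk_original_backend
    server web1 192.0.2.10:8080

# Same servers as the original backend; only the name differs,
# and that name is what shows up in the access logs.
backend bk_copy_of_original_backend
    server web1 192.0.2.10:8080
```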
That way, you can identify potentially blocked requests and visitor IP addresses by searching for the bk_copy_of_original_backend backend name in the access logs. Alternatively, instead of messing around with backends, you can also inject a cookie or some other request header that would allow you to identify requests/IPs that would have been blocked.
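The header variant could be sketched like this, again assuming the 404_blocked_ip ACL from earlier. The header name X-404-Would-Block is hypothetical; the application behind HAProxy would need to log it for this to be useful.

```haproxy
# In the frontend, in place of the deny rule: mark the request
# and let it through to the regular backend.
http-request set-header X-404-Would-Block 1 if 404_blocked_ip
```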
There you have it. One more way of using HAProxy for things that it was or wasn’t designed to do! 😀