The surprising complexity of interpreting X-Forwarded-For safely

I've seen a lot of uncertainty and misunderstandings about how to handle IP addresses correctly when developing and operating a web service:

What's my user's IP address?
How do I use the X-Forwarded-For header?
What's the difference between that and X-Real-IP or other HTTP headers?

This post explains the need for X-Forwarded-For (hereafter, "XFF"), provides a mental model for working with it, and then gives guidance on how to handle different situations.

In order, I'll cover why it exists, how to think about it, how to use it, and finally some alternative approaches that may be more appropriate.

(See the end for a summary.)

Why XFF?

I'll explain the purpose of this header by starting simple and then adding in the layers of complexity that lead to it being necessary.

If you're already intimately familiar with this header, feel free to skip ahead to the section "From XFF to IP chain".

Starting simple

In the simplest possible case, you just have a client and a server. This rarely happens on today's internet, but let's start there! The client is probably some person using a browser on their laptop, and the server is directly exposed to the internet. The server sees a request like this:

GET / HTTP/1.1
Host: www.example.com
Accept: text/html

...and it knows that the request came in on a TCP connection from IP address 1.2.3.4.

You can see that the HTTP request does not itself contain any information about the client IP address. Instead, the server has to look at the TCP connection the request came over.

We'll represent this as the following diagram, where "no headers" is short for "no IP address related headers":

client @ 1.2.3.4:
  Sends: [no headers] to server

server:
  Reads: [no headers] from 1.2.3.4

This is the world HTTP started in. A client, a server, nothing in between, nothing complicated.

But then, proxies

It turns out that this situation isn't good enough for a lot of people's needs. Maybe you want to have a load balancer in front of a collection of identical servers. The client sends requests to the load-balancer, and the load-balancer forwards them to whichever server seems least busy. This is a type of proxy. Here's what the chain of requests looks like:

client @ 1.2.3.4:
  Sends: [no headers] to load-balancer

load-balancer @ 10.0.3.0:
  Reads: [no headers] from 1.2.3.4
  Sends: [no headers] to server

server:
  Reads: [no headers] from 10.0.3.0

(A public load balancer might have a public IP e.g. 3.3.3.3 and a private-range network address of 10.0.3.0. I'll be using the @ to indicate the IP address that the next node sees; here, the load balancer and server are on the same network.)

All the server can see is "10.0.3.0 is talking to me", and never learns that the real client is at 1.2.3.4. Oops! The proxy is just shuttling data back and forth, and hasn't altered the HTTP request, but by sitting in between the client and server it has ruined the server's ability to see who the real client is. Since all of the clients connect through the proxy, the server thinks they all have the same IP address. That might be a serious problem!

Proxies can send X-Real-IP

One option that people came up with is to have the proxy modify the HTTP request to inject a header bearing the "real" IP address: X-Real-IP. This is not a standardized header, but is quite commonly supported. Here's what it looks like:

client @ 1.2.3.4:
  Sends: [no headers] to load-balancer

load-balancer @ 10.0.3.0:
  Reads: [no headers] from 1.2.3.4
  Sends: [X-Real-IP: 1.2.3.4] to server

server:
  Reads: [X-Real-IP: 1.2.3.4] from 10.0.3.0

Now the server has two IP addresses to work with: 10.0.3.0 is who directly sent the request to the server, but that sender is also claiming that it only got the request from 1.2.3.4.

(Note that the load balancer never says "Hey, I'm 10.0.3.0". First, because it's unnecessary, but second because a node can have multiple IPs and it's not clear which it would even announce.)

And this works out pretty well for setups like this!

But then, two proxies

Where this falls apart is when you have two proxies. This is pretty common in cloud environments. In AWS, you might have a group of application servers that all accept HTTP requests directly, and then a load balancer in front. But that load balancer is only in one geographic region, so you'll also see a CDN put in front of that, mostly acting as a geographically distributed caching proxy.

(The load balancer and the CDN are both groups of servers, not individual nodes in a network, but any one request only passes through one server in each—so it's convenient and not really incorrect to describe a request as passing through "the" load balancer node and "the" CDN node. Just keep in mind that it's not the same one each time.)

Here's what this might look like:

client @ 1.2.3.4:
  Sends: [no headers] to CDN

CDN @ 5.5.5.5:
  Reads: [no headers] from 1.2.3.4
  Sends: [X-Real-IP: 1.2.3.4] to load-balancer

load-balancer @ 10.0.3.0:
  Reads: [X-Real-IP: 1.2.3.4] from 5.5.5.5
  Sends: [X-Real-IP: 5.5.5.5] to server      <-- information lost!

server:
  Reads: [X-Real-IP: 5.5.5.5] from 10.0.3.0

See the problem? With X-Real-IP the server is only given information about the node directly on the other side of the load balancer, and doesn't learn about the real client. The information was present, but was then lost when the load balancer replaced the incoming X-Real-IP header with a new one.

X-Forwarded-For makes a chain

The most commonly adopted solution to this problem is to pass a chain of IPs in an HTTP header. Rather than replacing the existing header and losing the information, each proxy appends to a new header called X-Forwarded-For. (Or sets the header if it's not already present.) Here's how that looks in our CDN/load balancer scenario; note how the load balancer appends 5.5.5.5 to the value it received:

client @ 1.2.3.4:
  Sends: [no headers] to CDN

CDN @ 5.5.5.5:
  Reads: [no headers] from 1.2.3.4
  Sends: [X-Forwarded-For: 1.2.3.4] to load-balancer

load-balancer @ 10.0.3.0:
  Reads: [X-Forwarded-For: 1.2.3.4] from 5.5.5.5
  Sends: [X-Forwarded-For: 1.2.3.4, 5.5.5.5] to server

server:
  Reads: [X-Forwarded-For: 1.2.3.4, 5.5.5.5] from 10.0.3.0

Now the server has all the information it needs. It knows the request came from remote address 10.0.3.0 and that, if everyone followed the rules, the request was previously handled by 5.5.5.5 and before that by 1.2.3.4. (We'll see later that that's a big "if".)

But how should this information be used?

From XFF to IP chain

Note that the X-Forwarded-For header is by itself insufficient to reconstruct the path taken by the request. It must be combined with the remote address in order to get the fullest possible picture, and to handle the broadest range of situations. The XFF itself is incomplete.

To combine them, simply append the remote address onto the end of the XFF. For lack of a better term I'll be calling this concatenation the IP chain for the rest of the article.

The IP chain is very easy to construct. If you receive X-Forwarded-For: 1.2.3.4, 5.5.5.5 from 10.0.3.0 then the IP chain is 1.2.3.4, 5.5.5.5, 10.0.3.0.

A few things to note about it:

If there's no X-Forwarded-For header, the IP chain is just the remote address, e.g. 1.2.3.4.
If our server were itself to act as a proxy, the IP chain is precisely what it would send as an X-Forwarded-For header to the next node.
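As a rough sketch in Python, the concatenation might look like the following. The function name and the choice to accept a list of header values (to cope with a request that carries more than one X-Forwarded-For line) are assumptions for illustration, not part of any particular framework.

```python
def build_ip_chain(xff_header_values, remote_addr):
    """Join all X-Forwarded-For values in order, then append the remote address.

    xff_header_values: list of raw header values, one per X-Forwarded-For
    line in the request (empty list if the header is absent).
    remote_addr: the peer address of the TCP connection, as a string.
    """
    chain = []
    for value in xff_header_values:
        for part in value.split(","):
            part = part.strip()
            if part:  # skip empty items such as a trailing comma
                chain.append(part)
    chain.append(remote_addr)
    return chain


print(build_ip_chain(["1.2.3.4, 5.5.5.5"], "10.0.3.0"))
# ['1.2.3.4', '5.5.5.5', '10.0.3.0']
print(build_ip_chain([], "1.2.3.4"))
# ['1.2.3.4']
```

The entries are kept as raw strings here; nothing has been validated or trusted yet.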

A reverse linked list of trust

At this point it would be tempting to say:

Ah, so all the server needs to do now is just pluck out the first entry from the IP chain, and that's the client IP!

Unfortunately... the internet is a terrible place where people lie.

If you're using the client IP to block offending users, the user can simply have their browser send a false initial header of X-Forwarded-For: 7.8.9.0. Now the request arrives at your server with an IP chain of 7.8.9.0, 1.2.3.4, 5.5.5.5, 10.0.3.0 and your server plucks out the 7.8.9.0, totally unaware that this is untrue. If that IP gets blocked, the user can just switch to a different false value.

This is why, in the general case, you can't just look at any one IP in the IP chain. Whatever your use-case is, your code needs to understand how to interpret the chain. This is all you can know for sure:

The server knows it got the request from 10.0.3.0
Each node in the chain claims it got the request from the previous node (and claims that it was sent all the IPs to the left of that)

The external chain

This means you need to start at the rightmost IP—the one you trust—and walk leftwards, and at some point make a decision of when to stop. Generally, this will happen at some trust boundary—the point at which you stop recognizing IP addresses of hosts that you trust to tell you good information. Beyond that point could be anything.

Everything to the right of this trust boundary is your infrastructure. And everything to the left is outside of your infrastructure, so I'll refer to it as the external chain.
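Here is a minimal sketch of that right-to-left walk, assuming you keep your trusted proxies as a list of CIDR ranges. The function name and the example ranges are hypothetical; the parsing uses Python's standard ipaddress module.

```python
import ipaddress


def external_chain(ip_chain, trusted_cidrs):
    """Walk the IP chain from the right, dropping entries we trust.

    Stops at the first entry that is not inside a trusted range (or that
    does not parse as an IP address) and returns everything up to and
    including that entry.
    """
    trusted = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    end = len(ip_chain)
    while end > 0:
        try:
            addr = ipaddress.ip_address(ip_chain[end - 1].strip())
        except ValueError:
            break  # unparseable entry: treat it as untrusted
        if not any(addr in net for net in trusted):
            break  # first untrusted entry: this is the trust boundary
        end -= 1
    return ip_chain[:end]


chain = ["7.8.9.0", "1.2.3.4", "5.5.5.5", "10.0.3.0"]
print(external_chain(chain, ["10.0.0.0/8", "5.5.5.5/32"]))
# ['7.8.9.0', '1.2.3.4']
```

Note that the walk stops at the first untrusted entry, so a trusted-looking address further to the left (which could be spoofed) is never discarded.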

What's in the external chain?

In most circumstances, the external chain will contain a single IP, which will be the "real" client IP.
It could contain a single IP which is the exit node of a VPN the client is using. (If you're using TLS, the VPN couldn't attach an X-Forwarded-For header even if it wanted to, due to the end-to-end encryption.)
In some unusual cases, the external chain might have two elements: The real client IP, and then a corporate HTTP proxy that inspects all traffic flowing through it.
And in all of these cases, the client can give the illusion of more IP addresses to the left side of the external chain, just by passing a spoofed XFF header.

In the general case, if you see an external chain of 7.8.9.0, 1.2.3.4, you won't be able to tell the difference between 1.2.3.4 being the client IP (with 7.8.9.0 being a spoofed XFF) and 7.8.9.0 being the client IP (and 1.2.3.4 being an HTTP proxy).

So, what do you do if there's an external chain with multiple IPs in it? It depends on your use-case. You'll want to use either the rightmost IP, leftmost IP, or entire chain, depending. And that's what the next section is about.

Using the external chain

Here are some examples of how to use the external chain, now that you have it:

IP allowlisting

Just to get some of the more exotic scenarios out of the way, I'll start with the most restrictive, paranoia-requiring case: IP allowlisting (whitelisting). This might come up if you are running a service and a customer only wants computers on their corporate network to be able to access resources on their account. They give you a list of CIDR ranges.

The algorithm here is pretty straightforward: Just use the rightmost IP in the external chain. For example, if the external chain were 7.8.9.0, 1.2.3.4, you want to compare 1.2.3.4 to the customer's IP ranges to make your decision.

(Maybe 7.8.9.0 is spoofed. Or perhaps it's not, and the user has to use some kind of mandatory SSL-stripping corporate proxy and 7.8.9.0 is the actual client, but you don't care—at 1.2.3.4 you've reached the boundary of the client's network, and it's their problem after this point.)
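A small sketch of that check, assuming the external chain has already been computed and the customer's ranges arrive as CIDR strings. The function name is made up for illustration, and the sketch fails closed on anything it can't parse.

```python
import ipaddress


def is_allowlisted(external_chain, customer_cidrs):
    """Compare only the rightmost external IP against the customer's ranges."""
    if not external_chain:
        return False  # nothing to check: fail closed
    try:
        addr = ipaddress.ip_address(external_chain[-1].strip())
    except ValueError:
        return False  # unparseable address: fail closed
    return any(addr in ipaddress.ip_network(cidr) for cidr in customer_cidrs)


print(is_allowlisted(["7.8.9.0", "1.2.3.4"], ["1.2.3.0/24"]))  # True
print(is_allowlisted(["1.2.3.4", "7.8.9.0"], ["1.2.3.0/24"]))  # False
```

In the second call the allowlisted address appears only to the left of the rightmost entry, where it could have been spoofed, so it is ignored.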

But... how paranoid are you? How strict does your security need to be?

Probably the CDN you use can be used by other people too. Someone else can create a malicious CDN configuration with the same service that uses your load balancer as its origin but doesn't handle the X-Forwarded-For honestly; maybe the attacker has theirs set to drop the incoming XFF and send a falsified one:

...
attacker-CDN:
  Reads: [no headers] from 6.6.6.6
  Sends: [X-Forwarded-For: 1.2.3.4] to load-balancer
...

Now, with minimal cost, the attacker has bypassed your allowlisting protection. Defining a "trustworthy proxy" is harder when it's not a node you own and control, just rent from a pool.

In such a situation, you may need to check how your CDN handles XFF. Is it possible to override with configuration? Do you need to add a secret header that proves to you that the request came via your proxy and not someone else's? (What happens if they put a CDN proxy in front of yours and call through that? Will you make sure to only trust one layer of that CDN's IP ranges?)

So IP restriction is a case where you might decide not to use the IP chain at all. See later section "How to not use XFF".

With that out of the way, on to some more low-stakes examples.

Georestriction

With georestriction, the idea is that a service or resource is only permitted to be accessed by people from certain parts of the world.

This might be as specific as a city or as broad as a continent, but is often specified at the country level. Usually this is a contractual obligation; perhaps a publisher only has publishing rights on a certain continent, or a sports broadcaster is only allowing streaming access outside of the hosting country (where they want everyone to watch on cable or broadcast TV, perhaps.) It's understood that a certain amount of leakiness is OK, here; if a few people are watching a sports game from the wrong country, that's not a big deal.

Alternatively, there may be government restrictions such as embargo of advanced technical specifications to a country that OFAC is imposing relevant sanctions against. (This isn't my area, so I'm not sure how the stringency compares to contractual obligations.)

Luckily, as long as your IP-to-country-code lookup service is doing a good job, this is actually a pretty simple type of enforcement to do. It doesn't matter whether someone's using a proxy (other than yours) in a blocked country while they are themselves in an allowed country, or vice versa; either way, the content ends up present in unencrypted form in a blocked country. So, the algorithm here is "deny if any":

Take all of the IP addresses to the left of the trusted ones
Look up the country code for each IP address
If any IP address is in a blocked country, Deny
Otherwise, Allow

There's some other nuance to consider:

What will you do when an IP does not resolve to a country code? Do you default to deny or allow? (Depends on your business case.)
Do you really want to look up all the IP addresses even if there's a very long chain? If someone sends in a request with an external chain containing 100 entries, probably most of that is fake; perhaps you want to only look up the first 3 or 4 untrusted IPs.
It is sometimes possible to detect proxies that spoof the user's geographic location. See the appendix for more information.
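The "deny if any" steps can be sketched as below. The lookup_country callable stands in for whatever IP-to-country service you use (it is hypothetical and assumed to return a country code string or None), and allow_unknown is an invented parameter that captures the default-deny-or-allow decision for unresolved addresses. The country codes in the example are placeholders.

```python
def georestriction_allows(external_chain, blocked_countries, lookup_country,
                          allow_unknown=False):
    """Deny if any IP in the external chain resolves to a blocked country."""
    for ip in external_chain:
        country = lookup_country(ip)
        if country is None:
            if not allow_unknown:
                return False  # unresolved address and we default to deny
            continue
        if country in blocked_countries:
            return False
    return True


# Toy lookup table in place of a real geo-IP service.
table = {"1.2.3.4": "AA", "7.8.9.0": "BB"}
print(georestriction_allows(["1.2.3.4"], {"BB"}, table.get))             # True
print(georestriction_allows(["7.8.9.0", "1.2.3.4"], {"BB"}, table.get))  # False
```

This sketch looks up every entry; capping the number of lookups for very long chains would be an additional policy decision layered on top.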

Localization

This is another geolocation use-case, but rather than blocking people based on their country, you're trying to guess their location so you can show them different content.

There are often better alternatives; if you want their preferred language or time zone, the browser can usually tell you that via a header or in a Javascript API! And for many use-cases, you can simply ask them.

For localization, you're usually not in an adversarial scenario. Here, you want to use the leftmost IP in the external chain. It can be spoofed, but for localization you're unlikely to care about that.

Rate-limiting

Rate-limiting is used when there is an expensive, sensitive, or critical API endpoint and you want to restrict how frequently it can be called per-caller. You'll often see these on APIs in general to protect against poorly written clients, or on authentication endpoints to slow down attacks to a manageable rate. The requirement here is to produce a rate-limiting key that can be used to put requests into buckets, with each bucket allowed a certain rate of requests. For authenticated users this can be as simple as the account ID; for anonymous users you'll likely need to fall back to their IP address.

Generally this is going to be quite simple: Take the rightmost IP in the external chain. That's your rate-limiting key. (Why rightmost? Because rate-limiting is an adversarial use-case, and the chance of spoofed IPs is very high.)
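A minimal sketch of deriving such a key, assuming the external chain is already computed. The function name, the key prefixes, and the shared buckets for missing or malformed addresses are illustrative choices, not a recommendation for any specific rate limiter.

```python
import ipaddress


def rate_limit_key(external_chain, account_id=None):
    """Prefer the account ID; otherwise fall back to the rightmost external IP."""
    if account_id is not None:
        return f"account:{account_id}"
    if not external_chain:
        return "ip:unknown"
    try:
        # Parsing normalizes the text form, so equivalent spellings of the
        # same address land in the same bucket.
        addr = ipaddress.ip_address(external_chain[-1].strip())
    except ValueError:
        return "ip:invalid"
    return f"ip:{addr}"


print(rate_limit_key(["7.8.9.0", "1.2.3.4"]))                 # ip:1.2.3.4
print(rate_limit_key(["7.8.9.0", "1.2.3.4"], account_id=42))  # account:42
```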

As you add and remove and change CDNs, you'll need to be careful to update your CIDR ranges of trusted proxies accordingly. For example, if you put a CDN in front and forget to update your configuration, everyone in a region might be assigned to the same rate-limiting bucket. You'll be getting requests with "external chains" like 1.2.3.4, 5.5.5.5 and 5.6.7.8, 5.5.5.5 and everyone will be seen as having IP address 5.5.5.5. People will get blocked, especially during high-traffic periods. Naturally, you'll want anomaly detection monitoring on your rate-limiting so that you're able to detect, revert, and fix such a misconfiguration quickly.

Audit logs

There are assorted other business needs where IP addresses are used to identify people. This is not reliable, since people share IP addresses and IP addresses move around. But because it at least mostly works, there are also ethical (and regulatory) risks in storing this information, and it is best avoided if possible. Nevertheless, there are legitimate business cases for collecting and storing this information at times, usually relating to security:

Audit logs (including simple access logs) that allow reconstructing someone's behavior over time, which can be useful in investigating a security incident
Showing a user where they have active login sessions, e.g. if they have multiple sessions open that are apparently in different countries

For this, you want to use the whole external chain. Someone is going to be analyzing this data manually. Just store or display the whole list so they have all the information they can use, and they'll sort it out themselves, filtering as needed.

How to not use XFF

You may have noticed one recurring theme: It's very important to set the trust boundary correctly, but also difficult to do so in a robust way. Keeping an up-to-date list of CIDR ranges for each proxy requires another process that needs upkeep and monitoring. If the ranges go out of sync, your IP allowlisting might start denying everyone, or your georestriction might think every request involves one country. Not all CDNs or other proxies are even conducive to this kind of configuration.

There's an alternative: Hardcode the trust boundary. This can take several forms.

Fixed index

If you always know the number of proxies between you and the last external IP, you can just configure your application to always strip off the last N entries from the IP chain. The simplicity of this option is very alluring, but it is fragile in the face of network changes.
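As a sketch, the fixed-index approach is a single slice. The function name is hypothetical, and raising an error when the chain is shorter than expected is one possible policy (such a request may have bypassed a proxy), not the only reasonable one.

```python
def strip_trusted_hops(ip_chain, trusted_hops):
    """Drop the last `trusted_hops` entries from the IP chain."""
    if trusted_hops < 0:
        raise ValueError("trusted_hops must be non-negative")
    if len(ip_chain) <= trusted_hops:
        # Fewer entries than the configured proxy count: the request did not
        # take the expected path, so refuse to guess.
        raise ValueError("IP chain is shorter than the configured proxy count")
    # Slice by explicit length; ip_chain[:-0] would return an empty list.
    return ip_chain[: len(ip_chain) - trusted_hops]


print(strip_trusted_hops(["7.8.9.0", "1.2.3.4", "5.5.5.5", "10.0.3.0"], 2))
# ['7.8.9.0', '1.2.3.4']
```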

Set header at boundary

A more robust option is to configure your outermost proxy to set a header with the IP that is directly before it in requests. This is like X-Real-IP—it could even be X-Real-IP, for some deployments—but with the critical requirement that none of the intervening proxies will alter it.

For example, Cloudflare provides CF-Connecting-IP, a non-standard header that is specific to their service and that helps implement this strategy. Here's a worked example for a request in which the client is trying several methods of spoofing their IP, passing in faked headers:

spoofing-client @ 1.2.3.4:
  Sends: [CF-Connecting-IP: 7.8.9.0; X-Forwarded-For: 7.8.9.0] to Cloudflare     <-- spoofed headers

Cloudflare @ 5.5.5.5:
  Reads: [CF-Connecting-IP: 7.8.9.0; X-Forwarded-For: 7.8.9.0] from 1.2.3.4
  Sends: [CF-Connecting-IP: 1.2.3.4; X-Forwarded-For: 7.8.9.0, 1.2.3.4] to load-balancer    <-- overwrites CF-Connecting-IP

load-balancer @ 10.0.3.0:
  Reads: [CF-Connecting-IP: 1.2.3.4; X-Forwarded-For: 7.8.9.0, 1.2.3.4] from 5.5.5.5
  Sends: [CF-Connecting-IP: 1.2.3.4; X-Forwarded-For: 7.8.9.0, 1.2.3.4, 5.5.5.5] to server

server:
  Reads: [CF-Connecting-IP: 1.2.3.4; X-Forwarded-For: 7.8.9.0, 1.2.3.4, 5.5.5.5] from 10.0.3.0

Here, Cloudflare maintains the X-Forwarded-For header as usual, but it also sets the new CF-Connecting-IP header. Critically, it throws away any existing value for that header, since it comes from outside the trust boundary. Equally critically, no later proxy alters that header.

Again, this is what the server sees:

IP chain: 7.8.9.0, 1.2.3.4, 5.5.5.5, 10.0.3.0
Additional CF-Connecting-IP value of 1.2.3.4

Here, Cloudflare has done the work of identifying the trust boundary. If you need to construct the external chain, the trust boundary sits immediately to the right of the last (rightmost) instance of the CF-Connecting-IP value in the IP chain. Walk leftwards as before until 1.2.3.4 is reached, discarding 10.0.3.0 and then 5.5.5.5 in turn:

7.8.9.0, 1.2.3.4, 5.5.5.5, 10.0.3.0
7.8.9.0, 1.2.3.4, 5.5.5.5            (10.0.3.0 discarded)
7.8.9.0, 1.2.3.4                     (5.5.5.5 discarded)

Therefore, the external chain is 7.8.9.0, 1.2.3.4, and it may be used in all the ways described above.

(Of course, you can also take shortcuts here; any code requiring the leftmost IP can take it from the original IP chain. And any code requiring the first untrusted IP can use the CF-Connecting-IP without looking at the chain at all. Other use-cases need to construct the chain.)

Caveats with this approach:

I'm not sure which CDNs offer a feature like this.
Ensure that your CDN properly drops and replaces any existing incoming header of that name, including any case variations.
If anyone can bypass your CDN and make requests directly to your application or load-balancer, they can claim to be the trust boundary by sending in their own spoofed header. This failure mode applies to many of the approaches listed in this article.
Finding the matching IP in the list may require canonicalizing the IP addresses for comparison. Straight string comparisons may result in bugs or vulnerabilities.
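Here is a hedged sketch of locating the boundary from such a header, comparing parsed addresses rather than raw strings. The function name is invented, and boundary_ip is assumed to be the value of a header (such as CF-Connecting-IP) that your outermost proxy is known to overwrite.

```python
import ipaddress


def external_chain_from_boundary(ip_chain, boundary_ip):
    """Return the chain up to and including the last match of boundary_ip."""
    target = ipaddress.ip_address(boundary_ip.strip())
    # Search from the right so a spoofed copy further left is never chosen.
    for i in range(len(ip_chain) - 1, -1, -1):
        try:
            candidate = ipaddress.ip_address(ip_chain[i].strip())
        except ValueError:
            continue  # entry is not a parseable IP, so it cannot match
        if candidate == target:
            return ip_chain[: i + 1]
    raise ValueError("boundary IP not found in IP chain")


chain = ["7.8.9.0", "1.2.3.4", "5.5.5.5", "10.0.3.0"]
print(external_chain_from_boundary(chain, "1.2.3.4"))
# ['7.8.9.0', '1.2.3.4']
```

Parsing both sides means that different spellings of the same IPv6 address still match. Other equivalences (for example IPv4-mapped IPv6 addresses) are not handled here and would need their own treatment.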

Fragility is inherent

All of the techniques listed here have in common that they are fragile in the face of uncoordinated network changes, or even coordinated ones. If a proxy in front of your service is added or removed, code dealing with IP addresses is likely to break. There are several things you can do to help mitigate this:

Monitoring: If you have anomaly detection set up to alert you on sudden changes in rate-limiting, georestriction, and allowlisting denials, you're in a much better position to detect an uncoordinated change (at the possible expense of false positives). In particular, I would suggest monitoring the average length of the calculated external chain for some high-traffic but low-sensitivity API call, which should quickly alert you to changes without incurring false positives during attacks.
Rotation support: When changing your network configuration, you may need a way for the same server to support multiple configurations, e.g. custom headers from two different CDNs as you switch from one to the other. Being able to configure a list of custom headers (as a fallback cascade) gives you the freedom to make this change. The ability to fall back to the plain X-Forwarded-For and remote address may also be useful for some installations.
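A fallback cascade over custom headers might be sketched like this. The function name and the second header name are hypothetical, and the sketch assumes that every header in the list is either overwritten or stripped on every path into your service; otherwise a client could spoof whichever header the current proxy leaves alone.

```python
def boundary_ip_from_headers(headers, header_names):
    """Return the value of the first configured header that is present.

    headers: mapping of request header names to values.
    header_names: configured header names, in priority order.
    """
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in header_names:
        value = lowered.get(name.lower())
        if value:
            return value.strip()
    return None  # caller can fall back to X-Forwarded-For plus remote address


cascade = ["X-Example-CDN-Client-IP", "CF-Connecting-IP"]
print(boundary_ip_from_headers({"cf-connecting-ip": "1.2.3.4"}, cascade))
# 1.2.3.4
```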

Summary

If you concatenate the X-Forwarded-For header and the remote address of the HTTP connection (called remote_addr in some systems) you get a list of IP addresses. Both parts must be included.
This list is a chain that must be walked from right to left (i.e. backwards).
While walking leftwards, discard each IP address you encounter that you recognize and trust.
The remaining list can then be used, but usage depends on context.
There are alternative ways to produce the IP list, but all methods require you to know your network configuration.

Appendix: Proxy detection

I'd like to include some additional information on the topic of proxy detection. People will often evade georestriction by using VPNs with exit nodes in different countries, and service providers sometimes want to detect and block these, or at least treat them as "unknown country". I've worked with a vendor offering a detection service for proxies and they did a kind of miserable job of it, blocking a number of IPs that were not HTTP proxies at all. I learned a good deal from this experience and can offer some tips.

A mistake I've seen multiple vendors make is blocking all Tor entrance and middle relays rather than just Tor exit nodes. This ends up blocking a lot of people and data centers that are contributing to Tor but not in a way that sends traffic to your site. ("Guard" and "middle" relays only relay traffic to other Tor nodes, not to you. If you get traffic from one, it's not Tor traffic.) So if you need to block Tor, skip the vendor and just periodically download Tor's own exit list. It's far more reliable. (This is also a great test of your vendor: If they block a larger list than this when claiming to block Tor, they're probably screwing up in less obvious ways too.) However, also keep in mind that people in some countries need to use Tor to avoid censorship—that's largely what it's designed for, after all—so keep a light touch here if you can, for humanitarian reasons.

Beyond Tor, there are a great many private VPN services, and there's a cat and mouse game between VPNs and VPN-detectors. It's not possible to do this perfectly, so on this front you shouldn't have very high expectations. And as IP addresses get reassigned, you're not only going to miss a bunch of geo-spoofing proxies, you're also going to block some legitimate traffic. You'll also block people who are using VPNs but just for privacy, without the intent of spoofing their location. I don't have much experience with VPNs, but my suspicion is that VPN services will tend to choose exit nodes near the customer for improved latency, meaning that a great many of these people will have their IP hidden, but will be in the same country as their exit node. They may be very confused when they get blocked by georestriction despite being in the "right" country. (You may wish to return distinct errors for "wrong country" and "proxy blocked", at the very least.)

Be prepared to deal with the customer service load. But not only that: Depending on your business, you may need to make a "back door" that will allow people access in spite of what your geo-IP or proxy-detection service claims.

Updates

2022-03-31: Go check out Adam Pritchard's "The perils of the “real” client IP", which by chance came out at the same time. It covers some of the same advice, but has a strong focus on rate-limiting and covers topics that I didn't get into or didn't think of: special challenges of IPv6 in rate-limiting, mishandling of multi-valued headers (both in proxies and in standard libraries), and actual examples of widely used software and services that do the wrong thing and make it harder to secure your service. (Akamai gets a special mention as being particularly badly behaved.) I knew the situation was bad, but this was eye-opening as to the wide variety of kinds of bad that are present in the IP determination and rate-limiting space.
