Blocking Referer Spam


This article was written in May 2004, originally at my personal site. While the advice is still valid, it turns out that what I was seeing was not necessarily referer spam, but something a bit weirder, which I need to write up in detail when I have time. Basically, my .htaccess information below is correct for what I was seeing, but may not work at all to solve strict referer spam problems. The problem I was seeing (and still do, though much less so) was that somehow my site ended up listed as an open proxy on some idiot's list. It isn't one now, nor has it ever been (I suspect the listing was due to an early PHP mistake on my part). So the traffic I was calling "modified referer spam" was actually someone's attempt to fake traffic through affiliate sites by routing it through my "open" proxy (and, I imagine, many others).

Anyway, read the rest for entertainment, perhaps enlightenment, but it's not necessarily correct. Check out Proposal for a solution to referrer spam: Using MT-Blacklist and other blacklists to filter spamming URLs for a better example than I provide here.

On a monthly, sometimes bi-weekly basis, I scan through my traffic logs and reports looking for things. Nothing in particular, oddities, things that stand out. I used to do this on a much larger scale at and it helped me in tuning the site as well as debugging problems within IBM's web space.

Late last year I noticed an increase in odd referrers. A referrer is (in theory) the URI which contains a link to a given site. It is passed by the user agent (e.g. Microsoft Internet Explorer, Mozilla Firefox) in the block of information sent to the target site when a resource is requested from that site.

Basically, that means that if you’re viewing and click on the link ed costello: articles and essays, a request like the following gets sent to my server:

GET /articles/ HTTP/1.1
...other stuff irrelevant to this post...

So, that is what a referrer is. (Referer is a tragic misspelling that crept in some time in the early days of the web, and one we're stuck with today in the CGI spec and other places.)
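For anyone wanting to see where the referrer ends up on the server side, here is a minimal sketch of the standard Apache "combined" log format, which records the Referer request header for each hit. The log path is an assumption for illustration only, and these directives belong in the main server configuration, not in .htaccess.

```apache
# Server-level configuration (e.g. httpd.conf); LogFormat and CustomLog
# are not permitted in .htaccess files.
# %{Referer}i logs the Referer request header sent by the user agent.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Assumed log location; adjust to wherever the server keeps its logs.
CustomLog "logs/access_log" combined
```

With this format the referrer is the second-to-last quoted field on each log line, which is the field to watch when scanning for spam.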

Referrer spam is a recent innovation by the lower life forms which populate the web. I'd seen it in the logs, but infrequently and not on any regular scale. Now there are tools and web sites you can use to try to drive traffic to your site by spamming the referrer field on the target sites.

Referrer spam has become popular due to the "am I cool or what" need of various bloggers to show who's linking to their blogs. The easy way to do this has been to scrape the referrer field from the access logs or to capture it in real time using PHP or some other server-side scripting language. Whichever way the referrers are captured, they're then reposted to the site, sometimes ranked, usually linked.

The spammers rely on the fact that people will click on anything on a web site, even something that clearly says in bright letters DON'T CLICK HERE. Referrer spamming may also help increase a site's PageRank, though I doubt it is very effective.

Whatever the cause, I'm now getting referrer spam. Of course, this is silly since I don't post referrers anywhere on any of my sites. Nowhere. In a country where you can get arrested, tried, and convicted simply for linking to content which someone has deemed illegal, reposting referrers just seemed like an easy invite for trouble.


Silly me, I thought that if I don't post referrers, I wouldn't get referrer spam.

Not only am I getting referrer spam, I'm getting what I now call modified referer spam: this consists of malformed proxy requests like the following:


Now, I don't run my site as an open proxy either, so this is just stupid, irritating, and a complete waste of my resources.

This referrer spam traffic provides no value to me at all, and if it grows could negatively impact whatever real traffic I do want to accept and respond to.

So, I'm fighting back.

My site uses the Apache web server, so the following code bits are relevant only to Apache.

My first step was to block the IP addresses of the systems running whatever client application is available to generate referrer spam thusly:

deny from
deny from
deny from
deny from
deny from
deny from
deny from
deny from

The problem here is that this gets to be a pain to maintain: eventually the spammer gets a new IP address, or gets smart and uses AOL or some other large ISP for a run. Who’s going to block an entire ISP?
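For reference, a complete IP block list of this kind might look like the sketch below. The addresses are documentation-only placeholders (from the reserved 192.0.2.0/24 and 198.51.100.0/24 ranges), not the addresses I actually blocked, and the commented-out section is only a rough equivalent for Apache 2.4 and later.

```apache
# Apache 2.2-style access control (the "deny from" syntax).
# Placeholder addresses only; substitute the offending clients.
Order allow,deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24

# Rough Apache 2.4+ equivalent using mod_authz_core:
# <RequireAll>
#     Require all granted
#     Require not ip 192.0.2.10
#     Require not ip 198.51.100.0/24
# </RequireAll>
```

Every new spammer address means another line, which is exactly the maintenance burden described above.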

I realized that it would be easier to block by the patterns that the spammers use, as well as by the referrers being spammed. Since one pattern is to request a resource which isn’t on my server at all, I check to see if the hostname in the request matches my hostname. If it doesn’t match, then I bounce the request. I’ve tried a couple of different methods of bouncing: I can fail the requests entirely, serve up a nasty comment or two, or redirect the request.

A neat thing I discovered while I was running was this: when you redirect a request, the Referer does not get updated to reflect your site as the redirecting site. I found this was true for every web browser in popular use in the 1996-1998 timeframe and I believe it to be true today.

Case in point: in the 1996-1997 timeframe someone wrote a really stupid web crawler whose sole purpose for existence was to scrape email addresses from web pages. One night I watched the site monitors for and realized we were being attacked: something was driving a high volume of traffic to the site, and worse was causing a high volume of errors.

Doing some digging and tailing some of the logs, I realized that it was this stupid crawler. It had become trapped in the site, not handling a URL correctly and just generating ever more erroneous requests to the site. I did the only logical thing I could think of: since I wanted to get rid of the traffic (and the crawler was not stopping in response to 403, 404 or 500 errors), I added the URI in error to our redirect file, and targeted the redirect at the web site of the crawler in question.

The traffic immediately disappeared.

Taking this a step further, since we had all sorts of code patched into (the homepage itself was a CGI for a long time, probably far too long): I redirected all requests from the crawler (which happily supplied a user agent identifying itself and the company responsible for developing it) to the developer’s web site.

Anyway, based on that bit of history, that’s how I’ve responded on my own sites: redirect the traffic back to the spammers in question. I don’t want the traffic, I derive no financial benefit from receiving the traffic, I have no contractual obligation to accept the traffic. And I am breaking no laws that I know of in redirecting the traffic back to the originators.

So, without further ado, here are the .htaccess directives to do so. Note that I’ve changed references to my site to

RewriteEngine On
RewriteCond %{HTTP_HOST} !^$ [NC]
RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
RewriteRule ^(.*)$ %1 [R=301,L]
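As a more defensive sketch of the same idea, the variant below assumes the site's hostname is www.example.com (a placeholder, not my real hostname) and only redirects when the Referer is a non-empty absolute URL, so a request with no Referer is not redirected to an empty target. Treat it as an illustration under those assumptions, not as the exact rules I ran.

```apache
RewriteEngine On

# Hypothetical hostname: act only on requests addressed to some other host.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]

# Require an absolute http(s) Referer and capture it; the capture is
# available as %1 in the rule below.
RewriteCond %{HTTP_REFERER} ^(https?://.+)$ [NC]

# Send the client back to whatever URL it claimed as its referrer.
RewriteRule ^ %1 [R=301,L]
```

The negated host condition deliberately has no capture group, so %1 always refers to the Referer captured by the second condition.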

There are multiple variations on this, of course. You could do:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^$ [NC]
RewriteCond %{REMOTE_ADDR} ^(.*)$ [NC]
RewriteRule ^(.*)$ http://%1 [R=301,L]

Which tells the client to redirect to itself or at least the IP address it is spoofing.

I could just fail the request, using RewriteRule ^(.*)$ $1 [F,L] but that seems self-defeating: if my server has to put up with the crap traffic to begin with I want someone else, preferably the bozo initiating it or paying for it, to feel some pain as well.
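For completeness, a minimal sketch of the fail-the-request approach is below, again assuming www.example.com as a placeholder hostname. With the F flag the substitution is ignored, so the conventional target is a single dash, and F already implies L.

```apache
RewriteEngine On

# Hypothetical hostname: match requests addressed to any other host.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]

# Respond with 403 Forbidden; "-" means "no substitution".
RewriteRule ^ - [F]
```

This is the cheapest response for the server to generate, but it costs the sender nothing either.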

I strongly believe that the primary reason email spam and referer spam are so successful is that they’re so easy to do and carry so few penalties. If more sites reacted with strong defensive measures instead of just sucking up the additional traffic, there would be less value to the spammers in doing this sort of thing.

Note: This article was modified on 7 August 2004 to edit the URLs in the modified referer spam example to ''.

2004-12-09T20:03Z: I've turned comments back on...see if the spam bots attack again.

Posted in Link Spam


Copyright 2002–2011 Artific Consulting LLC.

Unless otherwise noted, content is licensed for reuse under the Creative Commons Attribution-ShareAlike 3.0 License. Please read and understand the license before repurposing content from this site.