Killing referrer spam
I’ve been watching Technorati on the subject of referrer spam the last few days. The blogger Ann Elisabeth has been doing excellent work ferreting out where all this is coming from, and I do recommend you go see—but that won’t so much help you stop it.
If you don’t know what referrer spam is, ignore this post entirely or pass it on to a knowledgeable friend. But what the hey, for the rest of you, here’s what I do. It’s not a silver bullet, and it takes work, but it’s definitely a bandwidth-reducer.
You will need:
- Your webhost to be running the Apache webserver (not IIS).
- FTP access to your server, an FTP client, and the skill to use it.
- A text editor. Notepad actually will do this time. Microsoft Word won’t.
- Some patience.
If you have access to your server logs, “recent visitor” logs, or a log-analyzer (like Analog or AWStats), that will help a lot. I will also be discussing some WordPress-specific tricks; I’ll mark them as such.
What we’re going to be doing is messing with .htaccess files. These files tell Apache (among other things) who is allowed to see a particular part of your website and how to rewrite and redirect URLs when that’s necessary. If you use WordPress and you have pretty permalinks, you’ve already messed with .htaccess, because that’s what’s making the pretty permalinks work.
BE AWARE: YOU CAN BORK YOUR WEBSITE WITH THIS. I’ve done it. (In fact, I did it two minutes ago. Go me.) How will you know your .htaccess file is borking your site? Well, usually, when you browse to your weblog’s URL you’ll get a “500 Internal Server Error” page of some sort instead of your beloved weblog.
Always, always, always keep a last-known-good version of your .htaccess file! If you’re using FTP to place your .htaccess file and you bork your site, you just upload the last-known-good file, and you’re golden.
If, on the other hand, you’re using WordPress’s Templates menu to mess with your .htaccess file and you bork your site, you probably can’t use WordPress to fix it! So what you do (well, what I do) is fire up the FTP program, grab the malfunctioning file, fix it, and re-upload it. WordPress will then behave normally. DANGER WILL ROBINSON! Don’t use WordPress’s Templates menu for this unless you’re reasonably confident you know what you’re doing, or at least can fix whatever you mess up!
Right. That said (and to it I add: don’t sue me if you bork your site; you take my advice at your own risk), onwards.
Your first decision is where to put your .htaccess file. Typically, it should go as high up in your web-folder hierarchy as possible, because it should then protect all the subfolders underneath. However, if you’re a WordPresser and your WordPress install is in a subfolder, go ahead and use the existing .htaccess file, or if you don’t have one, put one in along with your index.php file.
If you use a subdomain for your blog (as I do; the difference between http://cavlec.yarinareth.net/ and http://www.yarinareth.net/caveatlector/), you have to put your .htaccess file in a directory belonging to the subdomain. (At least on my webhost you do.) This is annoying, because if you maintain more than one WordPress blog on more than one subdomain, you have to edit .htaccess separately for every single subdomain. I haven’t found a workaround for this. Yet. If anyone has one, please let me know!
Okay, now that you know where your .htaccess file is, what do you put in it? At the top, you need the following two lines:
RewriteEngine On
RewriteBase /
If they’re already there, great; leave them be. This tells Apache that you’ll be making some rules about URLs on your site.
I now recommend that you pick up one of the tricks from Mr. Costello:
RewriteCond %{HTTP_HOST} !^yarinareth.net$ [NC]
RewriteCond %{HTTP_REFERER} ^(.*)$ [NC]
RewriteRule ^(.*)$ %1 [R=301,L]
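Replace my website domain name in the first line above with yours. (No. Really. Do it. If you don’t, YOU WILL BORK YOUR SITE.) One caution: the first line only lets through requests addressed to exactly that hostname. If your site also answers to a www. or other subdomain name, anyone who follows a link to that name will get bounced back to where they came from, so use the hostname your visitors actually see. This won’t kill all referrer spam by a long shot, but it’ll kill the really stupid ones. Here’s what it does: if the stupid referrer spammer asks for a page using a hostname that isn’t yours (as a few of ’em actually do!), Apache redirects them to the very page they’re trying to referrer-spam you with! Cute, no?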
Next we’re going to deal with referrer spammers smart enough not to screw up in this fashion. And to make our lives easier when new referrer spammers come along, we’re going to use a bit of indirection. First, we’ll make a list of words or word-fragments that show up in fake referrers, and tell Apache that they’re bad. Then, we’ll tell Apache not to give anything to anybody who shows up with a referrer containing one of those words or word-fragments. Make sense? Good.
Here’s my current list, which you are welcome to copy and paste — just get rid of all hard returns so that the entire thing is one line. Apologies for the bad language below; not precisely my fault!
SetEnvIfNoCase Referer ".*(credit|canadianlabels|8gold|texas-hold|hold-em|holdem|
fidelityfunding|condo|sportsparent|mortgage|spoodles|money|
cash|hotel|houseofseven|stmaryonline|newtruths|popwow|oiline|
flafeber|thatwhichis|tmsathai|pisoc|crepesuzette|mediavisor|
commerce|easymoney|911|\.vi|gb\.com|4free|macsurfer|teen|
pussy|discount|blogincome|lillystar|aizzo|webdevsquare|laser-eye|
escal8|xopy|vixen1|linkerdome|youradulthosting|fick|inkjet-toner|
fuck|ime.nu|perfume-cologne|italiancharmsbracelets|shoesdiscount|
psnarones|hasfun|casino|gambling|poker|porn|sex|paris|gabriola|nude|
xxx|hilton|pics|video|adminshop|devaddict|iaea|empathica|
insuranceinfo|atelebanon|handy-sms|peng|just-deals|pisx|rimpim).*" BadReferrer
(Wow. I’ve got quite a few of ’em, haven’t I? Well, I’ve been doing this a while.)
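If hand-joining a list that long sounds like a typo waiting to happen, here is a minimal Python sketch of one way to generate the single line. It is only an illustration: the function name build_directive and the four-word sample list are my own inventions, and Python's regex engine is not identical to Apache's, so the validity check is a sanity test rather than a guarantee.

```python
import re

def build_directive(fragments, env_name="BadReferrer"):
    """Join word fragments into a single-line SetEnvIfNoCase directive."""
    # Strip stray whitespace and drop blanks and duplicates, keeping order.
    cleaned = []
    for fragment in fragments:
        fragment = fragment.strip()
        if fragment and fragment not in cleaned:
            cleaned.append(fragment)
    pattern = "|".join(cleaned)
    # Raises re.error if a fragment is not a valid regex in Python.
    # Apache's engine may differ on exotic syntax; plain words are fine.
    re.compile(pattern)
    return 'SetEnvIfNoCase Referer ".*(%s).*" %s' % (pattern, env_name)

# Hypothetical short list, one fragment per line, as you might keep it
# in a text file.
words = """
credit
mortgage
texas-hold
casino
"""
print(build_directive(words.splitlines()))
# prints: SetEnvIfNoCase Referer ".*(credit|mortgage|texas-hold|casino).*" BadReferrer
```

Keeping the fragments one per line in a plain text file and regenerating the directive whenever you add one means you never have to hunt for a stray hard return again.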
Next, add the following lines:
order deny,allow
deny from env=BadReferrer
Be careful to stifle your automatic tendency to put a space after a comma in the first line above. THAT WILL BORK YOUR SITE. Seriously. Apache is unforgiving.
And that’s that. Any request whose referring URL contains any of the words separated by pipe (|) characters above is going to get smacked down.
What happens when the new batch of referrer spammers hits? Let’s say somebody’s desperately trying to insert example.com into your server logs. All you do is add example| to the beginning of your string o’ bad words, right after the opening parenthesis, and you’re golden.
What, you say, I don’t have to enter the whole URL? No, no you don’t. And in fact you probably don’t want to, because entering “mortgage” just once blocks every single stupid mortgage-shark referrer-spammer coming down the pike.
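To see why a fragment is enough, here is a rough Python approximation of the matching. It is a sketch, not Apache itself: make_matcher and is_bad_referrer are names I made up, the sample URLs are hypothetical, and I am assuming Python's case-insensitive search behaves like SetEnvIfNoCase for plain words like these.

```python
import re

def make_matcher(fragments):
    """Compile fragments into one case-insensitive pattern."""
    return re.compile("|".join(fragments), re.IGNORECASE)

def is_bad_referrer(referrer, matcher):
    # search() finds a fragment anywhere in the string, which is what the
    # leading and trailing .* in the Apache pattern accomplish.
    return matcher.search(referrer) is not None

matcher = make_matcher(["mortgage", "casino", "paris"])
samples = [
    "http://cheap-mortgage-rates.example/",     # hypothetical spam referrer
    "http://www.MORTGAGE-deals.example/apply",  # case does not matter
    "http://www.example.org/blog/",             # ordinary referrer
    "http://travel.example/paris-trip.html",    # innocent page caught by "paris"
]
for url in samples:
    print("BLOCK" if is_bad_referrer(url, matcher) else "allow", url)
```

The last sample is the price of the broad brush: a short fragment will also catch the occasional innocent referrer that happens to contain it, so it can be worth running a candidate word past a few of your real referrers before adding it.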
How do you tell it’s working? Check your server logs (however you do that) for HTTP status code 403, which means “forbidden” and is what Apache should now be handing these yobbos. For a broader-brush check, you should also see your daily bandwidth drop significantly if these guys have been hitting you.
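If you can download your raw access log, something like the following Python sketch will tally which referrers are drawing 403s. It assumes your host writes Apache's "combined" log format, which may not be true of yours; the function name forbidden_referrers and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache's "combined" log format. Lines that do not fit (for
# example, ones with escaped quotes in the request) are simply skipped.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)

def forbidden_referrers(lines):
    """Count referrers on requests that were answered with 403."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "403":
            counts[match.group("referrer")] += 1
    return counts

# Made-up sample lines; in practice you would pass an open log file.
sample = [
    '192.0.2.1 - - [10/Jan/2005:10:00:00 -0600] "GET / HTTP/1.1" 403 210 '
    '"http://texas-holdem.example/" "Mozilla/4.0"',
    '192.0.2.1 - - [10/Jan/2005:10:00:02 -0600] "GET / HTTP/1.1" 403 210 '
    '"http://texas-holdem.example/" "Mozilla/4.0"',
    '192.0.2.7 - - [10/Jan/2005:10:00:05 -0600] "GET /index.php HTTP/1.1" '
    '200 5120 "http://www.example.org/" "Mozilla/5.0"',
]
for referrer, hits in forbidden_referrers(sample).most_common():
    print(hits, referrer)
# prints: 2 http://texas-holdem.example/
```

Changing the "403" test to "200" turns the same loop into a way of spotting spammy referrers that are still getting through, which is where new bad words come from.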
I’ve created a new category “Spam” which I intend to use to pass on additions to my personal list of bad words, as well as any other good tips I see and decide to employ. I don’t want to become a blacklist clearinghouse, so if anybody else wants to take on this job, please please be my guest.
I’ve got a few other techniques to pass on, but they’re strictly for the serious anti-blog-spammer, so I’ll close this post for now, hoping it does some good.