Defend Yourself!


As you know, it's not a matter of if the Robot Uprising will come, it's a matter of when.

I know - I have been under constant robotic assault for years, and I am finally starting to get sick of it.

If you have a weblog, you know that there are teeming millions of webbots that go around trying to put fake comments and trackbacks into weblogs. This leads to all of the nonsense comments that you sometimes see about viagra and online poker. Now, most of the junk comments and trackbacks get caught by the filtering built into modern blogging software, so they don't actually show up on webpages. However, the bots and the people who run them are too stupid to give up, so these visits to post junk still show up in my logs and screw with my stats. I am also forced to moderate a bunch of the comments and trackbacks, which is irritating.

No more.

Today a new day begins as I rise up against the automated hordes. Okay, as I make the first step towards rising up. If you are on an Apache webserver...and using a Windows or Mac OS X desktop...and the only problem you have is tons of junk trackbacks...and you are tired of seeing them in your logs, then within those parameters I have the weapon that you can use to fight back.

Oh, also, you have to not mind blocking huge swaths of IP addresses.

I give you...Bad Robot! *stunned silence*

Bad Robot! is an application that analyzes your (Apache) webserver logs and looks for signs that point to a Bad Robot, then spits out a report onscreen that you can review. You can then choose, on an IP-address-by-IP-address basis, whether or not to block that address from ever seeing your website again.

At the moment the only thing that Bad Robot! really does is look for anything that tries to post a trackback to a Movable Type weblog. In the entire life of my weblog I have only had 2 external trackbacks, so for this initial version I am comfortable assuming that anyone trying to post a trackback is probably a Bad Robot. You might actually be a popular kid who gets trackbacks. In that case you will need to exercise some caution when blocking trackback posters, so as not to block real people. Bad Robot! gives you some sorting options that can be helpful.
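For the curious, here is roughly what a trackback rule like that could look like in Python. This is only a sketch, not the actual Bad Robot! code: it assumes your logs are in Apache's Common or Combined Log Format and that the trackback script is called mt-tb.cgi (as in this blog's own TrackBack URL). The function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the start of an Apache Common/Combined Log Format line:
# client IP, identity, user, [timestamp], "METHOD path protocol"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

def trackback_posters(lines, script="mt-tb.cgi"):
    """Count POST requests to the trackback script, per client IP."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the expected format
        ip, method, path = m.groups()
        if method == "POST" and script in path:
            hits[ip] += 1
    return hits

# Hypothetical log lines using documentation-only IP addresses.
sample = [
    '203.0.113.5 - - [27/Jun/2006:16:14:00 -0700] "POST /cgi-bin/mt/mt-tb.cgi/252 HTTP/1.1" 403 0',
    '203.0.113.5 - - [27/Jun/2006:16:14:02 -0700] "POST /cgi-bin/mt/mt-tb.cgi/17 HTTP/1.1" 403 0',
    '198.51.100.7 - - [27/Jun/2006:16:15:00 -0700] "GET /index.html HTTP/1.1" 200 1234',
]

# Most frequent posters first, which is one useful way to sort the report.
for ip, count in trackback_posters(sample).most_common():
    print(ip, count)
```

Sorting by hit count puts the most persistent posters at the top, so a one-off trackback from a real person is easy to spot at the bottom of the list.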

The neat thing is that now that I have put together the basic engine for examining logs, it's really, really easy for me to add more rules to the program. Soon Bad Robot! will be able to distinguish between things that are certainly bad robots and things that might be people - and tell you! It will also be able to automatically figure out if a robot is obeying your robots.txt file. The goal here is to create a single tool that will handle misbehaving spiders, spam-bots, email harvesters, etc. in one fell swoop, then block them permanently.
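As a rough idea of how the robots.txt rule might work, here is a sketch using Python's standard urllib.robotparser module. It is not how Bad Robot! does it (that feature doesn't exist yet); the robots.txt contents, the bot name, and the function name are all made up for illustration, and it assumes you have already pulled the user agent and requested path out of each log line.

```python
from urllib.robotparser import RobotFileParser

def robots_violations(robots_txt, requests):
    """Return the (user_agent, path) pairs that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(ua, path) for ua, path in requests
            if not parser.can_fetch(ua, path)]

# Hypothetical robots.txt that keeps all robots out of /cgi-bin/.
robots_txt = """User-agent: *
Disallow: /cgi-bin/
"""

# Hypothetical (user agent, path) pairs extracted from a log.
requests = [
    ("ExampleBot/1.0", "/cgi-bin/mt/mt-tb.cgi/252"),
    ("ExampleBot/1.0", "/index.html"),
]

print(robots_violations(robots_txt, requests))
```

Anything that shows up in that list asked for a page it was told to stay away from. A browser with a human behind it never reads robots.txt either, so a check like this only means something for visitors that identify themselves as robots.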

Download the latest release and use it as you see fit. Tell me what you like about it, tell me what you hate about it, tell me what else you want it to analyze for, and tell me your stories of how you have struck out against the robot uprising.

I'm especially interested in hearing from Mac users - the Mac version has never been tested in any way, since I don't have a Mac, so I am putting all my faith in the development environment's automated Mac-version-spitting-out abilities. Can you people even open ZIP files?

I'll post new versions as I make changes. Various Linux flavors may appear if there is any interest at all.

At this point you need to have the log file to analyze available locally, un-gzipped, and the output is a text file of the banned IPs that you have to manually add to the .htaccess file on your webserver. So, it's still not for the faint of heart - messing with .htaccess can screw up your webserver, so don't play with it if you don't know what it is.
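If you are wondering what the manual .htaccess step amounts to, here is a small Python sketch that turns a list of banned IPs into the classic Apache "Deny from" lines. The function name and the addresses are made up, and this is an illustration rather than what Bad Robot! itself writes out. It assumes the older Order/Allow/Deny access directives; Apache 2.4 only honors those through mod_access_compat and otherwise wants "Require not ip" rules instead.

```python
import ipaddress

def htaccess_deny_block(ips):
    """Build .htaccess lines that allow everyone except the given IPs."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for ip in sorted(set(ips)):
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        lines.append(f"Deny from {ip}")
    return "\n".join(lines) + "\n"

# Hypothetical banned list (documentation-only addresses, one duplicate).
banned = ["203.0.113.5", "198.51.100.7", "203.0.113.5"]
print(htaccess_deny_block(banned), end="")
```

Keep a backup of your working .htaccess before pasting anything like this in, so you can put it back if the server starts throwing errors.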

You can still play with Bad Robot! though - at this point it can't possibly do anything bad. The next version, where I plan to add auto-fetching of logs through FTP and automated updates to your .htaccess file (also via FTP), could be dangerous. Really really dangerous. I will also create cute icons and stuff.

I wonder if JJ Abrams will sue me over the name...I hope so.


2 Comments

I had a terrible time with Robots adding comments to my blog and my wiki (only a real dick would mess with a man's private wiki) to the point where I basically adopted the Benito Mussolini communication model for a while. Now I've got keyword filtering, but that's not really workable either for the usual reasons.

I'm just wondering: What kind of chump actually makes this advertising practical?

> I will also create cute icons and stuff.

I'm thinking "Gir II: Terminate with Extreme Prejudice"

I'll never forget how Jean and I couldn't stop laughing after the end of the first episode of "Lost".

I believe that Jack Nicholson said in the first "Batman" movie "Never whack another man's Wiki" or something to that effect.

Send me log entries of bad behavior from your blog and wiki and I will add them to what Bad Robot! looks for. Soon you too can be blocking half of the IP addresses in the world!

No, really - when I ran Bad Robot! against my June logs, it came back with about 2,500 IP addresses that had tried to post trackbacks - and I had no valid trackbacks this month. I figure that I will add features to check for month-to-month violations from the same addresses in a near-future version to avoid blocking non-static IP addresses.
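Something like this Python sketch is what I have in mind for the month-to-month check. Nothing here exists in Bad Robot! yet; the function name, the min_months threshold, and the sample addresses are all made up. The idea is that an address only becomes a blocking candidate if it misbehaves in more than one month's log.

```python
def repeat_offenders(monthly_ips, min_months=2):
    """Return IPs that appear in at least min_months of the monthly sets."""
    counts = {}
    for ips in monthly_ips:
        for ip in set(ips):  # count each IP at most once per month
            counts[ip] = counts.get(ip, 0) + 1
    return sorted(ip for ip, n in counts.items() if n >= min_months)

# Hypothetical offender lists for two months.
may = {"203.0.113.5", "198.51.100.7"}
june = {"203.0.113.5", "192.0.2.44"}
print(repeat_offenders([may, june]))
```

An address that shows up once and never again is more likely to be a recycled dial-up or DHCP address, so it gets left alone.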


About this Entry

This page contains a single entry by edgore published on June 27, 2006 4:14 PM.
