Bot filtering in Apache

Some people use robots.txt to allow only a few robots, spiders, and web crawlers. I prefer to filter out the email collectors I know have a bad reputation. Since not every bot bothers to check robots.txt, I use a Limit block to ban certain bots from deekayen.net by their user agent string. Smart bot authors let users change the user agent string, so this method isn't foolproof, but it lets me sleep better at night.

This is what I put in .htaccess. There are probably better ways to write the regular expressions, but I haven't benchmarked whether Apache handles many single-line regular expressions better than one long combined pattern.

SetEnvIfNoCase User-Agent "Address" banned
SetEnvIfNoCase User-Agent "^anarchie" banned
SetEnvIfNoCase User-Agent "^almaden" banned
SetEnvIfNoCase User-Agent "^CherryPicker" banned
SetEnvIfNoCase User-Agent "^Chilkat" banned
SetEnvIfNoCase User-Agent "^Clushbot" banned
SetEnvIfNoCase User-Agent "^ContactBot" banned
SetEnvIfNoCase User-Agent "^crescent" banned
SetEnvIfNoCase User-Agent "^CydralSpider" banned
SetEnvIfNoCase User-Agent "^DBrowse" banned
SetEnvIfNoCase User-Agent "^Demo" banned
SetEnvIf User-Agent "^DYNAMIC$" banned
SetEnvIfNoCase User-Agent "^EBrowse" banned
SetEnvIfNoCase User-Agent "^eCatch" banned
SetEnvIfNoCase User-Agent "^EmailCollector" banned
SetEnvIfNoCase User-Agent "^EMAILsearcher$" banned
SetEnvIfNoCase User-Agent "^EmailSiphon" banned
SetEnvIfNoCase User-Agent "^EmailWolf" banned
SetEnvIfNoCase User-Agent "^exactseek-pagereaper-" banned
SetEnvIfNoCase User-Agent "^ExtractorPro" banned
SetEnvIfNoCase User-Agent "^Franklin" banned
SetEnvIfNoCase User-Agent "^Full" banned
SetEnvIfNoCase User-Agent "^Hatena" banned
SetEnvIfNoCase User-Agent "^InfociousBot" banned
SetEnvIfNoCase User-Agent "^IUPUI" banned
SetEnvIfNoCase User-Agent "LARBIN" banned
SetEnvIfNoCase User-Agent "^Lincoln" banned
SetEnvIfNoCase User-Agent "^Missauga" banned
SetEnvIfNoCase User-Agent "^Missouri" banned
SetEnvIfNoCase User-Agent "^Miva" banned
SetEnvIfNoCase User-Agent "^NaverBot_dloader" banned
SetEnvIfNoCase User-Agent "^NetCarta_WebMapper" banned
SetEnvIfNoCase User-Agent "^Netprospector" banned
SetEnvIfNoCase User-Agent "^nicebot" banned
SetEnvIfNoCase User-Agent "^NICErsPRO" banned
SetEnvIfNoCase User-Agent "^Nudelsalat" banned
SetEnvIfNoCase User-Agent "^Nutch" banned
SetEnvIfNoCase User-Agent "OASIS" banned
SetEnvIfNoCase User-Agent "^Pajaczek" banned
SetEnvIfNoCase User-Agent "^PeerFactor" banned
SetEnvIfNoCase User-Agent "^PEval" banned
SetEnvIfNoCase User-Agent "^Port" banned
SetEnvIfNoCase User-Agent "^Production" banned
SetEnvIfNoCase User-Agent "^Program" banned
SetEnvIfNoCase User-Agent "^ProWebWalker" banned
SetEnvIfNoCase User-Agent "^Relevare" banned
SetEnvIfNoCase User-Agent "Ripper" banned
SetEnvIfNoCase User-Agent "^SeznamBot" banned
SetEnvIfNoCase User-Agent "^sna" banned
SetEnvIfNoCase User-Agent "^SpiderMan$" banned
SetEnvIfNoCase User-Agent "^SquigglebotBot" banned
SetEnvIfNoCase User-Agent "Surf" banned
SetEnvIfNoCase User-Agent "^Tarantula" banned
SetEnvIfNoCase User-Agent "^Talkro" banned
SetEnvIfNoCase User-Agent "^TheInformant" banned
SetEnvIfNoCase User-Agent "^Thunderstone" banned
SetEnvIfNoCase User-Agent "^Under" banned
SetEnvIfNoCase User-Agent "^VengaBot" banned
SetEnvIfNoCase User-Agent "^WebEMailExtrac" banned
SetEnvIfNoCase User-Agent "^WebEnhancer" banned
SetEnvIfNoCase User-Agent "^WebMiner" banned
SetEnvIfNoCase User-Agent "^Wells" banned
SetEnvIfNoCase User-Agent "www4mail" banned
SetEnvIfNoCase User-Agent "^yoono" banned
SetEnvIfNoCase User-Agent "^ZoomInfo" banned

# Allow everyone except requests flagged with the "banned" variable above.
<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from env=banned
</Limit>
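
For comparison, here is a minimal sketch of the "one big long string" alternative mentioned earlier. It folds a handful of the names from the list into two alternations, one anchored and one not. The grouping is only an illustration, not a tested or benchmarked replacement for the full list, and the remaining names would need to be added the same way.

```apache
# Anchored names: the user agent must start with one of these.
SetEnvIfNoCase User-Agent "^(anarchie|almaden|CherryPicker|Chilkat|EmailSiphon|EmailWolf)" banned
# Unanchored names: a match anywhere in the user agent is enough.
SetEnvIfNoCase User-Agent "(Address|LARBIN|OASIS|Ripper|Surf|www4mail)" banned
# The case-sensitive exact match stays on its own SetEnvIf line.
SetEnvIf User-Agent "^DYNAMIC$" banned
```

Both forms set the same banned variable, so the Limit block works unchanged with either.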

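The order, allow, and deny directives are the Apache 2.2 access-control syntax; on Apache 2.4 they only work through mod_access_compat. A rough 2.4 equivalent using mod_authz_core might look like the following, assuming the SetEnvIf lines above stay as they are and that the server permits AuthConfig overrides in .htaccess. Unlike the Limit block, this sketch applies to every request method, not just GET, POST, and HEAD.

```apache
<RequireAll>
  # Let everyone in...
  Require all granted
  # ...except requests where the "banned" variable was set.
  Require not env banned
</RequireAll>
```
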
