Bot filtering in Apache
Some people use robots.txt to filter robots, spiders, and web crawlers so that only a few are allowed. I prefer to filter out the email collectors I know have a bad reputation. Since not every bot even bothers to check robots.txt, I ban certain bots from deekayen.net by their user agent string: SetEnvIfNoCase tags the matching requests, and a Limit restriction denies them. Smarter bot authors let their users change the user agent string, so this method isn't foolproof, but it lets me sleep better at night.
This is what I put in .htaccess. There are probably better ways to write the regular expressions, but I haven't run a benchmark to find out whether Apache handles lots of single-line regular expressions faster than one big long string.
```apache
SetEnvIfNoCase User-Agent "Address" banned
SetEnvIfNoCase User-Agent "^anarchie" banned
SetEnvIfNoCase User-Agent "^almaden" banned
SetEnvIfNoCase User-Agent "^CherryPicker" banned
SetEnvIfNoCase User-Agent "^Chilkat" banned
SetEnvIfNoCase User-Agent "^Clushbot" banned
SetEnvIfNoCase User-Agent "^ContactBot" banned
SetEnvIfNoCase User-Agent "^crescent" banned
SetEnvIfNoCase User-Agent "^CydralSpider" banned
SetEnvIfNoCase User-Agent "^DBrowse" banned
SetEnvIfNoCase User-Agent "^Demo" banned
SetEnvIf User-Agent "^DYNAMIC$" banned
SetEnvIfNoCase User-Agent "^EBrowse" banned
SetEnvIfNoCase User-Agent "^eCatch" banned
SetEnvIfNoCase User-Agent "^EmailCollector" banned
SetEnvIfNoCase User-Agent "^EMAILsearcher$" banned
SetEnvIfNoCase User-Agent "^EmailSiphon" banned
SetEnvIfNoCase User-Agent "^EmailWolf" banned
SetEnvIfNoCase User-Agent "^exactseek-pagereaper-" banned
SetEnvIfNoCase User-Agent "^ExtractorPro" banned
SetEnvIfNoCase User-Agent "^Franklin" banned
SetEnvIfNoCase User-Agent "^Full" banned
SetEnvIfNoCase User-Agent "^Hatena" banned
SetEnvIfNoCase User-Agent "^InfociousBot" banned
SetEnvIfNoCase User-Agent "^IUPUI" banned
SetEnvIfNoCase User-Agent "LARBIN" banned
SetEnvIfNoCase User-Agent "^Lincoln" banned
SetEnvIfNoCase User-Agent "^Missauga" banned
SetEnvIfNoCase User-Agent "^Missouri" banned
SetEnvIfNoCase User-Agent "^Miva" banned
SetEnvIfNoCase User-Agent "^NaverBot_dloader" banned
SetEnvIfNoCase User-Agent "^NetCarta_WebMapper" banned
SetEnvIfNoCase User-Agent "^Netprospector" banned
SetEnvIfNoCase User-Agent "^nicebot" banned
SetEnvIfNoCase User-Agent "^NICErsPRO" banned
SetEnvIfNoCase User-Agent "^Nudelsalat" banned
SetEnvIfNoCase User-Agent "^Nutch" banned
SetEnvIfNoCase User-Agent "OASIS" banned
SetEnvIfNoCase User-Agent "^Pajaczek" banned
SetEnvIfNoCase User-Agent "^PeerFactor" banned
SetEnvIfNoCase User-Agent "^PEval" banned
SetEnvIfNoCase User-Agent "^Port" banned
SetEnvIfNoCase User-Agent "^Production" banned
SetEnvIfNoCase User-Agent "^Program" banned
SetEnvIfNoCase User-Agent "^ProWebWalker" banned
SetEnvIfNoCase User-Agent "^Relevare" banned
SetEnvIfNoCase User-Agent "Ripper" banned
SetEnvIfNoCase User-Agent "^SeznamBot" banned
SetEnvIfNoCase User-Agent "^sna" banned
SetEnvIfNoCase User-Agent "^SpiderMan$" banned
SetEnvIfNoCase User-Agent "^SquigglebotBot" banned
SetEnvIfNoCase User-Agent "Surf" banned
SetEnvIfNoCase User-Agent "^Tarantula" banned
SetEnvIfNoCase User-Agent "^Talkro" banned
SetEnvIfNoCase User-Agent "^TheInformant" banned
SetEnvIfNoCase User-Agent "^Thunderstone" banned
SetEnvIfNoCase User-Agent "^Under" banned
SetEnvIfNoCase User-Agent "^VengaBot" banned
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" banned
SetEnvIfNoCase User-Agent "^WebEnhancer" banned
SetEnvIfNoCase User-Agent "^WebMiner" banned
SetEnvIfNoCase User-Agent "^Wells" banned
SetEnvIfNoCase User-Agent "www4mail" banned
SetEnvIfNoCase User-Agent "^yoono" banned
SetEnvIfNoCase User-Agent "^ZoomInfo" banned

<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=banned
</Limit>
```
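For comparison, the "one big long string" approach could look something like the following. It is a rough sketch, not a tested or benchmarked replacement: it folds in only a sample of the names from the list, and grouping the patterns by how they are anchored is an assumption about how to keep each one's original meaning when they are combined.

```apache
# Sketch only: a sample of the names from the list, not the full set, and not benchmarked.
# Patterns anchored with ^ share one alternation.
SetEnvIfNoCase User-Agent "^(anarchie|almaden|CherryPicker|Chilkat|EmailCollector|EmailSiphon|EmailWolf|ExtractorPro|Nutch|ZoomInfo)" banned
# Patterns anchored at both ends keep the trailing $.
SetEnvIfNoCase User-Agent "^(EMAILsearcher|SpiderMan)$" banned
# Unanchored patterns match anywhere in the User-Agent string.
SetEnvIfNoCase User-Agent "(Address|LARBIN|OASIS|Ripper|Surf|www4mail)" banned
# The one case-sensitive entry still needs plain SetEnvIf.
SetEnvIf User-Agent "^DYNAMIC$" banned
```

Either form sets the same `banned` variable, so the `Deny from env=banned` line does not need to change.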
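The Order/Allow/Deny directives are Apache 2.2 syntax; on Apache 2.4 they only work through the deprecated mod_access_compat module. A rough 2.4 equivalent using mod_authz_core might look like this, assuming the same `banned` variable set by the SetEnvIfNoCase lines and an AllowOverride setting that permits AuthConfig directives in .htaccess. It also leaves out the `<Limit>` wrapper: Apache's documentation advises against putting access control inside `<Limit>`, because any method it does not list (PUT or OPTIONS, for example) is left unrestricted.

```apache
# Apache 2.4 sketch (mod_authz_core): allow everyone unless a SetEnvIf or
# SetEnvIfNoCase line has set "banned". With no <Limit> wrapper,
# the rule applies to every HTTP method.
<RequireAll>
    Require all granted
    Require not env banned
</RequireAll>
```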

