Fight Spambots

never list your email

  • never print your email address in plain text or HTML
  • use JavaScript or mywidget to hide your email address from spambots
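
A minimal sketch of the JavaScript approach, assuming the address is assembled at page load so the literal "user@domain" string never appears in the HTML source. The function names, the "contact" element id and the example.com address are made up for illustration.

```javascript
// Hypothetical helper: the address is stored in pieces and joined at run
// time. String.fromCharCode(64) is "@", so no "@" sits next to the domain
// in the page source.
function buildEmail(user, domainParts) {
  return user + String.fromCharCode(64) + domainParts.join(".");
}

// Returns the markup for a mailto link, meant to be written into the page
// by script rather than typed into the HTML.
function emailLinkHtml(user, domainParts) {
  const address = buildEmail(user, domainParts);
  return '<a href="mailto:' + address + '">' + address + "</a>";
}

// In a browser (assumed element id "contact"):
// document.getElementById("contact").innerHTML =
//   emailLinkHtml("webmaster", ["example", "com"]);
```

This only stops harvesters that read raw HTML. A bot that executes scripts will still see the address.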

/robots.txt

don't use /robots.txt

Don't list your form scripts in robots.txt, because bad bots will go after them.
If you don't want bots to crawl certain parts of your site, it is better to protect them with htpasswd
or to put one of these in your HTML meta tags:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET, NOARCHIVE">
# Googlebot also understands an HTTP response header
X-Robots-Tag: noindex
#http://www.google.com/support/webmasters/bin/topic.py?topic=8459
# http://help.yahoo.com/l/us/yahoo/search/webcrawler/index.html
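
The header form is handy for files that cannot carry a meta tag, such as PDFs or images. A minimal sketch, assuming a Node.js-style response object (anything with a setHeader(name, value) method); the helper name is made up.

```javascript
// Hypothetical helper: adds the X-Robots-Tag header shown above to a
// response before it is sent.
function addNoIndexHeader(res) {
  res.setHeader("X-Robots-Tag", "noindex");
  return res;
}

// With Node's built-in http module this would be called inside the request
// handler, before res.end():
// http.createServer((req, res) => { addNoIndexHeader(res); res.end("..."); });
```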

use /robots.txt

Trap spambots.
Bad bots usually follow the Disallow entries in robots.txt on purpose.
Point those Disallow paths at a script that adds the bad bot's IP address to your .htaccess block list.

http://www.kloth.net/internet/badbots.php
http://www.kloth.net/internet/bottrap.php
http://devin.com/sugarplum/
http://www.leekillough.com/robots.html
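
The "add the IP to the block list" step of such a trap could be sketched like this. It assumes the block list is a run of Apache 2.2-style "Deny from" lines and that only IPv4 addresses are handled. Both are assumptions, and the function names are made up. The functions work on the file's text, and reading and writing the file is left to the caller.

```javascript
// Matches a dotted-quad IPv4 address with each octet in 0-255.
const IPV4 = /^(25[0-5]|2[0-4]\d|1?\d?\d)(\.(25[0-5]|2[0-4]\d|1?\d?\d)){3}$/;

// Hypothetical helper: returns the .htaccess line that blocks one address,
// or null if the value is not a plain IPv4 address. Validating first
// matters because the value ends up inside a server config file.
function denyLineFor(ip) {
  if (typeof ip !== "string" || !IPV4.test(ip)) return null;
  return "Deny from " + ip;
}

// Adds the address to the block list text unless it is already there.
// Invalid input leaves the text unchanged.
function addToBlockList(htaccessText, ip) {
  const line = denyLineFor(ip);
  if (line === null) return htaccessText;
  if (htaccessText.split("\n").includes(line)) return htaccessText;
  const sep = htaccessText === "" || htaccessText.endsWith("\n") ? "" : "\n";
  return htaccessText + sep + line + "\n";
}
```

The trap script would pass in the current .htaccess contents and the visitor's REMOTE_ADDR, then write the result back. Beware of banning yourself while testing.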

# example /robots.txt
User-agent: *
# guestbook is the most attacked folder. Even if you never had one and never
# linked to it, spambots will try to access it.
Disallow: /guestbook
# Your guestbook.cgi should be replaced with a spambot trap.
Disallow: /cgi-bin/guestbook.cgi
# Use scripts here to poison spambots with thousands or millions of fake email addresses.
Disallow: /emailaccounts
Disallow: /anothertrap
Disallow: /moretrap
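
A page served from a trap path such as /emailaccounts could be filled by a small generator like the one below. This is only a sketch: the function names are made up, and the addresses use the reserved ".invalid" top-level domain (an assumption on my part) so that none of them can belong to a real person.

```javascript
// Small deterministic pseudo-random generator (linear congruential), so the
// same seed always yields the same list. Not suitable for anything secret.
function makeRng(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state >>> 8; // drop the weakest low bits
  };
}

// Hypothetical helper: builds `count` made-up addresses of the form
// xxxxxx@xxxxxxxx.invalid.
function fakeEmails(count, seed) {
  const rng = makeRng(seed);
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const word = (len) => {
    let out = "";
    for (let i = 0; i < len; i++) out += letters[rng() % letters.length];
    return out;
  };
  const list = [];
  for (let i = 0; i < count; i++) {
    list.push(word(6) + "@" + word(8) + ".invalid");
  }
  return list;
}

// Wraps the addresses in mailto links, the form harvesters look for.
function poisonPageHtml(count, seed) {
  return fakeEmails(count, seed)
    .map((a) => '<a href="mailto:' + a + '">' + a + "</a>")
    .join("<br>\n");
}
```

A harvester may filter out ".invalid" addresses. Using real-looking domains would poison it better, but risks sending spam to someone else's mail server.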

.htaccess

RewriteEngine On 

# Forbid requests for exploits & annoyances 
# Bad requests 
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed 
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR] 
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC] 
RewriteRule .* - [F] 

# Forbid if blank (or "-") Referer *and* UA 
RewriteCond %{HTTP_REFERER} ^-?$ 
RewriteCond %{HTTP_USER_AGENT} ^-?$ 
RewriteRule .* - [F] 

# Banning BOTS below
# Address harvesters 
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR] 
# Download managers 
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers 
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|Stripper|Sucker) [NC,OR]
# "Gray-hats" 
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|Sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers 
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|Snake|ster|Strip|Suck|vac|walk|Whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR] 
# Tools 
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Microsoft|MFC).(Data|Internet|URL|WebDAV|Foundation).(Access|Explorer|Control|MiniRedir|Class) [NC,OR]
# Unknown 
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR] 
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC] 
RewriteRule .* - [F] 

#!/usr/bin/perl -w
use strict;

# Details of the request that hit the trap
my $remreq  = $ENV{REQUEST_URI};
my $remaddr = $ENV{REMOTE_ADDR};
my $usragnt = $ENV{HTTP_USER_AGENT} || "The UA is blank";
my $referer = $ENV{HTTP_REFERER} || "there is no referer";
my $date    = scalar localtime(time);
my $remmeth = $ENV{REQUEST_METHOD};
my $remhost = $ENV{HTTP_HOST};

# Mail a report to the site owner (the part shown here only sends the notification)
open(MAIL, "|/usr/sbin/sendmail -t") || die "Content-type: text/plain\n\nCan't open /usr/sbin/sendmail!";
print MAIL "To: ****\@yyy.zzz\n";
print MAIL "From: xxx\@yyy.zzz\n";
print MAIL "Subject: You caught another one!\n\n";
print MAIL "The following 'intruder' was caught by the \"Bot Trap\" and has been added to the ban env in .htaccess:\n\n";
print MAIL "The ip address: $remaddr was listed on $date \n";
print MAIL "The file requested was: $remreq\n";
print MAIL "The method used was: $remmeth\n";
print MAIL "The intruder's user agent was: $usragnt\n";
print MAIL "The document was referred by: $referer\n";
print MAIL "The host server was: $remhost\n";
close(MAIL);
exit;
#http://www.webmasterworld.com/forum92/413.htm

secure form

Most spambots only parse HTML: they do not process JavaScript or Flash, and they have no OCR to read text from images.
Use any combination of these.

  • use a CAPTCHA
  • use JavaScript instead of plain HTML to write the form (the same reason people use JavaScript to write their email address)
  • use a Flash form
  • your form script should only process POST and die on GET
  • check the referer: die if the request is not a POST or the referer is not your own website
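
The last two checks in the list could be sketched as follows. The function name and the example host names are made up, and it assumes a runtime with the standard URL class (browsers and current Node.js).

```javascript
// Hypothetical helper: decides whether a form submission should be
// processed. `ownHost` is the site's own host name, e.g. "www.example.com".
function acceptSubmission(method, referer, ownHost) {
  if (method !== "POST") return false; // die on GET, HEAD, ...
  if (!referer) return false;          // no referer at all
  let host;
  try {
    // Parse the referer instead of searching it for the host name, so
    // "http://evil.test/?www.example.com" does not pass.
    host = new URL(referer).host;
  } catch (err) {
    return false;                      // malformed referer
  }
  return host.toLowerCase() === ownHost.toLowerCase();
}
```

The referer is sent by the client, so a determined bot can forge it, and some real visitors send none. Treat it as one filter among several, not as proof.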

http://www.javaworld.com/jw-06-1996/jw-06-javascript.html

<form method="POST" action="" onsubmit="proccessbyjs();"> 
// the handler must supply something the form script requires (for example a hidden field value), so users need JavaScript enabled to be able to post the form
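
One thing a handler such as proccessbyjs() could do is fill a hidden field that the form script then checks. This is a sketch under made-up assumptions: the field names "ts" and "jstoken" and the token rule are invented here, and the form fields are modelled as a plain object so that the same code can run in the page and in the form script.

```javascript
// Hypothetical token rule shared by the page script and the form script:
// a fixed prefix plus the digits of the form's timestamp, reversed. Any
// rule works as long as both sides agree on it.
function expectedToken(timestamp) {
  return "sj-" + String(timestamp).split("").reverse().join("");
}

// What the onsubmit handler could do just before the form is sent: return
// a copy of the fields with the hidden "jstoken" field filled in.
function stampFields(fields) {
  return Object.assign({}, fields, { jstoken: expectedToken(fields.ts) });
}

// Form script side: a client that never ran the handler sends no valid
// token, so its submission can be dropped.
function hasValidToken(fields) {
  return typeof fields.jstoken === "string" &&
    fields.jstoken === expectedToken(fields.ts);
}
```

This is obfuscation, not security. A bot that runs scripts, or whose author reads the page source, will pass the check.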