Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact us.

Post New Thread Reply

Register GFY Rules Calendar
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.

 
Thread Tools
Old 01-29-2003, 09:44 AM   #1
HQ
Confirmed User
 
Join Date: Jan 2001
Posts: 3,539
bot-blocking with .htaccess

I want to block bots with an .htaccess file. After a quick search on the net I came accross this list:

Quote:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule /*$ http://www.site-you-are-sending-the-bot-to.com [L,R]
My question is, will a .htaccess file this size slow my server down (much)?

(Also, if anyone finds any errors in the above code, let me know as it is not my code, I grabbed it from a quick search for .htaccess and teleport (a bot that hammered my server once):

http://www.google.com/search?q=.htaccess+teleport
HQ is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 09:46 AM   #2
DarkJedi
No Refunds Issued.
 
DarkJedi's Avatar
 
Industry Role:
Join Date: Feb 2001
Location: GFY
Posts: 28,300
it will put your server to its knees
DarkJedi is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 09:55 AM   #3
HQ
Confirmed User
 
Join Date: Jan 2001
Posts: 3,539
How can you block individual bots and individual IPs without checking every single hit against the blacklist? Impossible.
HQ is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 09:58 AM   #4
DarkJedi
No Refunds Issued.
 
DarkJedi's Avatar
 
Industry Role:
Join Date: Feb 2001
Location: GFY
Posts: 28,300
i know man, noref bots suck
DarkJedi is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 10:42 AM   #5
scooby doo as scooby does
Confirmed User
 
Join Date: Nov 2002
Location: A deep dark place.
Posts: 314
If you have a dedicated, install mod_throttle.
__________________
In 1904, Charles Newman-Berry connected two abacus's together using specially enhanced GrapeVine thus inventing the first Internet connection.

NEWMAN-BERRY CASH
Paying webmaster since 1904
scooby doo as scooby does is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 10:47 AM   #6
ZoiNk
Confirmed User
 
Join Date: Feb 2002
Location: Canada
Posts: 2,370
Also add the deny to your robots.txt file. Some of them by default will follow that, including some of the leaching bots. (But you can normially change the settings in options to turn that off). Will stop a few without any lag on your server.
ZoiNk
__________________
"People can have the Model T in any color - so long as it's black." - Henry Ford
ZoiNk is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 10:50 AM   #7
HQ
Confirmed User
 
Join Date: Jan 2001
Posts: 3,539
Show me an example of a robots.txt file that denies bots (but allows search engine bots), please.
HQ is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 11:08 AM   #8
-=HOAX=-
Confirmed User
 
Join Date: Dec 2001
Location: CrackYaMental
Posts: 4,365
I've heard that some se spiders show up as a site sucker or whatever type bot and can be thus blocked by long rewrite lists like that...which is kinda a bad thing...anyone know if this is true?
__________________
Insert Value Here.
-=HOAX=- is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 11:27 AM   #9
jimmyf
OU812
 
Join Date: Feb 2001
Location: California
Posts: 12,651
Quote:
Originally posted by DarkJedi
it will put your server to its knees
how about if you just put 3 or 4 of the most used ones and does anyone know the 3 or 4 most used ones?
__________________
Epic CashEpic Cash works for me
Solar Cash Paysite Plugin
Gallery of the day freesites,POTD,Gallery generator with free hosting
jimmyf is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 11:31 AM   #10
DarkJedi
No Refunds Issued.
 
DarkJedi's Avatar
 
Industry Role:
Join Date: Feb 2001
Location: GFY
Posts: 28,300
Quote:
Originally posted by jimmyf
how about if you just put 3 or 4 of the most used ones and does anyone know the 3 or 4 most used ones?
yeah thats what i do.
i put teleport pro there and a couple others
DarkJedi is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 11:32 AM   #11
jimmyf
OU812
 
Join Date: Feb 2001
Location: California
Posts: 12,651
Quote:
Originally posted by DarkJedi


yeah thats what i do.
i put teleport pro there and a couple others
thanks I was pretty sure about teleport..
__________________
Epic CashEpic Cash works for me
Solar Cash Paysite Plugin
Gallery of the day freesites,POTD,Gallery generator with free hosting
jimmyf is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-29-2003, 11:56 AM   #12
Ludedude
Suck it!
 
Industry Role:
Join Date: Jun 2001
Location: Who wants to know?
Posts: 4,432
Here are a few of the "top bots" from my server logs listed in order of number of accesses.

Download Ninja 7.0
Internet Ninja 6.0
Internet Ninja 5.0
Teleport Pro/1.29
FlashGet
Internet Ninja 4.0
Teleport Pro/1.29.1590
WebLeacher 2.1
WebCopier v3.3


__________________
Ludedude is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-30-2003, 07:37 AM   #13
HQ
Confirmed User
 
Join Date: Jan 2001
Posts: 3,539
Quote:
Originally posted by -=HOAX=-
I've heard that some se spiders show up as a site sucker or whatever type bot and can be thus blocked by long rewrite lists like that...which is kinda a bad thing...anyone know if this is true?
Anyone know anything about this?
HQ is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 01-30-2003, 08:02 AM   #14
gornyhuy
Chafed.
 
gornyhuy's Avatar
 
Join Date: May 2002
Location: Face Down in Pussy
Posts: 18,041
To my knowledge, all the best leech bots have the option to just set their ref ID and other details to fake being a regular browser.

Good luck blocking that shit.
__________________

icq:159548293
gornyhuy is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Post New Thread Reply
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >

Bookmarks



Advertising inquiries - marketing at gfy dot com

Contact Admin - Advertise - GFY Rules - Top

©2000-, AI Media Network Inc



Powered by vBulletin
Copyright © 2000- Jelsoft Enterprises Limited.