Old 01-01-2005, 12:44 PM  
raymor
Confirmed User
 
Join Date: Oct 2002
Posts: 3,745
If you decide to block site rippers, there's a bad way to do it,
an OK way to do it, and a good way to do it.
The bad way is to have a .htaccess listing hundreds
of User-Agent strings that may be used by site rippers.
This is bad for two main reasons. #1, most site rippers
let the user change the User-Agent string. Many have
a simple checkbox to make it send the same user agent
as IE6. Though IE6 [B]IS[/B] 3 years old already, a lot of
people still use it of course, so you can't block it, which means
you can't block the site rippers that can so easily spoof
this user agent, which is most of them. Also your list will
never be complete, so many site rippers would be unaffected.
#2, each of those user-agent lines is a condition that
Apache has to evaluate for every single hit.
If the user loads a page with 30 thumbnails, Apache has to
go looking through that list for each and every thumbnail,
doing thousands of comparisons just to load one page.
Performance WILL suffer noticeably if you have any reasonable
amount of traffic whatsoever.
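
To see why, here's what a blacklist like that looks like in
.htaccess - a minimal sketch with just three entries (the ripper
names shown are real examples, but a real blacklist would run to
hundreds of these lines, each one a regex match on every hit):

```apache
# Blacklist approach (NOT recommended) - one condition per known ripper
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Wget"     [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "HTTrack"  [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Teleport" [NC]
# ...hundreds more RewriteCond lines in a real blacklist...
RewriteRule .* - [F]
```

Every request - every thumbnail, every image - gets run through
the whole chain until something matches or the list runs out.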

A slightly better way is to apply one of the cardinal rules of
security - disallow everything, then allow only what is OK.
Rather than listing hundreds of user agents (browsers)
that aren't allowed, you just list the 5 or 6 that ARE allowed -
IE, Mozilla and its variations (including Firefox and Netscape),
Windows Media Player, RealPlayer, Opera, and Safari.
That takes care of the problem of keeping the list up to date
(until you get members using some other browser)
and solves the performance issues. It still leaves you wide open
to spoofing, though, and because it looks not at the problem,
site ripping, but only at the user agent, it's not 100% effective.
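
The whitelist version might look something like this in .htaccess.
The conditions are ANDed, so a request is only forbidden if the
user agent matches NONE of the allowed tokens. The exact tokens
here are illustrative guesses - check the actual User-Agent
strings your members' browsers and players send before using
anything like this:

```apache
# Whitelist approach - forbid anything that isn't a known-good browser
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !Mozilla   [NC]
RewriteCond %{HTTP_USER_AGENT} !Opera     [NC]
RewriteCond %{HTTP_USER_AGENT} !NSPlayer  [NC]
RewriteCond %{HTTP_USER_AGENT} !RealMedia [NC]
RewriteRule .* - [F]
```

Note "Mozilla" alone covers IE, Firefox, Netscape, and Safari,
since all of them send a User-Agent starting with "Mozilla/" -
which is also exactly why this is so easy to spoof.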
For example, IE has site ripping built right in! Add your site
as a favorite, then select offline viewing, and you can use IE
itself as a site ripper. How are you going to block that?
Well, you're going to block it by detecting and acting upon
the act of site ripping itself, rather than on the name of the
software. That's the right way to do it, which brings us to
option #3:

The best way, probably, is by using Strongbox. Besides having by
far the most sophisticated protection against brute force attacks
and password trading, Strongbox also defends against site rippers
by actually detecting and stopping the ripping process -
the following of every link. Additionally, it provides an
enhanced version of method #2, where a user can only
access a page or image on the site by using the same
browser he or she logged in with in the first place.
These two defenses combined are much, much more
effective than naive attempts based on listing the user-agent
headers sent by known rippers.
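
To make the idea concrete - and this is just a naive sketch of
the general technique, NOT Strongbox's actual implementation -
behavioral detection boils down to watching how fast one client
walks through distinct pages. The class names and thresholds
below are made up for illustration:

```python
import time
from collections import defaultdict, deque


class RipperDetector:
    """Flag clients that fetch too many distinct pages in a short window.

    A human clicking around hits a handful of pages per minute; a
    ripper following every link hits dozens or hundreds.
    """

    def __init__(self, max_pages=30, window_seconds=60):
        self.max_pages = max_pages
        self.window = window_seconds
        self.hits = defaultdict(deque)  # client id -> deque of (time, path)

    def record(self, client, path, now=None):
        """Record one page hit; return True if the client looks like a ripper."""
        now = time.time() if now is None else now
        q = self.hits[client]
        q.append((now, path))
        # Drop hits that have fallen out of the sliding window.
        while q and now - q[0][0] > self.window:
            q.popleft()
        distinct = len({p for _, p in q})
        return distinct > self.max_pages
```

A real product would key on the login session rather than the IP,
tune the thresholds, and lock the account instead of just
returning a flag - but the core "detect the act of following
every link" idea is this simple.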

For more info on Strongbox, see:
http://www.bettercgi.com/strongbox/
__________________
For historical display only. This information is not current:
support@bettercgi.com ICQ 7208627
Strongbox - The next generation in site security
Throttlebox - The next generation in bandwidth control
Clonebox - Backup and disaster recovery on steroids