09-02-2017, 06:40 AM
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,373
Quote:
Originally Posted by Barry-xlovecam
It's really cat and mouse. UFW or iptables -- firewall them out -- if you have root. However, they will change IPs or AS networking, so it is a never-ending game.
I have a site that's scraped to hell and back. If you exclude Googlebot and all of the scrapers, probably less than 2% of the traffic remains (page loads by an actual browser).

Over the years I've added bits and pieces of logging to capture various interesting information. The big red flag that sticks out, at least for my site: scrapers use proxies, so their IPs can change without notice, but the headers they send usually follow a fixed pattern that looks nothing like a real browser's, so they're super easy to block.
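
To give a rough idea of what a header check like that could look like, here's a minimal Python sketch. It is not the actual code from my site: the list of "browser-typical" headers, the min_present cutoff and both function names are assumptions made up for illustration, and you'd tune them against your own logs.

```python
# Hypothetical heuristic: headers that mainstream browsers normally send.
# This list and the cutoff below are assumptions, not rules from the post.
EXPECTED_BROWSER_HEADERS = (
    "user-agent",
    "accept",
    "accept-language",
    "accept-encoding",
)


def looks_like_scraper(headers, min_present=3):
    """Return True when a request carries too few browser-typical headers.

    `headers` is any mapping of header name to value; names are compared
    case-insensitively.
    """
    names = {name.lower() for name in headers}
    present = sum(1 for h in EXPECTED_BROWSER_HEADERS if h in names)
    return present < min_present


def header_fingerprint(headers):
    """Tuple of lowercase header names in the order the mapping yields them.

    Useful as a log key: it stays the same when the client switches proxy IP.
    """
    return tuple(name.lower() for name in headers)
```

Logging the fingerprint next to the IP makes it easy to spot one client rotating through many proxies, since the fingerprint doesn't change when the IP does.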

Even a simple CAPTCHA that is triggered after, say, 10 loads without presenting a cookie manages to block most of them. Some IPs are constantly bashing at the site, day after day, even though they are almost perpetually 403'd or firewalled.
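
The cookie rule could be sketched roughly like this in Python. Again, this is an illustration under assumptions, not my production code: the class name, keying the counter on client IP, keeping counts in memory, and resetting the count once a cookie shows up are all choices made for the example.

```python
from collections import defaultdict


class CookielessLoadTracker:
    """Counts page loads per client that arrive without a cookie.

    Hypothetical helper: keying on client IP and in-memory storage are
    assumptions; a real site would likely use a shared store with expiry.
    """

    def __init__(self, threshold=10):
        self.threshold = threshold
        self._counts = defaultdict(int)

    def needs_captcha(self, client_ip, has_cookie):
        """Return True once a client exceeds `threshold` cookieless loads."""
        if has_cookie:
            # A client that returns our cookie behaves like a browser: reset.
            self._counts.pop(client_ip, None)
            return False
        self._counts[client_ip] += 1
        return self._counts[client_ip] > self.threshold
```

With the default threshold, the first 10 cookieless loads from an IP go through and the 11th gets the CAPTCHA.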

Guess there is a market for a service like this, if one doesn't exist... but integrating it into a customer's existing site would be interesting...