GoFuckYourself.com - Adult Webmaster Forum


Captcha 01-05-2013 04:45 AM

Bad Robots, site rippers
 
I am sick of bad robots, site rippers, etc.
Lots and lots of "noref" hits in ATX! I just launched a new site and I'm getting fucked with this shit again. Does anyone have an up-to-date list, like this one, to block this shit?

http://www.javascriptkit.com/howto/htaccess13.shtml
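One common way to apply a user-agent blocklist like the one linked is with Apache mod_rewrite rules in .htaccess. The sketch below assumes Apache with mod_rewrite enabled and a hypothetical plain-text file (bad-bots.txt in the example) holding one user-agent pattern per line; each line is used verbatim as a regular expression, so it must not contain spaces. The helper name make_ua_block_rules is illustrative. It only catches bots that send an honest user agent.

```shell
#!/bin/sh
# Sketch: turn a list of user-agent patterns (one per line) into
# mod_rewrite rules. Assumes Apache with mod_rewrite enabled and that each
# line is already a valid regex with no spaces; the list file is hypothetical.
make_ua_block_rules() {
  awk '
    # Skip blank lines and comment lines in the list.
    /^[ \t]*$/ || /^[ \t]*#/ { next }
    { names[n++] = $0 }
    END {
      if (n == 0) exit
      print "RewriteEngine On"
      for (i = 0; i < n; i++) {
        # All conditions but the last are ORed together; NC = case-insensitive.
        flags = (i < n - 1) ? "[NC,OR]" : "[NC]"
        printf "RewriteCond %%{HTTP_USER_AGENT} %s %s\n", names[i], flags
      }
      # Any request that matched a condition gets 403 Forbidden.
      print "RewriteRule .* - [F,L]"
    }
  ' "$@"
}

# Example: make_ua_block_rules bad-bots.txt > ua-rules.txt
```

Review the generated rules before pasting them into .htaccess; a too-broad pattern can lock out real visitors.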

JamesM 01-05-2013 04:55 AM

Bots are smart these days; they use fake user agents.
Anyway, why are you worried? Is it because of bandwidth? Just curious.

Popular sites seem to get this.
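If bandwidth is the worry, the access log can show how much each client is pulling. A rough sketch, assuming Apache's common or combined log format (client IP in field 1, response size in bytes in field 10, "-" when no body was sent); bytes_per_ip is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: total bytes sent to each client IP, largest first.
# Assumes Apache common/combined log format: field 1 = client IP,
# field 10 = response size in bytes ("-" when no body was sent).
bytes_per_ip() {
  awk '
    $10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
    END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }
  ' "$@" | sort -rn
}

# Example: bytes_per_ip access.log | head
```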

Barry-xlovecam 01-05-2013 07:07 AM

Longstanding problem
 
Here are a few links I found that looked interesting:
http://www.wizcrafts.net/exploited-s...blocklist.html

http://antiscraper.com/

From my experience it is whack-a-mole: you need the time or manpower to grep the web server logs for page requests (or HEAD requests) that come with no requests for images; that is a dead giveaway. Then you have to be careful not to block legitimate search engine bots.
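The "pages but no images" check can be sketched in awk. This assumes Apache's common or combined log format and treats anything that is not an image, CSS, or JavaScript file as a page; the function name pages_without_images and its minimum-page threshold argument are illustrative, not from any existing tool. It lists IPs with at least that many GET/HEAD page requests and zero image requests, busiest first.

```shell
#!/bin/sh
# Sketch of the "page requests but no image requests" heuristic.
# Assumes Apache common/combined log format: $1 = client IP,
# $6 = request method with a leading quote, $7 = requested path.
# Usage: pages_without_images MIN_PAGES logfile...
pages_without_images() {
  minpages=$1; shift
  awk -v minpages="$minpages" '
    {
      ip = $1; method = $6; path = $7
      sub(/^"/, "", method)
      # Only GET and HEAD requests are counted.
      if (method != "GET" && method != "HEAD") next
      # Strip any query string before looking at the file extension.
      sub(/\?.*$/, "", path)
      path = tolower(path)
      if (path ~ /\.(jpe?g|png|gif|webp|svg|ico)$/) images[ip]++
      else if (path !~ /\.(css|js)$/) pages[ip]++
    }
    END {
      # Report IPs with enough page hits and not a single image hit.
      for (ip in pages)
        if (pages[ip] >= minpages && !(ip in images))
          printf "%d %s\n", pages[ip], ip
    }
  ' "$@" | sort -rn
}

# Example: pages_without_images 20 access.log
```

Search engine crawlers also skip images, so check flagged IPs before blocking them; Google, for example, documents verifying Googlebot with a reverse DNS lookup followed by a forward lookup.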

You can look for unusual activity over SSH, with the permissions needed to read the logs (a regular user or root, depending on where the logs live):

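# Top 10 busiest IPs in the last 5,000 log lines (busiest printed last):
tail -n 5000 access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail > tofilename
# The 10 least active IPs in the same 5,000 lines:
tail -n 5000 access.log | awk '{print $1}' | sort | uniq -c | sort -rn | tail > tofilename
# Requests per IP for every GET in the log, busiest first:
grep 'GET' access.log | cut -d' ' -f1 | sort | uniq -c | sort -rn > tofilename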
Welcome to the dark side ...
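The one-liners above count requests over a whole window; rippers also tend to show up as bursts. A sketch that counts requests per IP per minute, assuming the Apache timestamp format in field 4 (e.g. [05/Jan/2013:04:45:12); burst_ips and its limit argument are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Sketch: flag IPs that make more than LIMIT requests within one minute.
# Assumes the Apache timestamp in field 4, e.g. [05/Jan/2013:04:45:12,
# so 17 characters starting at position 2 give the day, hour, and minute.
# Usage: burst_ips LIMIT logfile...
burst_ips() {
  limit=$1; shift
  awk -v limit="$limit" '
    {
      minute = substr($4, 2, 17)   # e.g. 05/Jan/2013:04:45
      hits[$1 " " minute]++
    }
    END {
      for (key in hits)
        if (hits[key] > limit) print hits[key], key
    }
  ' "$@" | sort -rn
}

# Example: burst_ips 60 access.log | head
```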



All times are GMT -7.

©2000-, AI Media Network Inc.