09-12-2007, 06:21 PM
StarkReality
Confirmed User
Quote:
Originally Posted by Elli
Yeah I don't want to stop valid bots from crawling, just the scrapers... sigh.
Scraping just means taking content from other sites and republishing it. Some rewrite it to hide their tracks, some just copy it 1:1. They look for pages that rank well for a keyword, in Google for example, then send a bot to your page to copy your content and use it to put up their own keyword-optimized page. A blackhat version of content syndication...

In theory it's easy to stop them, but the effort is often not really worth the results. You'd need to find the IPs they use(d) to crawl your sites in your logfiles and block them, but unfortunately they often use anonymous proxies or are on dynamic IPs from big providers, so the address you block today may not be the one they come from tomorrow.
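
For anyone who wants to try the logfile route anyway, here is a rough Python sketch of the idea, not a finished tool. It assumes your access log is in Common/Combined Log Format with the client IP as the first field, and that a scraper shows up as an IP with an unusually high request count. The function names, the threshold of 100 and the allowlist are made up for illustration, and the output uses Apache 2.2-style `Deny from` lines (newer Apache versions use a different syntax).

```python
import re
from collections import Counter

# Assumption: Common/Combined Log Format, client IP is the first field.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) \S+[^"]*" \d{3}'
)

def count_requests_by_ip(lines):
    """Count how many requests each client IP made."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def suspicious_ips(lines, threshold=100, allowlist=()):
    """Return IPs with at least `threshold` requests, skipping the allowlist.

    The allowlist is where you would put IPs of crawlers you want to keep.
    The threshold is a guess; tune it against your own traffic.
    """
    counts = count_requests_by_ip(lines)
    return sorted(
        ip for ip, hits in counts.items()
        if hits >= threshold and ip not in allowlist
    )

def deny_rules(ips):
    """Format the IPs as Apache 2.2-style deny lines for an .htaccess file."""
    return [f"Deny from {ip}" for ip in ips]

if __name__ == "__main__":
    # Hypothetical path; point this at your real access log.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for rule in deny_rules(suspicious_ips(log)):
            print(rule)
```

A raw request count will also catch legit search engine bots, which is why the allowlist is there. And as said above, this does nothing against scrapers that rotate through proxies or dynamic IPs.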

The best thing that comes to mind is reporting them to Google (the page is cloaked, contains malware, whatever is good enough to get them kicked)... but it will take them just minutes and the stuff is up somewhere else again.