Quote:
Originally Posted by Elli
Yeah I don't want to stop valid bots from crawling, just the scrapers... sigh.
Scraping just means taking content from other sites and republishing it. Some rewrite it to hide their tracks, some just copy it 1:1. They look for pages that rank well for a keyword, in Google for example, send a bot to your page to copy your content, and put up their own keyword-optimized page that way. A blackhat version of content syndication...
In theory, it's easy to stop them, but the results are often not really worth the effort. You'd need to find the IPs they use(d) to crawl your site in your logfiles and block them, but unfortunately they often use anonymous proxies or are on dynamic IPs from big providers.
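If you want to try the logfile route anyway, here is a rough Python sketch of the idea. It assumes your access log is in the common/combined log format (client IP first on each line), and the function names, the threshold and the nginx-style `deny` output are just my own picks for illustration, so adjust them to your setup.

```python
import re
from collections import Counter

# Assumption: common/combined log format, e.g.
# 203.0.113.7 - - [10/Oct/2010:13:55:36 +0000] "GET /page HTTP/1.1" 200 2326
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3}')


def count_requests_by_ip(lines):
    """Count how many requests each client IP made; malformed lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_ips(lines, threshold=100):
    """Return the IPs with at least `threshold` requests (threshold is a guess, tune it)."""
    counts = count_requests_by_ip(lines)
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def deny_rules(ips):
    """Format the IPs as nginx-style `deny` lines (assumes you run nginx)."""
    return [f"deny {ip};" for ip in ips]


if __name__ == "__main__":
    # Hypothetical log path; point this at your own access log.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for rule in deny_rules(suspicious_ips(log, threshold=100)):
            print(rule)
```

Check the list by hand before you block anything: a plain request count will also flag Googlebot and other valid crawlers, and it won't catch a scraper that spreads its requests over many proxies.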
The best thing that comes to mind is reporting them to Google: the page is cloaked, contains malware, anything that's enough to get them kicked... but it will only take them minutes and the stuff is up somewhere else again.