Quote:
|
Originally Posted by Big Ray
Hey WG.
You make a good point; however, there is a big difference between the Google and Yahoo spiders and the ones that try to pull 200 threads a second all day, every day. What we have to contend with are aggressive screen scrapers that feel the need to index every post the second it's made. They are highly aggressive and are not throttled. They behave more like a DoS attack than a bot.
People ask why we don't ban them. The answer is that we try; however, the majority of them use anonymous proxies, and I don't mean a few. I am talking hundreds. When we block those, they get a new batch. It's a constant battle that we fight every day.
The search feature is widely used in attempts to build search notification systems. Some companies do this and hit the site once a day to see what people are saying about them or a given keyword. Others are constantly hitting search for keywords like "need content" or "need hosting". Ever wonder how the same folks always post first to a request for product or service thread? Now ya know. lol
Ray
|
Wow, I had no idea they were pulling in data so frequently. I thought these feeds were just fetching your data daily or so. If they really are using hundreds of rotating proxies like this, I'd get ICS' attorneys involved at that point; a C&D should probably suffice. They are obviously not respecting your servers' limits and are circumventing your techniques to prevent their abuse. I really had no idea they were pulling data that frequently; that seems like overkill when once per day would suffice just as well.
WG