Quote:
|
Originally Posted by Big Ray
Like others, you make many assumptions on how this board works knowing what's under the hood. It's analogous to assuming an engine takes unleaded to find out that, oops, in fact it takes diesel.
No one said that your site is THE ONE that is causing the problems. What I am saying is that it contributes to the problem.
Seems everyone that spiders the site assumes they are the only one doing it. That computer resources are limitless and that what they are doing is "small enough" not to cause a problem. Easy enough to crank up that spider a few more notches. Let's see how far it can go!
The fact is that computer resources are not infinite and you are not the only one doing it. How many times an hour does your spider hit the site? A minute? (We have logs.)
When does a spider stop being a spider and become the source of a DoS attack? What's the difference?
|
I understand the need to decrease the amount of traffic and server load used by spiders. What I was originally saying is that the 'content theft' excuse is not the right one. At the time of the actual crawling, the sites did not republish anything as far as you know. Since they use proxies, you have no idea who they are doing all that work for.