05-20-2005, 04:22 PM
Big Ray
Confirmed User
 
Join Date: Dec 2003
Posts: 464
Quote:
Originally Posted by notjoe
Claiming I am the cause of the board's misfortune is complete and utter nonsense. Let me explain why.

Yesterday GFY had a total of 8356 posts, which I've counted/spidered. That's 8.3k page views. Might sound like a lot, right? But it's nothing in comparison to how many page views GFY gets on a daily basis.

I don't know how other spiders work, but I can say mine is probably the most gentle of them all. I don't download entire threads; I download single posts. What does that mean? Quite simply, it means that we download an average of maybe 5000-15000 bytes per post, depending on the length of the content, and not the bloated 100-200K threads which you would come across as a surfer.
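To put rough numbers on that, here is a minimal sketch using only the figures quoted above. The comparison case (one full thread fetch for every new post) is a hypothetical worst case for illustration, not a description of how any particular spider behaves, and "K" is taken as 1000 bytes.

```python
# Figures from the post; everything else is an illustrative assumption.
POSTS_PER_DAY = 8356

def daily_bytes(fetches, low, high):
    """Return the (low, high) range of bytes transferred per day."""
    return fetches * low, fetches * high

# Single-post fetches: 5000-15000 bytes each.
per_post = daily_bytes(POSTS_PER_DAY, 5_000, 15_000)

# Hypothetical: a whole 100-200K thread fetched for every new post.
per_thread = daily_bytes(POSTS_PER_DAY, 100_000, 200_000)

for label, (low, high) in (("single posts", per_post), ("full threads", per_thread)):
    print(f"{label}: {low / 1e6:.1f} to {high / 1e6:.1f} MB per day")
```

Bandwidth is only one part of the load; the number of requests and what each one costs the database are separate questions.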

Obviously the search I did isn't a piggyback system. I don't search using GFY's search function and alter the results to look like my own, so I don't add load in that respect.

The threads I download have been posted within the last few minutes. This means that I am NOT causing disk seeks, and if I am causing a seek I'm willing to bet it's only one seek, maybe two... I'm betting the majority of the information is cached within MySQL's query cache/table indexes.


You say "Board republishers and search engines are two different things." How is this any different? I spider URLs, I download the content and I create a searchable index out of it. People search and results are generated. Those results are then ranked and displayed. That sounds like an SE to me.

You say SEs are OK but STB is bad. Unlike the SEs, we don't download the same content over and over again. The spider downloads it one time and moves on, never to come back to that post again. I highly doubt that this would/could cause the kind of

You claim that these "leechers" are the root of the problem, but why do you think most people run a bot to begin with? It isn't to add login time to their account; it's probably to monitor posts for keywords. So instead of having one or two people running a service and scanning threads, you're having 200+ people running their own bots to monitor threads. Now instead of 17k page views from me and board-tracker (8.3k*2) you're getting 200 x 8.3k = 1,660,000 hits per day from useless bot clients. Doesn't compute.
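The arithmetic in that paragraph can be checked with a short sketch. It takes the post's own premise at face value, namely that every crawler (shared service or personal bot) fetches each of the roughly 8.3k daily posts exactly once; the variable names are made up for illustration.

```python
# Premise from the post: each crawler views every new post once per day.
VIEWS_PER_CRAWLER = 8_300   # "8.3k" daily posts
SHARED_SERVICES = 2         # the two search services named in the post
INDIVIDUAL_BOTS = 200       # "200+ people running their own bot"

shared_views = SHARED_SERVICES * VIEWS_PER_CRAWLER
individual_views = INDIVIDUAL_BOTS * VIEWS_PER_CRAWLER
ratio = individual_views // shared_views

print(f"two shared services: {shared_views:,} views/day")
print(f"200 personal bots:   {individual_views:,} views/day")
print(f"ratio:               {ratio}x")
```

The two-service total is 16,600, which the post rounds up to 17k.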

However, all that being said, it is still your board and I will respect your wishes.

Best regards,
Like others, you make many assumptions about how this board works without knowing what's under the hood. It's analogous to assuming an engine takes unleaded only to find out that, oops, in fact it takes diesel.

No one said that your site is THE ONE that is causing the problems. What I am saying is that it contributes to the problem.

It seems everyone that spiders the site assumes they are the only one doing it, that computer resources are limitless and that what they are doing is "small enough" not to cause a problem. Easy enough to crank up that spider a few more notches. Let's see how far it can go!

The fact is that computer resources are not infinite and you are not the only one doing it. How many times an hour does your spider hit the site? A minute? (We have logs.)
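For a rough sense of scale, the 8356 posts per day quoted earlier in the thread can be turned into average request rates. This is a sketch that assumes the requests are spread perfectly evenly over 24 hours, which is only an assumption; the server logs would show the real, probably burstier, pattern.

```python
# Figure quoted in the thread; an even spread over the day is assumed.
REQUESTS_PER_DAY = 8356

per_hour = REQUESTS_PER_DAY / 24
per_minute = REQUESTS_PER_DAY / (24 * 60)

print(f"average per hour:   {per_hour:.1f}")
print(f"average per minute: {per_minute:.1f}")
```

An average says nothing about peaks: the same daily total packed into a few short bursts would look very different to the server.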

When does a spider stop being a spider and become the source of a DoS attack? What's the difference?
__________________

Five of the Top 10 Largest Programs Host with Jupiter. Find out why at http://www.jupiterhosting.com