Old 08-15-2002, 11:31 AM  
salsbury
Confirmed User
 
Join Date: Feb 2002
Location: Seattle
Posts: 1,070
Um, won't Internet Explorer do this out of the box? Offline browsing or something?

Anyway, this is nothing new, although it is getting worse. We've had to set up some scripts that temporarily block certain IPs if they spider too much content.
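
The scripts themselves aren't anything fancy; the basic idea looks roughly like this sketch in Python (the thresholds and names here are made up for illustration, not what we actually run):

[code]
# Rough sketch of per-IP throttling; all limits are hypothetical.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # look at the last minute of traffic
MAX_HITS_PER_WINDOW = 120  # roughly 2 hits/sec before we call it a spider
BLOCK_SECONDS = 15 * 60    # how long a temporary block lasts

hits = defaultdict(deque)  # ip -> timestamps of recent requests
blocked_until = {}         # ip -> time the temporary block expires

def allow_request(ip):
    """Return True if this request should be served, False if the IP is throttled."""
    now = time.time()

    # still inside a temporary block?
    if blocked_until.get(ip, 0) > now:
        return False

    # record the hit and drop timestamps that fell out of the window
    recent = hits[ip]
    recent.append(now)
    while recent and recent[0] < now - WINDOW_SECONDS:
        recent.popleft()

    # too many hits in the window -> block the IP for a while
    if len(recent) > MAX_HITS_PER_WINDOW:
        blocked_until[ip] = now + BLOCK_SECONDS
        return False

    return True
[/code]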

edit:

By the way, it won't be long until there is a program like this that can take advantage of a central list of open web proxies (the programmers aren't stupid, so they already know this). A real solution is needed.

Perhaps prepended unique IDs for each "session". The spider program would have to send the same unique ID with each hit, and then you could throttle that unique ID. The only way to get an ID would be to go through the homepage (or some other special link).

For example, you go to http://www.cumoninn.com/index.html and you're redirected to http://www.cumoninn.com/0129738243/index.html.

Every link on that site (that stays within the site) would need that prepended number, so it would act as a "unique" session ID.

The only way around this would be to write the spider so it loads the home page before any other link, over and over, to generate tons of unique IDs. Maybe something could be created to block that too, e.g. by capping how many fresh IDs a single IP can request. Anyway, it's a start, and this has been a silly brain dump.
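
To make that concrete, here's a rough sketch of the whole ID scheme in Python. Everything in it (the names, the 10-digit format, the limits) is made up for illustration, and the real thing would sit in the web server or a rewrite layer. The flow: a hit with no ID gets redirected to an ID-prefixed URL (or refused if that IP keeps minting fresh IDs), hits with an ID are counted and throttled per ID, and served pages get their internal links rewritten to carry the ID.

[code]
# Rough sketch of the prepended-session-ID idea; names and limits are hypothetical.
import re
import secrets
from collections import defaultdict

ID_RE = re.compile(r"^/(\d{10})(/.*)$")   # e.g. /0129738243/index.html
MAX_HITS_PER_ID = 300                     # per-session throttle
MAX_NEW_IDS_PER_IP = 20                   # cap on "reload the homepage for a fresh ID"

hits_per_id = defaultdict(int)
ids_minted_per_ip = defaultdict(int)

def mint_session_id(ip):
    """Hand out a new 10-digit ID and remember which IP asked for it."""
    ids_minted_per_ip[ip] += 1
    return "%010d" % secrets.randbelow(10 ** 10)

def rewrite_internal_links(html, session_id):
    """Prepend the ID to site-internal links so every click keeps the same session."""
    return html.replace('href="/', 'href="/%s/' % session_id)

def handle_request(ip, path):
    """Decide what to do with one hit: redirect, serve, or throttle."""
    match = ID_RE.match(path)

    if match is None:
        # No session ID yet. An IP that keeps minting fresh IDs (the spider
        # workaround above) gets refused instead of redirected.
        if ids_minted_per_ip[ip] >= MAX_NEW_IDS_PER_IP:
            return ("throttle", None)
        sid = mint_session_id(ip)
        return ("redirect", "/%s%s" % (sid, path))

    sid, real_path = match.groups()
    hits_per_id[sid] += 1
    if hits_per_id[sid] > MAX_HITS_PER_ID:
        return ("throttle", None)
    # Serve real_path, then run rewrite_internal_links() on the page body.
    return ("serve", real_path)
[/code]

So handle_request("1.2.3.4", "/index.html") comes back as a redirect to something like /0123456789/index.html, and every later hit carrying that prefix counts against the same session.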

Last edited by salsbury; 08-15-2002 at 11:37 AM..