|
Wget, of course. Since wget can be set to ignore robots.txt, a good trick is to make a hidden directory, say /d/, that nothing legitimate ever requests, fill it with random junk, and have your log analyzer watch for hits on it. There are plenty of UNIX programs that can watch your Apache logs in real time and fire off an email with the offender's IP (and, if you use normal authentication, their username).
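If you'd rather roll your own watcher, here's a rough sketch of the idea in Python. It assumes your access log is in Apache's Common or Combined Log Format and that the trap directory is /d/; the function name `find_trap_hits` and the constant `TRAP_PREFIX` are made up for this example, and it just prints hits instead of sending mail.

```python
import re
import sys

# Apache Common/Combined Log Format starts with:
#   host ident authuser [date] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)

TRAP_PREFIX = "/d/"  # hypothetical trap directory; use whatever you created


def find_trap_hits(lines, trap_prefix=TRAP_PREFIX):
    """Return (ip, user, path) for every request under the trap directory.

    user is None when Apache logged "-" (no authenticated user).
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(trap_prefix):
            user = m.group("user")
            hits.append((m.group("ip"), None if user == "-" else user, m.group("path")))
    return hits


if __name__ == "__main__":
    # Reads log lines from stdin and reports each trap hit as it arrives.
    for line in sys.stdin:
        for ip, user, path in find_trap_hits([line]):
            print(f"trap hit: ip={ip} user={user or 'unknown'} path={path}", flush=True)
```

To watch a live log, pipe it in with something like `tail -F access.log | python trapwatch.py` (both file names are placeholders). Swapping the `print` for a mail call would get you the email alert.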
|