DotBot from moz.com is not obeying robots.txt directives
It was hammering thousands of my pages daily, so I had no choice but to ban it at the webserver level.
Besides, what good does it do? It just spies on your keywords and links. Even after the ban it keeps trying to access my sites (I left only robots.txt open to it; it reads that and ignores it anyway), so their crawler is obviously broken. What I'm trying to say is that you should review your access logs from time to time and see what is wasting your bandwidth and server resources. Another observation: within the last 3 years I have received between 500,000 and 1,000,000 hacking and exploit attempts per live website. None of them were successful, but they must still have some impact on server performance. A well-configured webserver can cut this bad traffic by 90-99%.
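For anyone who wants to do the same, here is a minimal sketch of a user-agent ban, assuming an Apache 2.4 server with mod_setenvif; the bad_bot variable name and the robots.txt exception are just illustrations, not something DotBot or Moz recommends.

Code:
# Tag any request whose User-Agent contains "dotbot" (case-insensitive)
SetEnvIfNoCase User-Agent "dotbot" bad_bot
# Still let the tagged bot fetch robots.txt, but nothing else
SetEnvIf Request_URI "^/robots\.txt$" !bad_bot

<Location "/">
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>

Blocking at this level returns a 403 before any of your application code runs, which is what actually saves the server resources.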
What directive(s) was the bot ignoring?
Code:
User-agent: dotbot
Disallow: /

Disallow everything for it like that (Google stays allowed by default), or contact the DotBot guys.
Have you blocked it by IP? If so, has it come back under other IPs?
Nah, I've blocked it by the user-agent header, so I have no problem with it now.
This thread was meant more as an educational one and a suggestion for everyone to monitor their webserver logs, at least from time to time. I'm fine. :) Fewer resources eaten by bad traffic = more resources for good traffic = more speed, and speed = better SEO.
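If you want an easy way to keep an eye on that traffic, a rough sketch (again assuming Apache and reusing the hypothetical bad_bot variable from the earlier snippet) is to send tagged requests to their own log file:

Code:
# Requests tagged as bad_bot go to a separate log for periodic review
CustomLog logs/bad_bots.log combined env=bad_bot
# Everything else stays in the normal access log
CustomLog logs/access.log combined env=!bad_bot

Skimming the separate file once in a while makes it obvious which crawlers are wasting your bandwidth.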
Even "good" bots can cause issues.
At one point Googlebot was fetching 150k+ pages per day from one of my sites. The site is heavily database-driven, so fetching two pages per second continuously did cause some server load issues. You can dial back the crawl rate in Webmaster Tools, but that setting expires after 90 days, and then Googlebot just starts pounding away again. They deliberately ignore the Crawl-Delay robots.txt directive.
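For crawlers that do respect it (Bing, for example; Googlebot does not), the directive is a one-liner in robots.txt. This is only a sketch, and the 10-second value is an arbitrary example:

Code:
User-agent: *
# Ask compliant crawlers to wait 10 seconds between requests
Crawl-delay: 10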