GoFuckYourself.com - Adult Webmaster Forum


rowan 02-21-2010 12:07 PM

Googlebot... how arrogant can they be?
 
Google ignores crawl-delay in robots.txt. Instead, they force you to register your site in Webmaster Tools so you can set a custom crawl rate. This then RESETS to the default after 90 days, so you have to log in and change it all over again! How fucking arrogant is that? :321GFY

Other search engines respect crawl-delay. Imagine if they all wanted us to create accounts and log in every 90 days to stop their bots hammering our servers? :321GFY
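
For anyone unfamiliar, the directive in question is just a couple of lines in robots.txt (the 10 seconds here is only an example value, use whatever suits your server):

User-agent: *
Crawl-delay: 10

Bing, Yahoo and the rest will wait that many seconds between requests. Googlebot skips right over it.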

Agent 488 02-21-2010 12:09 PM

yeah that damn google bot racking up bandwidth.

2012 02-21-2010 12:15 PM

http://www.charmr.com/images/itsyourworldboss.jpg

rowan 02-21-2010 12:16 PM

Quote:

Originally Posted by Agent 488 (Post 16878798)
yeah that damn google bot racking up bandwidth.

Server load, actually. They're hitting me 120k times a day.

rowan 02-21-2010 12:20 PM

And the point isn't so much what they're doing specifically to my site, more that they're arrogant enough to ignore a (de facto?) robots.txt setting that every other major search engine bot respects. The Webmaster Tools robots.txt checker even helpfully points out that each of the crawl-delay lines in my robots.txt is ignored!

ColetteX 02-21-2010 12:23 PM

so google bots are ignoring crawl delay? i set delays because i am updating my sites, and i don't want to see google on a site every day when i only update it weekly. if crawl-delay doesn't work, that is really sick, because 6 days out of the week google sees my site as static, not updated. oh snap

rowan 02-21-2010 12:28 PM

Quote:

Originally Posted by ColetteX (Post 16879016)
so google bots are ignoring crawl delay? i set delays because i am updating my sites, and i don't want to see google on a site every day when i only update it weekly. if crawl-delay doesn't work, that is really sick, because 6 days out of the week google sees my site as static, not updated. oh snap

Crawl-delay controls how quickly a bot moves from page to page on your site, not how long it waits before refetching the same page.

http://www.google.com/support/webmas...n&answer=48620

My site has 200 million pages so technically googlebot isn't fetching fast enough... at the rate of 120k fetches per day it would take 4 1/2 years to index everything. At this point the benefit of indexing 100% of the site (or at least as much as it's trying to) isn't worth the load it's placing on the server.
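
The math is simple enough to check (these are the same numbers as above):

pages = 200_000_000        # total pages on the site
fetches_per_day = 120_000  # observed Googlebot rate
days = pages / fetches_per_day
print(days / 365)          # ~4.6 years to crawl everything once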

ColetteX 02-21-2010 12:31 PM

Quote:

Originally Posted by rowan (Post 16879131)
Crawl-delay controls how quickly a bot moves from page to page on your site, not how long it waits before refetching the same page.

http://www.google.com/support/webmas...n&answer=48620

My site has 200 million pages so technically googlebot isn't fetching fast enough... at the rate of 120k fetches per day it would take 4 1/2 years to index everything. At this point the benefit of indexing 100% of the site (or at least as much as it's trying to) isn't worth the load it's placing on the server.

thank you man, there is still much to learn. bump for your answer

woj 02-21-2010 12:36 PM

User-agent: Googlebot
Disallow: /

problem solved :)

rowan 02-21-2010 12:38 PM

Quote:

Originally Posted by woj (Post 16879345)
User-agent: Googlebot
Disallow: /

problem solved :)

I was waiting for it. I'm surprised this "solution" wasn't posted sooner.

CunningStunt 02-21-2010 02:02 PM

Quote:

Originally Posted by woj (Post 16879345)
User-agent: Googlebot
Disallow: /

problem solved :)

:1orglaugh

To the OP - agreed. The trouble is, they can do what the hell they want with over 70% of the search market.

MrMaxwell 02-21-2010 04:32 PM

What kind of site has 200,000,000 pages?
That's a lot of pages

Adam_M 02-21-2010 04:34 PM

Quote:

Originally Posted by woj (Post 16879345)
User-agent: Googlebot
Disallow: /

problem solved :)

I wish everyone would just do this!

Cyber Fucker 02-21-2010 04:55 PM

Quote:

Originally Posted by Adam_WildCash (Post 16881161)
I wish everyone would just do this!

Lol :1orglaugh

fatfoo 02-21-2010 04:57 PM

They can be very arrogant.

Waddymelon 02-21-2010 06:19 PM

Annoying, isn't it? I've had googlebot hit my servers up to 15 times per second, for hours at a time. Dynamic pages really make googlebot go nuts.

u-Bob 02-21-2010 06:30 PM

They're evil, plain and simple....

StrokeKing 02-21-2010 07:30 PM

Quote:

Originally Posted by Waddymelon (Post 16881745)
Annoying, isn't it? I've had googlebot hit my servers up to 15 times per second, for hours at a time. Dynamic pages really make googlebot go nuts.

15 times in a second?! Really annoying! Well, it's a smart idea to disallow the googlebot... thanks for the advice guys.

czarina 02-21-2010 07:38 PM

woj's solution is good if you want to keep googlebot off your site completely. But are you sure you want to do that?

epitome 02-21-2010 07:48 PM

Quote:

Originally Posted by MrMaxwell (Post 16881157)
What kind of site has 200,000,000 pages?
That's a lot of pages

Google.

Rowan actually founded Google and he's frustrated because his baby is stuck in a loop.

Even as the founder, he cannot get support at Google and has to do the same thing as the rest of us.

Dirty Dane 02-21-2010 08:33 PM

Sue them.

rowan 02-21-2010 09:44 PM

Quote:

Originally Posted by epitome (Post 16881895)
Google.

Rowan actually founded Google and he's frustrated because his baby is stuck in a loop.

Even as the founder, he cannot get support at Google and has to do the same thing as the rest of us.

:1orglaugh

Yeah, I got pushed out by The Man! Fuck them! :1orglaugh

papill0n 02-21-2010 11:40 PM

Quote:

Originally Posted by StrokeKing (Post 16881871)
15 times in a second?! Really annoying! Well, it's a smart idea to disallow the googlebot... thanks for the advice guys.

:1orglaugh :1orglaugh :Oh crap :helpme :1orglaugh

MrMaxwell 02-22-2010 05:58 AM

What kind of site has 200,000,000 pages?

rowan 02-22-2010 06:35 AM

Quote:

Originally Posted by MrMaxwell (Post 16883341)
What kind of site has 200,000,000 pages?

Think of something on the net that there might be 200,000,000 of to profile. :error

raymor 02-22-2010 11:40 AM

Quote:

Originally Posted by rowan (Post 16879131)
My site has 200 million pages so technically googlebot isn't fetching fast enough... at the rate of 120k fetches per day it would take 4 1/2 years to index everything. At this point the benefit of indexing 100% of the site (or at least as much as it's trying to) isn't worth the load it's placing on the server.

200 MILLION pages? I'm curious, is that 200 million legitimate pages, or 200 million pieces of fake SE spam crap? If you tried to spam the crap out of Google by creating 200 million bogus pages, I'd say you got what you deserved, and really what you asked for. If you pretended to have 200 million pages so that Google would spider you 200 million times, that was your decision. You can't blame Google if you chose to create fake stuff for them to spider.

Note the repeated use of "IF" - I'm asking IF that's what you did.

tiger 02-22-2010 12:26 PM

Quote:

Originally Posted by woj (Post 16879345)
User-agent: Googlebot
Disallow: /

problem solved :)

:1orglaugh:1orglaugh

Yeah I was wondering when someone would post this.

Bottom line is, if it's causing you more problems than it's worth, just block it. And if they're hitting you that hard, you should be getting some good traffic because of it; more traffic = more money. Just upgrade the servers.

DamnGoodRatio 02-22-2010 12:38 PM

I return a 503 page and they seem to respect that. Then I set it so they can crawl during my off-peak loads, which they seem to do.
Should work, and I know this is documented somewhere in Google's FAQ; I just can't seem to find the link right now.
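
Rough sketch of how that could look as a bit of WSGI middleware (the peak window, Retry-After value and user-agent match here are made up, not anything Google prescribes):

from datetime import datetime

PEAK_HOURS = range(9, 21)  # hypothetical peak window, server local time

def throttle_googlebot(app):
    # Answer Googlebot with a 503 plus Retry-After during peak hours;
    # every other request passes through untouched.
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if "Googlebot" in ua and datetime.now().hour in PEAK_HOURS:
            start_response("503 Service Unavailable",
                           [("Retry-After", "3600"),
                            ("Content-Type", "text/plain")])
            return [b"Crawling throttled, come back off-peak.\n"]
        return app(environ, start_response)
    return middleware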

rowan 02-22-2010 03:07 PM

Quote:

Originally Posted by DamnGoodRatio (Post 16884427)
I return a 503 page and they seem to respect that. Then I set it so they can crawl during my off-peak loads, which they seem to do.
Should work, and I know this is documented somewhere in Google's FAQ; I just can't seem to find the link right now.

I've noticed that a connection or server error will make G-bot back right off, but I'm concerned that doing this routinely might affect my rank. I've seen recent articles saying that site response time may become a factor in the future.

raymor: not useless spam, it's all genuine profiling of... domains. :error

rowan 03-03-2010 03:59 AM

February 22, 2010

New crawl rate: Custom rate

1.000 requests per second

1.000 seconds per request

This new crawl rate will stay in effect for 90 days.


Funny, 2 weeks later googlebot is still requesting 120k+ pages per day, which is about 140% of the rate the above setting allows.

Their webmaster tools system also sent me a notification encouraging me to increase the rate so they can fetch more pages. Looks like their bot is doing it anyway. :321GFY
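
For anyone checking the arithmetic (numbers from the setting and the observed rate above):

seconds_per_day = 86_400
allowed = 1.0 * seconds_per_day   # custom rate: 1 request per second
observed = 120_000                # actual fetches per day
print(observed / allowed)         # ~1.39, i.e. roughly 140% of the cap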

seeandsee 03-03-2010 04:36 AM

bad bad google :)

