| GoFuckYourself.com - Adult Webmaster Forum | 
|  11-26-2019, 12:19 AM | #1 | 
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | 
				
				Y'all ever had problems with Google crawl rate?
			 My sites have millions of pages because they're tube aggregators and CJ tubes. Google will ruin my servers, hitting them hundreds of thousands of times a day (times many sites). It's a serious problem because their crawler doesn't follow normal caching patterns. The average user will hit my front page or a page that ranks, make a common search query, and click on the same video the last 100 users did. Everything is served from Redis, no problem. Cheap. Google crawls queries users never make and hits videos users never click on, so their crawler never hits the cache. They account for like 80% of my database load because they never. hit. the cache. For years I just used Search Console to slow their crawl rate; they have never respected crawl-delay in robots.txt. Lately it's even worse: with the new console I can't set their crawl rate limit anymore. I've had to block them from parts of my sites just to keep my shit running. Driving me nuts. Anyone struggled with this? Any tips? | 
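Why the crawler "never hits the cache" can be shown with a toy cache-aside simulation. A minimal sketch, assuming an LRU cache in front of the database (`LRUCache` is a hypothetical in-process stand-in for Redis) and made-up traffic: users keep requesting the same few popular keys, while the crawler walks a long tail of keys nobody else asks for.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process LRU cache standing in for Redis (assumption: LRU eviction)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key: str):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


def hit_rate(requests, cache: LRUCache) -> float:
    """Cache-aside: try the cache first, fall back to a (simulated) DB query on a miss."""
    hits = 0
    for key in requests:
        if cache.get(key) is not None:
            hits += 1
        else:
            cache.set(key, f"db-result-for-{key}")  # the expensive DB query would go here
    return hits / len(requests)


# Hypothetical traffic: users cycle through 10 popular videos,
# the crawler requests 1,000 distinct ones exactly once each.
user_requests = [f"video:{i % 10}" for i in range(1000)]
crawler_requests = [f"video:{i}" for i in range(10_000, 11_000)]

print(hit_rate(user_requests, LRUCache(capacity=100)))     # 0.99
print(hit_rate(crawler_requests, LRUCache(capacity=100)))  # 0.0
```

As long as every crawler request is for a key nobody requested before, its hit rate stays at zero regardless of cache size, so every one of those requests becomes a database query.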
|  11-26-2019, 12:25 AM | #2 | 
| Retired Industry Role:  Join Date: Dec 2002 
					Posts: 21,403
				 | Googlebot has taken down my servers before, when I was running stuff I don't talk about. Pretty much the same scenario as yours: a few dozen domains, and it was crawling pages at like 40-50 per second for 3 days straight. Running tail -f on the logs, shit was just flying off the screen. | 
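For scale, a rough back-of-envelope figure, assuming the 40-50 pages per second was the total across all domains (the post doesn't say whether it was per domain):

```latex
\begin{aligned}
40\ \text{req/s} \times 86{,}400\ \text{s/day} &= 3{,}456{,}000\ \text{req/day} \\
50\ \text{req/s} \times 86{,}400\ \text{s/day} &= 4{,}320{,}000\ \text{req/day} \\
\text{over 3 days} &\approx 10.4\text{M to } 13.0\text{M requests}
\end{aligned}
```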
|  11-26-2019, 12:28 AM | #3 | |
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | Quote: 
 tail -f <nginx log path> | grep -i 'google' HOLY FUCKING SHIT | |
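To turn that `tail -f | grep` firehose into numbers, you can count Googlebot requests per second. A minimal sketch, assuming nginx's default `combined` log format; the regex, `googlebot_hits_per_second`, and `print_top_seconds` are hypothetical helpers. Matching on `googlebot` is narrower than `grep -i 'google'`, which also catches other Google user agents.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" format, e.g.
# 66.249.66.1 - - [26/Nov/2019:00:28:01 +0000] "GET /video/123 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
LINE_RE = re.compile(r'\[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')


def googlebot_hits_per_second(lines):
    """Count requests per second whose user agent contains 'Googlebot'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("ua").lower():
            # "26/Nov/2019:00:28:01 +0000" -> "26/Nov/2019:00:28:01" (drop the offset)
            counts[match.group("time").split()[0]] += 1
    return counts


def print_top_seconds(lines, n=10):
    """Print the n busiest seconds, busiest first."""
    for second, hits in googlebot_hits_per_second(lines).most_common(n):
        print(f"{second}  {hits} req/s")
```

For example, `print_top_seconds(open("/var/log/nginx/access.log"))`, where the path is an assumption (the post leaves it as `<nginx log path>`).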
|  11-26-2019, 12:34 AM | #4 | |
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | Quote: 
 | |
|  11-26-2019, 04:48 AM | #5 | 
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | I've had the same problem. Basically, Google are arrogant cunts that refuse to follow the "non-standard" Crawl-delay robots.txt directive, even though it's a de facto standard and a pretty clear indication from the webmaster that a crawler should slow down. Since Google ignores that directive, you have to log into Webmaster Tools and manually set the rate to something lower. Furthering their arrogance, the setting expires after 90 days and reverts to normal behaviour, so you have to log in and set the crawl rate again to stop them beating the shit out of your server. Fuck Google.  | 
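For reference, this is how a crawler that does honour `Crawl-delay` would read it, using Python's standard `urllib.robotparser`. The robots.txt below is a made-up example; per this thread, Googlebot ignores the directive, so this shows what the webmaster is asking for, not what Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler should wait 10 seconds between
# requests and stay out of the expensive search pages.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler would look these up before fetching anything.
print(parser.crawl_delay("Googlebot"))                                    # 10
print(parser.can_fetch("Googlebot", "https://example.com/search?q=foo"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/video/123"))     # True
```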
|  11-26-2019, 04:52 AM | #6 | 
| Industry Role:  Join Date: Aug 2006 Location: Little Vienna 
					Posts: 32,235
				 | I once also had an aggregator tube server taken down by the SEMrush bot. All those crawling bots are really badly configured. | 
|  11-26-2019, 04:59 AM | #7 | |
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | Quote: 
 So if url1 redirects to disallowed url2, they'll still load url2... even though robots.txt asks them not to. They helpfully suggested that I should just disallow url1 as well as url2. | |
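The behaviour the poster expected, re-checking robots.txt before following each redirect hop, can be sketched like this. Everything here is hypothetical: `REDIRECTS` stands in for real 3xx responses, and `fetch_chain` is an illustration, not any real crawler's code.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /url2
"""

# Hypothetical redirect map standing in for real 301/302 responses.
REDIRECTS = {"https://example.com/url1": "https://example.com/url2"}


def fetch_chain(start, parser, agent="ExampleBot", max_hops=5):
    """Follow redirects, re-checking robots.txt before requesting every hop."""
    visited = []
    url = start
    for _ in range(max_hops):
        if not parser.can_fetch(agent, url):
            return visited, f"blocked by robots.txt: {url}"
        visited.append(url)  # a real crawler would send the request here
        if url not in REDIRECTS:
            return visited, "done"
        url = REDIRECTS[url]  # the response was a redirect: move to its target
    return visited, "too many redirects"


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(fetch_chain("https://example.com/url1", parser))
# (['https://example.com/url1'], 'blocked by robots.txt: https://example.com/url2')
```

With this check, disallowing only `url2` is enough; the workaround suggested in the post (also disallowing `url1`) stops the chain one hop earlier instead.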
|  11-26-2019, 05:01 AM | #8 | |
| Industry Role:  Join Date: Aug 2006 Location: Little Vienna 
					Posts: 32,235
				 | Quote: 
 | |
|  11-26-2019, 05:07 AM | #9 | 
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | |
|  11-26-2019, 05:23 AM | #10 | 
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | Okay, I just realised I didn't read the OP properly. You can no longer limit the crawl rate in the webmaster console? Hmmm... | 
|  11-26-2019, 02:22 PM | #11 | |
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | Quote: 
 Has anyone else been able to find it since the new console was released? I've looked everywhere, checked search engines, BHW, GFY... no one is talking about it, I suppose because most people's sites don't exactly have millions of pages that all require CPU-intensive DB queries. Google really are arrogant cunts: they don't respect crawl-delay and appear to have removed all ability to tell them to slow the fuck down. | |
|  11-26-2019, 02:33 PM | #12 | 
| Living The Dream Industry Role:  Join Date: Jun 2009 Location: Inside a Monitor 
					Posts: 19,631
				 | I hate Google, and it's getting worse. Checked your Gmail lately? The damn inbox loads 4 times (by my count) before displaying, and there's always a few seconds' delay when opening emails. It's like Big G is copying this, relaying that, storing this bit of data, crawling your ass for that... it's like digital rape. | 
|  11-26-2019, 02:57 PM | #13 | 
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | I'm always on board with "adapt or die", but I've lost income over the last few months and had to change my current and future strategy, because I can no longer have a site with millions of crawlable pages. I've had to block Google from crawling pages I could previously let them crawl at an acceptable rate, so of course those pages are now out of the SERPs and I lost that traffic. How ridiculous is it that now, when I build and operate sites, I literally have to make sure there aren't too many crawlable pages, or else Google will molest my servers to death? And there's literally no way to tell them to stop other than forcibly blocking them. When I started in this industry, my entire business model and strategy was "make sites with millions of crawlable embedded videos and hope a few thousand of them rank." Such a strategy is now borderline unviable. | 
|  11-26-2019, 07:15 PM | #14 | |
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | Quote: 
 According to this page, Google sees 429 as being the same as 503 Service Unavailable. https://www.seroundtable.com/google-...ode-20410.html | |
|  11-26-2019, 10:56 PM | #15 | |
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | Quote: 
 But shouldn't I just be able to easily tell them, "don't hit my site more than 100k times per day", and have them respect that? | |
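Since there's apparently no such setting on Google's side, the closest approximation is to enforce the budget on your own server and answer the overflow with a 429 or 503. A minimal sketch, assuming a single process (multiple servers would need a shared counter, e.g. in Redis); `DailyBudget` is a hypothetical name.

```python
from datetime import date


class DailyBudget:
    """Allow at most `limit` requests per calendar day for one client (e.g. a crawler)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.day = None
        self.used = 0

    def allow(self, today: date) -> bool:
        if today != self.day:  # a new day started: reset the counter
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False  # over budget: the caller would answer 429 or 503
        self.used += 1
        return True


budget = DailyBudget(limit=100_000)
print(budget.allow(date(2019, 11, 26)))  # True
```

One caveat with a plain daily counter: the crawler can burn the whole budget in the first hour of the day and then get refused for the rest of it, so a per-second limit is often the gentler knob.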
|  11-26-2019, 11:28 PM | #16 | |
| Too lazy to set a custom title Join Date: Mar 2002 Location: Australia 
					Posts: 17,393
				 | Quote: 
 Google seems to be following the letter of the law, rather than the spirit, which doesn't always work so well on the internet. | |
|  11-27-2019, 02:24 AM | #17 | |
| Confirmed User Industry Role:  Join Date: Aug 2015 
					Posts: 1,018
				 | Quote: 
 Hit me more than once in any given second? 429. I can't see how it could hurt SEO... you're literally telling them, "you're hitting me too much, slow down." How could they penalize you for that? Yeah, there might be less-important URLs they won't get to, but that was always a factor back when we could use Search Console to slow them down. 404'ing them or blocking them from URLs with robots.txt seems stupid in comparison. | |
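The "more than once in any given second? 429" rule maps naturally onto a token bucket. A minimal, framework-agnostic sketch: `respond` returns a status code and headers rather than plugging into nginx or any real web framework. Matching on the user-agent string and sending `Retry-After` are assumptions, and this thread doesn't establish whether Google honours `Retry-After`.

```python
import math


class TokenBucket:
    """Token bucket: refills `rate` tokens per second and holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = None

    def take(self, now: float) -> bool:
        if self.last is not None:
            # Refill for the time elapsed since the previous request.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def respond(user_agent: str, now: float, bucket: TokenBucket):
    """Return (status, headers); only crawler traffic goes through the bucket."""
    if "googlebot" not in user_agent.lower():
        return 200, {}
    if bucket.take(now):
        return 200, {}
    # Seconds until a full token is available again, rounded up for Retry-After.
    wait = math.ceil((1 - bucket.tokens) / bucket.rate)
    return 429, {"Retry-After": str(wait)}


bucket = TokenBucket(rate=1.0, capacity=1.0)  # at most ~1 crawler request per second
ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(respond(ua, 0.0, bucket))  # (200, {})
print(respond(ua, 0.5, bucket))  # (429, {'Retry-After': '1'})
```

A `capacity` above 1 would let short bursts through while still holding the long-run average to `rate`.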