#1

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
Offline browsers spidering and crashing Apache

I've got a few sites that get spidered frequently, and an occasional rogue Apache process will hang, consuming CPU and memory, crashing MySQL, and shooting the server load up. I'll usually see one IP bounce all over the server, grabbing pages from numerous domains.

Example top output:

      5 root  14  0    0    0   0 RW  0 68.7  0.0 63:48 kswapd
   7156 www   18  0 446M 371M  56 R   0 63.6 42.2 12:41 apache
   7158 www   14  0 434M 382M  56 R   0 58.2 43.5 12:26 apache

Is there anything that will automatically monitor those Apache processes, at least, and kill them before they reach this point? What about a mod_throttle solution, or is that going overboard? Suggestions?
#2

Confirmed User
Join Date: Dec 2002
Location: CanaDUH
Posts: 5,125
Does your Apache actually die and need restarting?
#3

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
Nope. It just keeps going and eats into the swap until it's all gone too.
#4

Confirmed User
Join Date: Dec 2002
Location: CanaDUH
Posts: 5,125
Sorry dude... if your Apache was actually crashing and needed restarting, I'd know a solution. I imagine a programmer on here could write a script that will monitor your processes and kill the out-of-hand ones. It has happened to me before, but I always killed them by hand...
#5

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
What kind of solutions are others using? :/
#6

Confirmed User
Join Date: Jan 2001
Location: World Traveler
Posts: 261
I think this guy might have what you are looking for:
http://www.tgpowners.com/members/ubb...ML/001437.html
I asked about the same kind of thing. I have not tried this myself yet, but it's on my "todo" list.
__________________
Sorry, no signature today. Please come back tomorrow.
#7

Confirmed User
Join Date: Oct 2002
Location: Germany
Posts: 768
Apache is open source. Go to the website and check the knowledge base, or post the question there. Always run the latest non-beta Apache version; you should do that anyway, to avoid that virus that infects servers. I have not experienced this problem, and I have been running Apache for years. It might be a problem with your OS and Apache. I doubt offline browsers have anything to do with it, because all offline browsers do is make web requests.
#8

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
I'm trying everything I can think of :/

My first thought was that maybe it was PHP or MySQL, but it wasn't. It was Apache itself consuming memory, hanging, or leaving runaway processes. I turned ExtendedStatus on to try to pinpoint which pages the memory-hungry processes were serving. All of them were hitting a pretty simple PHP page. There were several different processes, each seemingly hooked to that same page, and each one (about six of them this time) was at about 150M or more. The IP of the remote agent was 66.147.154.3, and the user agent was this: http://www.almaden.ibm.com/cs/crawler/

What I'm wondering is: why doesn't Apache KILL the damn things when there's no memory left, or when they've hung for such a long time?
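For reference, turning ExtendedStatus on and exposing /server-status in httpd.conf looks something like this (the Allow address here is just an example; restrict it however you like):

  # mod_status: show full per-request details on /server-status
  ExtendedStatus On
  <Location /server-status>
      SetHandler server-status
      Order deny,allow
      Deny from all
      Allow from 127.0.0.1
  </Location>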
#9

Registered User
Join Date: Feb 2003
Posts: 14
Here's my 2 cents.

I've seen this type of thing many times before, and it's always caused by something different. A few suspects do come to mind, the first being your PHP code. Check that there's no possible way for it to go into some kind of infinite loop; these usually happen when poorly written code encounters something it didn't expect. It could also be doing something super retarded, like 'select * from bullshit_table'.

Next on the list of suspects is your Apache binary itself. Make sure that you compiled PHP and MySQL in correctly, with no outdated or broken libs or anything like that.

Last on the list is the possibility that these crawlers are making HTTP/1.1 connections. By doing that, they connect to one single Apache process and just hammer away at it. That, coupled with a small memory leak, could lead to what you are describing.

The best way to debug this situation is to wait until you get one of these monsters and strace it. Just run strace -p <pid>, where pid would be 7156 in your example.

This situation is usually pretty bad, because once a box starts to swap, the whole site will soon come crashing down. As far as process control goes, there are a few ways to do it, but they are all a pain in the ass. It's usually easier to fix the source of the problem.

If you need any more help, ICQ me at: 348407599

hel0
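P.S. A quick sketch for catching one of these monsters and attaching to it. It assumes Linux procps and children named "apache"; on many builds the process is named "httpd" instead:

  # Attach strace to the apache child using the most resident memory (sketch)
  PID=$(ps -eo pid,rss,comm --sort=-rss | awk '$3 == "apache" { print $1; exit }')
  strace -p "$PID"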
#10

Confirmed User
Join Date: Feb 2002
Location: Free Speech Land
Posts: 9,484
Block all known site rippers using .htaccess, to start with. It isn't 100% effective, but it will stop a lot of them; see the sketch below.

As a temporary fix, if it's happening all the time, you could just set up a script to restart Apache every 15 minutes.

As for the source of the problem, are you using any of the PHP accelerators or compression modules or anything like that? We've had memory problems with a few different things over the years.
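For example, a sketch of the .htaccess approach; the user-agent strings here are just a few common offline browsers, so extend the list with whatever rippers hit you:

  # Tag known rippers by user agent and deny them
  SetEnvIfNoCase User-Agent "HTTrack"   bad_bot
  SetEnvIfNoCase User-Agent "WebCopier" bad_bot
  SetEnvIfNoCase User-Agent "Teleport"  bad_bot
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot

And the blunt 15-minute restart could be a crontab line (assuming apachectl lives in /usr/sbin):

  */15 * * * * /usr/sbin/apachectl restart > /dev/null 2>&1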
#11

Confirmed User
Join Date: Dec 2002
Location: CanaDUH
Posts: 5,125
Quote:
#12

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
What I meant by a simple PHP page is that the page contains practically nothing: just a sitemap that parses the URL for PATH_INFO and uses the result in the title of the page. No MySQL or anything.

I'm using mod_gzip, as far as compression goes. The crawler I mentioned earlier was using HTTP/1.1 requests, and Apache has MaxRequestsPerChild 600. So could the HTTP/1.1 connections combined with mod_gzip be the issue, with the crawler just hammering away at the same simple PHP page repeatedly and consuming memory until it finally crashes? Isn't there something in Apache to make it kill a process that consumes too much memory or hangs for too long?
#13

Confirmed User
Join Date: Feb 2002
Location: Free Speech Land
Posts: 9,484
Where's that guy from XXXManager when we need him? He always has an answer for this kind of stuff.
#14

Registered User
Join Date: Feb 2003
Posts: 14
Hi,

First, Apache doesn't include a way to control runaway processes. You could do it with mod_perl, but that's a gigantic pain in the ass.

Second, it could certainly be mod_gzip's fault here. Chances are that the crawler doesn't support gzip, or even understand it, so you could be in some kind of content negotiation loop with the crawler: Give me a file. Do you support gzip? What? Give me a file. Do you support gzip? What? Give me a file. Do you support gzip? What? It's possible that because a request never actually completes, it's not incrementing the MaxRequestsPerChild count. Strace the process the next time it happens, and you'll have the answer.

If you have mod_perl, you can download Apache::GTopLimit or Apache::Watchdog::RunAway. Be warned, they are a major pain to install. You could also search Google for shell scripts that check for huge procs and kill them; install one in cron (something like the sketch below) and you're good to go.

Hel0
ICQ: 348407599
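P.S. A minimal sketch of that kind of cron script, assuming Linux procps, Apache children running as user www and named "apache", and a ~300 MB threshold (all example values; adjust for your box):

  #!/bin/sh
  # Kill any apache process whose resident memory exceeds LIMIT_KB.
  LIMIT_KB=307200   # ~300 MB, in kilobytes
  ps -u www -o pid= -o rss= -o comm= | while read pid rss comm; do
      if [ "$comm" = "apache" ] && [ "$rss" -gt "$LIMIT_KB" ]; then
          echo "`date`: killing apache pid $pid (rss ${rss}KB)" >> /var/log/apache-killer.log
          kill -9 "$pid"
      fi
  done

Run it from cron, e.g. "* * * * * /usr/local/sbin/apache-killer.sh" to check every minute.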
#15

So Fucking Banned
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
Hi. I was called here by Mr Fiction.

> 7156 www 18 0 446M 371M 56 R 0 63.6 42.2 12:41 apache

OUCH. 446M, with 371M resident? Goddamn. Anyway, Hel0 raised some interesting options, but the list of possibilities is long, much longer than what's been listed here. More info could certainly help identify the problem.

First: if what you want is to stop access from specific remote hosts or IPs or something like that, tell me.

Second, I'll try to respond to some points raised here:

mod_gzip: Not likely the problem.

HTTP/1.1: Not likely the problem. I assume the one who mentioned it meant the keep-alive issue. Well, that is usually a helpful thing when answering requests from the same host, so it's not the immediate cause in your case.

mod_throttle: Most likely won't help.

strace: You will need to install it (not default on many/most/all? platforms) and know what it is. Don't go there if you don't know what it is.

MaxRequestsPerChild: Probably has no meaning in your case, even regarding HTTP/1.1, unless there is a gradual leak, which there could be, but I can't say how likely that is. Change this value to 10 just for the test. MaxRequestsPerChild has a crucial effect mainly if you use resident modules like mod_perl on a persistent-connection basis; if not, you can set it to 1 or 10, like I suggested, for a while to see the effect. Actually, I recommend turning off keep-alive (state "KeepAlive Off" in httpd.conf) if you don't have a graphics-rich site. Again, you can do it just for testing.

Restarting Apache every 15 mins: Not nice.

Questions:
1. What is your MaxClients in httpd.conf?
2. Assuming you still use keep-alive, what is your KeepAliveTimeout?
3. What is your MaxSpareServers?

Since what you identified as the "cause" of the crash can only be an irrelevant symptom, I suggest debugging the server.

Suggestions (if you want to fix this for the long term):
1. Log top output to a file (top -b > /path/to/log/top.log).
2. Log memory usage and swap activity every 10 seconds (vmstat -n 10 > /path/to/log/vmstats.log).
3. Enable the Apache logs so you can see the last requests before the hang (I assume you already did that, judging by the info you gave).

If anyone wants/needs to know more about anything I mentioned, ask.
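To be concrete, those two loggers can be left running in the background like this (the paths are just the examples from above):

  nohup top -b > /path/to/log/top.log 2>&1 &
  nohup vmstat -n 10 > /path/to/log/vmstats.log 2>&1 &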
#16

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
First, thanks for all the input on this. I don't know if it'll happen again, since the IP has been blocked now. I've tried to duplicate it myself, and managed to get the server load up to 30+ with 100+ concurrent processes spidering, without Apache or MySQL crashing.
Quote:
.conf settings:

Timeout 90
KeepAlive off
KeepAliveTimeout 5
MinSpareServers 8
MaxSpareServers 16
StartServers 25
MaxClients 400
MaxRequestsPerChild 600

/server-status reported the same IP/crawler connected each time to the processes that wouldn't die. Each request was for the same page, with various different query strings appended that didn't exist. No MySQL was involved in those connections.
#17

So Fucking Banned
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
KeepAlive is off, so it's not HTTP/1.1; that one is cleared up.

As to this issue, actually, I don't know what type of site you run, but you might want to enable keep-alive if it's a graphics-rich site.

Set the following:

Timeout 15
MaxClients 100

MaxClients set to 100, unless we are talking about one mother-fucking HUGE site.

As to your report: a server load of 30 is CRAZY!! Assuming your server has 1 CPU, the recommended server load is less than 1 (2 CPUs, less than 2, etc.). 30 is not good at all. It is only OK if the responses you get from the server are good enough for you (but they probably are not). I suggest you check what is wrong with your server. Posting its specs here, as detailed as possible, could be helpful. Or you can contact me on ICQ if you want.
#18

Beer Money Baron
Join Date: Jan 2001
Location: brujah / gmail
Posts: 22,157
My ICQ is 3431124, what's yours?
#19

So Fucking Banned
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
9129246
#20

Confirmed User
Join Date: Oct 2001
Location: Somewhere in time
Posts: 143
Well, it definitely smells like something else. Since keep-alive is off, the spider is almost out of the question, because each file it requests will end up on a different Apache child. So there is no way it can hang up a process like that and eat all the resources (Apache is better coded than that; if a bot could do this, it would mean everybody could crash Apache servers easily). Not to mention that Apache will never take 400MB, even when it serves big files...

Like hel0 said, look for a PHP script with an infinite loop (it really looks like that). Or it could also be a rewrite rule that loops infinitely, or a 404 loop (i.e. a 404 generates a 404, which generates a 404, and so on). The PHP script looks like the strongest possibility. Maybe the bot triggers a script/URL that you usually don't use; that could explain why it happens only when the spider comes by.
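To illustrate the rewrite-rule case, here's a hypothetical looping rule and a guarded version (the /pages/ prefix is made up):

  RewriteEngine On

  # BAD (hypothetical): the rule matches its own output, so every internal
  # redirect gets rewritten again until Apache gives up on the request
  #RewriteRule ^(.*)$ /pages/$1 [L]

  # Guarded: skip requests that have already been rewritten
  RewriteCond %{REQUEST_URI} !^/pages/
  RewriteRule ^(.*)$ /pages/$1 [L]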
#21

Confirmed User
Join Date: Oct 2001
Location: Somewhere in time
Posts: 143
BTW, from the ps/top output you posted, I assume you're running some flavour of Linux? I don't know much about it (FreeBSD is what I know), but I'm pretty sure you can set some OS limits. On FreeBSD you can tell the OS not to allocate more than X amount of memory to a process; I'm sure somebody can help you set this up for Linux. Something like the sketch below, perhaps.
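On Linux, one way to get that effect is to set a per-process limit in whatever script starts Apache, since the children inherit it (the 256 MB value and the apachectl path are just examples):

  # Cap each process's address space before starting Apache, so a runaway
  # child gets allocation failures instead of eating all the swap
  ulimit -v 262144          # ~256 MB of virtual memory, in KB
  /usr/sbin/apachectl start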