defunct processes
I have a lot of < defunct > processes on my server. Can someone please tell me everything they know about defunct processes?
I'll start: http://www.linux-tutorial.info/cgi-b...8&9998&115&0&3 Quote:
Also, is it possible that I have so many defunct processes because I have so many scripts running on each hit in and click out on my websites, and everything is actually ok? |
I have also been told:
Quote:
Defunct processes need to be explained to me so I can solve this mystery. I have no idea if everything is ok, or if everything is fucked... |
My thoughts:
Defunct processes do not use up RAM or CPU but they do take up a spot in the process table. There are only so many slots in this table and if they are all used up by 'zombie' processes, then new processes cannot be started. This could be a real problem. Am I close to right? I still have no idea if all the defunct processes on my box are 'normal' or caused by a coding bug or some other mistake... :( |
Ok, I have also learned that some defunct processes just hang around until Apache is restarted. I just wanted to point out that this is not the case with my defunct processes. Their "TIME" in top is always 0:00. (I am assuming the "TIME" goes up higher than this for a defunct process that sticks around forever.)
|
Generally, defunct processes, or zombies, are caused by a parent process not reaping its children. A child that exits stays in the process table as a zombie until its parent calls wait() to collect its exit status. And, yes, they can be a bad thing: if enough of them pile up, the process table fills and nothing new can be forked.
Which process is the parent process of all those zombies (type "ps -e", without the quotes, on a telnet command line)? |
Make that "ps -el", you want the long listing to get the PID of the parent. That'll be in the column labelled "PPID", reference column "PID" for the match- that'll be the parent. OK?
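For anyone who would rather not match the PID and PPID columns by eye, here is a minimal Python sketch of the same lookup. It assumes a procps-style `ps` that accepts `-eo pid,ppid,stat,comm` (that column selection is my choice, not something from the posts above), and the function names are made up for illustration.

```python
import subprocess
from collections import defaultdict


def zombies_by_parent(ps_output):
    """Group zombie PIDs by parent PID.

    Expects text in the shape produced by `ps -eo pid,ppid,stat,comm`:
    one process per line, with PID, PPID and state as the first three columns.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to parse.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, ppid, stat = int(fields[0]), int(fields[1]), fields[2]
        if stat.startswith("Z"):  # "Z" marks a zombie (defunct) process
            groups[ppid].append(pid)
    return dict(groups)


def live_zombies_by_parent():
    """Run ps on this machine and group its zombies by parent (assumes ps is installed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return zombies_by_parent(result.stdout)
```

Calling `live_zombies_by_parent()` on the server returns a dict keyed by parent PID, so a single parent that owns most of the zombies stands out right away.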
|
Quote:
SpaceAce |
Quote:
The PPIDs of the defuncts (when I check them against the PID column) end up being "httpd" processes, as expected, since they are all caused by webserver hits (and some via cronjobs). So this is where I get lost with the parent/child thing... Please explain how this works when there is no parent. |
Your HTTP daemon is leaving wayward processes behind? That's pretty odd. Are you running any background processes like Eggdrop, a proxy server, etc?
What version of which server are you using and are you loading any non-standard modules? Did you change anything in the config file or was it all set up by your host? Maybe some logging processes or security features? There are about a billion possibilities. Can you pinpoint when one of them shows up? SpaceAce |
Well, the problem with zombies is that they hold on to something and give nothing back in return. A zombie has already exited and released its memory, but its entry in the process table stays behind so that the parent can collect the exit status with wait(). If the parent never does that, the entry remains tied up, hence the "zombie" name. You can't even kill a zombie directly, because it is already dead; you have to kill the parent (PPID) of the zombie. That may seem strange, but once the parent is gone its children are handed over to init (PID 1), which reaps them so they can finally disappear.
If it happens often enough, all the slots in the process table become filled and it's impossible to fork a new process. The good news is, it takes a hell of a lot of zombies to do this. The bad news is, sooner or later that's exactly what you'll have if you don't either cure the problem or start flushing them out regularly.
Anyway, here are the most likely causes of this on a web server that I can think of. The solution varies with the cause. First of all, check the root crontab, /etc/inittab, the /etc/rc.d/* files and /etc/inetd.conf to make sure all is in order. Better yet, have someone check them who knows what they're doing but isn't that close to the specific docs. :)
When you ran "ps -el", the status column (S) for defunct processes was a "Z" (for Zombie). Correct? Do any processes show as a "T" (for sTopped)?
Do you have an Oracle database (DBD 0.47 especially!) in use with PHP?
Do any of your Perl or cron scripts contain interactive commands? This would be something along the lines of executing the "htpasswd" program, which expects a repeated password.
If there are no interactive commands, are you checking returns CORRECTLY on ALL system calls?
Are you using fast-CGI? Are you using CGI-wrap? |
BUMP
:BangBang: |
Thanks for the extra input. I do not have time right now to check into this, but within the hour I'll get back to you guys with more details. Thanks.
|
huh
|
Quote:
My versions are Red Hat Linux 7.2 and Apache version 1.3.22. I did not change any config files myself except for adding in new virtual servers and enabling SSI and CGI. My only guess is my own coding errors in my scripts as they are the only processes going defunct, but in the same breath they are the only scripts running (my own). |
* bump *
|
.: bump :.
|
No one has any other input on this? If you can help in any way possible, it would be greatly appreciated.
|
Ok, do this:
ps ax | grep defunct
Concentrate on the PIDs (left side). Which of these two scenarios is the case:
1) The PIDs are staying the same.
2) The PID list is different from each listing.
If the answer is 2... Do you get a ton of hits to your server? If that is the case, this is normal. I host a ton of sites that run ucj, rb4, etc and get 200k+ hits a day. In this case, you are going to have a ton of scripts run every second and when you do a listing, there are going to be defunct scripts. If I do the above ps command on one of my 200k sites, there will be 20-30 defunct processes. It is normal, and apache is in the process of closing them. |
heh... out of curiosity, I did this on a server that handles a site with 1.2 million raws and the rb4 trade script and there were 115 defunct processes at that moment.
Did a list immediately afterwards and there were 105 different defunct processes (as far as I could tell). I think your problem is normal. You either just noticed or just got a bump in traffic or just started running scripts :thumbsup Hope this helps... |
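The "same PIDs or different PIDs" check above can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of feeding it two hand-collected PID lists are my own assumptions, and a PID that shows up in both listings could in principle be a recycled number on a very busy box.

```python
def compare_snapshots(earlier, later):
    """Compare two listings of defunct PIDs taken some time apart.

    Returns which PIDs are still defunct in both listings (the worrying
    case), which have gone away, and which are new.
    """
    earlier, later = set(earlier), set(later)
    return {
        "persistent": sorted(earlier & later),  # stuck zombies: scenario 1
        "reaped": sorted(earlier - later),      # cleaned up between listings
        "new": sorted(later - earlier),         # fresh ones from new hits
    }
```

For example, `compare_snapshots([101, 102, 103], [103, 250, 251])` reports 103 as persistent and the rest as turnover. An empty "persistent" list across several listings is scenario 2.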
XXXstorage Jeff is correct: if the PIDs of the defunct processes are constantly changing and the size of the population is not increasing over time, there's nothing to sweat. There's a minor flaw in some bit of coding (possibly not even yours, but at a lower level, such as a missing/corrupted library file) but it's a low resource drain unless they begin to accumulate.
About "killing a dead parent to free zombies": the process listed as the PPID is not actually dead, it simply has not reaped its children. When you kill it, the zombies are orphaned and handed over to init (PID 1), which calls wait() on them so their process-table entries are finally released. So, a (temporary) solution for a production server that has a growing population of zombies is a crontab run often enough to keep the zombies from reaching a critical mass, where they would begin to have a noticeable negative impact on the server. If that's needed, I'll look up the code for it for you. But you'd still need someone to go "under the hood" to apply a permanent solution. |
Quote:
[edit] 10 (or maybe 15 or 20) minutes later all (and I mean 100.000% of them, all) these defuncts are different. I guess I do not have to kill them. Can someone please explain to me why they are lasting so long? These scripts only run for fractions of a second. |
bump again (topic not finished)
|
I have some more info:
You can raise MaxClients (how many concurrent connections Apache can handle at one time) past the compiled-in value of 256, but it requires customization (which costs me money as my host will not do this for free). Apparently hitting MaxClients at 256 isn't uncommon on "very busy sites". Not sure how many hits/day that is but it is way more than 100k/day I think.
I *think* the reason so many of my scripts are defunct is because when a client connects to my websites, it opens a connection to the parent process (the one "httpd" process running as root), which forks off a "child" process to further handle the requests... and that child in turn forks off another "child" process for each of my scripts (via SSI). If there are no more requests it will wait for a period of time before dying. *** I did not know this. *** It waits before dying. This is why so many of my scripts are defunct. Please tell me, does this make sense?
A solution for this is to turn off "KeepAlive". Has anyone done this? What type of strain on your CPU does this cause? |
EVERY process can and will be in a zombie state, eventually. the parent process can call wait or wait4() or similar to "reap" the zombies, and usually does nearly instantly.
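To make the zombie-then-reap sequence concrete, here is a small Python sketch. It is Linux-only because it reads the state letter from `/proc/<pid>/stat`; the helper names are hypothetical and this is a demonstration, not how Apache itself does it.

```python
import os
import time


def child_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running any cleanup

    # Parent: deliberately do not wait yet, so the dead child lingers as "Z".
    deadline = time.time() + 5
    state = child_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = child_state(pid)
    before = state

    os.waitpid(pid, 0)         # reap: collect the exit status
    after = child_state(pid)   # the process-table entry should now be gone
    return before, after
```

On a typical Linux box `demo()` should return `("Z", None)`: the child sits in state Z only until the parent gets around to calling `waitpid`.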
they do not consume resources beyond a simple entry in the process table (i checked and double checked the 4.4BSD red book on this). no resources, unless you have like 100000 zombies on your system taking up all of the PIDs (that i'd like to see. :) )
fwiw HQ, it's pretty much a "known bug" that apache SSI leaves zombies around longer than it probably ought to. zombies are not a problem, however, and are not indicative of a real problem. if you switch from SSI to PHP loading the CGIs or just straight CGIs you should see them disappear from 'top'.
regarding keepalive - ALWAYS set that to off. because apache uses a pre-fork model (ie, every client connects to a separate server), there are a limited number of connections the server can handle (MaxClients). with keepalive on, you're basically reserving one of the servers for that one client for X period of time. with keepalive off, that server could be serving other clients during that X period. modern connections and servers do not gain a whole lot from keepalives - the TCP overhead just isn't as big a deal any more. but give it a shot.
here's one thing to check while you're tweaking: server-status. add this to your httpd.conf:
ExtendedStatus on
<Location /HQ-server-status>
SetHandler server-status
</Location>
and restart. then go to http://server.name/HQ-server-status . you'll see a matrix of connections and servers. if you see a bunch of K's and few W's or R's or L's or S's, you may want to turn off keepalives. (this is especially true if the K count reaches the MaxClients count... at that point you are slowing down the experience for your visitors!)
free tip for apache performance tuning, btw. lower MaxRequestsPerChild to 1000, and lower MinSpareServers to 1 and MaxSpareServers to 1. then raise MaxClients to whatever your server can support. 
this way, if there are memory leaks, your servers won't last long enough for it to matter much (MaxReqsPerChld), and your server will use a pretty standard amount of memory at all times (Min/MaxSpares and MaxClients). stability is good! |
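Counting the K's against the busy slots by eye gets tedious, so here is a rough Python sketch that tallies a scoreboard string. It assumes you have copied the scoreboard characters from the server-status page yourself; the function name and the returned keys are made up for illustration, and what counts as "a bunch of K's" is still a judgment call.

```python
from collections import Counter


def summarize_scoreboard(scoreboard, max_clients):
    """Tally a mod_status scoreboard string.

    K = keepalive, W = sending reply, R = reading request,
    L = logging, S = starting up. Other characters are idle or open slots.
    """
    counts = Counter(scoreboard)
    keepalive = counts.get("K", 0)
    working = sum(counts.get(c, 0) for c in "WRLS")
    return {
        "keepalive": keepalive,
        "working": working,
        # True when keepalive connections alone fill every allowed server.
        "keepalive_at_max": keepalive >= max_clients,
    }
```

For instance, `summarize_scoreboard("KKKKKKW_R...", 256)` gives 6 keepalive slots against 2 working ones, which is the lopsided pattern described above, though still far from the MaxClients limit.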
salsbury, thanks for the informative post.
Now I am glad to know with 95% certainty that the defuncts are not caused by my scripts, but instead are caused by my SSI executed CGI scripts that have "httpd" as their parent process which stays "alive" too long due to "KeepAlive" being enabled. Whew. I think that explains it. I am glad that the entries in the process table of these defunct processes do not consume resources, but I did surpass my MaxClients (which I recently bumped up to the maximum value of 256). I also ensured that all my defunct processes do indeed disappear in 5 or 10 minutes or so (probably quicker). So I have no stagnant processes always taking up space. I would like to shut off KeepAlive, as every hit to my websites runs *two* CGIs. Each of these goes defunct after executing quickly and then sits around waiting for "httpd" to finish up, which takes a while (how long by default?) Your post loses me at this point... let me read up more and respond to it later.