GoFuckYourself.com - Adult Webmaster Forum

HQ	11-20-2002 10:04 PM

defunct processes

I have a lot of < defunct > processes on my server. Can someone please tell me everything they know about defunct processes?

I'll start: http://www.linux-tutorial.info/cgi-b...8&9998&115&0&3

Quote:

defunct process:

A process that has made the exit() system call. This process does not use any system resources (including memory) except that it takes up a slot in the process table.

This makes it sound like it is normal for processes to be defunct. BTW, all my defunct processes do call exit() (written in Perl). Is that enough, or is there some coding I have missed? Some websites keep talking about parent processes waiting for the child process to end. I have no parent process (not one that I have coded anyway, as Apache is actually executing my scripts for every webpage it serves, right?) so that can't be the problem.

Also, is it possible that I have so many defunct processes because I have so many scripts running on each hit in and click out on my websites, and everything is actually ok?

HQ	11-20-2002 10:09 PM

I have also been told:

Quote:

Defunct processes are hung processes. They can use memory and CPU resources. Top will show you if yours are. Defunct processes are processes that have called the exit() function, but for some reason have not exited cleanly. Large numbers of them may indicate something is wrong on your server.

What is the truth?

Defunct processes need to be explained to me so I can solve this mystery. I have no idea if everything is ok, or if everything is fucked...

HQ	11-20-2002 10:14 PM

My thoughts:

Defunct processes do not use up RAM or CPU but they do take up a spot in the process table. There are only so many slots in this table, and if they are all used up by 'zombie' processes, new processes can no longer be started. This could be a real problem.

Am I close to right?

I still have no idea if all the defunct processes on my box are 'normal' or caused by a coding bug or some other mistake... :(

HQ	11-20-2002 10:21 PM

Ok, I have also learned that some defunct processes just hang around until Apache is restarted. I just wanted to point out that this is not the case with my defunct processes. Their "TIME" in top is always 0:00. (I am assuming the "TIME" goes up higher than this for a defunct process that sticks around forever.)

fiveyes	11-20-2002 10:30 PM

Generally, defunct processes, or zombies, are caused by a parent process not reaping its children. This happens when a child dies without telling its parent, who continues to wait for that notice. And, yes, they are a bad thing- they can slow your system to a complete standstill.
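To make the "parent not reaping its children" idea concrete, here is a minimal sketch in Python (not the Perl from this thread), assuming a Linux box with /proc mounted. The helper names `process_state` and `make_and_reap_zombie` are made up for illustration. The child exits at once, shows up in state "Z" until the parent calls waitpid(), and is gone from the process table afterwards.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state letter is the first field after the ")" that closes the command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_and_reap_zombie():
    """Fork a child that exits immediately, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit right away. It stays in the process table as a zombie
        # until the parent collects its exit status.
        os._exit(0)

    # Parent: poll until the kernel reports the child as "Z" (zombie).
    state_before = None
    for _ in range(100):
        state_before = process_state(pid)
        if state_before == "Z":
            break
        time.sleep(0.05)

    # Reaping: waitpid() collects the exit status and frees the table slot.
    os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before, still_listed


if __name__ == "__main__":
    state, listed = make_and_reap_zombie()
    print("state before reaping:", state)
    print("still in process table after reaping:", listed)
```

The only thing that clears the zombie here is the parent's waitpid() call; the child has nothing left to do once it has exited.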

Which process is the parent process of all those zombies (type "ps -e", without the quotes, on a telnet command line)?

fiveyes	11-20-2002 10:38 PM

Make that "ps -el", you want the long listing to get the PID of the parent. That'll be in the column labelled "PPID", reference column "PID" for the match- that'll be the parent. OK?
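The PID/PPID matching described above can also be scripted. This is a rough sketch that assumes a procps-style `ps` and uses `ps -eo pid,ppid,stat,comm` instead of `ps -el` because its columns are easier to split; the function names and the sample process names in the usage note are hypothetical.

```python
import subprocess
from collections import Counter


def parse_ps(ps_output):
    """Parse the output of `ps -eo pid,ppid,stat,comm` into a list of dicts."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, stat, comm = parts
        rows.append({"pid": int(pid), "ppid": int(ppid), "stat": stat, "comm": comm})
    return rows


def zombies_by_parent(rows):
    """Map each parent PID to (parent command name, number of zombie children)."""
    names = {r["pid"]: r["comm"] for r in rows}
    counts = Counter(r["ppid"] for r in rows if r["stat"].startswith("Z"))
    return {ppid: (names.get(ppid, "?"), n) for ppid, n in counts.items()}


if __name__ == "__main__":
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    for ppid, (name, n) in sorted(zombies_by_parent(parse_ps(out)).items()):
        print(f"parent {ppid} ({name}) has {n} zombie child(ren)")
```

If every zombie turns out to have an httpd process as its parent, the reaping is the web server's job rather than the scripts'.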

SpaceAce	11-20-2002 11:05 PM

Quote:

Originally posted by fiveyes
Generally, defunct processes, or zombies, are caused by a parent process not reaping its children. This happens when a child dies without telling its parent, who continues to wait for that notice. And, yes, they are a bad thing- they can slow your system to a complete standstill.

Which process is the parent process of all those zombies (type "ps -e", without the quotes, on a telnet command line)?

This is correct, but you might want a little more info. You mentioned PERL scripts. Chances are your PERL zombies are caused by improper handling of the "fork" command. If you wrote them, check your fork code.

SpaceAce

HQ	11-20-2002 11:21 PM

Quote:

Originally posted by fiveyes
Make that "ps -el", you want the long listing to get the PID of the parent. That'll be in the column labelled "PPID", reference column "PID" for the match- that'll be the parent. OK?

Thanks for the info. Ok, I did it. Sorry for the wait, I was on the phone for an hour trying to figure some other shit out.

The PPID's of the defuncts (when I check them against the PID column) end up being "httpd" processes, as expected, as they are all caused by webserver hits (and some via cronjobs). So this is where I get lost with the parent/child thing... Please explain how this works when there is no parent.

HQ	11-20-2002 11:22 PM

Quote:

Originally posted by SpaceAce
Chances are your PERL zombies are caused by improper handling of the "fork" command. If you wrote them, check your fork code.

There is no fork code. There are no parent or child processes. All my PERL scripts are "stand-alones" or whatever you call them. They are spawned via webserver hits (httpd) via SSIs inside the webpages themselves. This is why I am lost (the fact that they have no parents.)

SpaceAce	11-20-2002 11:47 PM

Your HTTP daemon is leaving wayward processes behind? That's pretty odd. Are you running any background processes like Eggdrop, a proxy server, etc?

What version of which server are you using and are you loading any non-standard modules? Did you change anything in the config file or was it all set up by your host?

Maybe some logging processes or security features? There's about a billion possibilities. Can you pinpoint when one of them shows up?

SpaceAce

fiveyes	11-21-2002 12:37 AM

Well, the problem with zombies is that they are resource hogs that give nothing back in return. Interprocess communication is done with shared memory and there's only so much of it set aside for this. A parent allocates this space for the child before forking it and is responsible for deallocating it after the child dies. Whenever the parent exits without telling the child, the child cannot die because this resource remains tied up- hence the "zombie" name. You can't even kill a zombie directly, you have to kill the parent (PPID) of the zombie. That may seem strange, to kill a process that has already exited, but what you actually end up doing is signalling its children that the parent is dead, so they can go ahead and exit themselves.

If it happens often enough, all the slots in the area become filled and it's impossible to fork a new process. The good news is, it takes a hell of a lot of zombies to do this. The bad news is, sooner or later that's exactly what you'll have if you don't either cure the problem or start flushing them out regularly.

Anyway, here are the most likely causes of this on a web server that I can think of. Solution varies as to cause.
First of all, check the root crontab, /etc/inittab, the /etc/rc.d/* files and /etc/inetd.conf to make sure all is in order. Better yet, have someone else check them who knows what they're doing but isn't that close to the specific docs. :)
When you ran "ps -el", the status line (S) for defunct processes was a "Z" (for Zombie). Correct? Do any processes show as a "T" (for sTopped)?
Do you have an oracle database (DBD 0.47 especially!) in use with PHP?
Do any of your perl or cron scripts contain interactive commands? This would be something along the lines of executing the "htpasswd" program, which expects a repeated password.
If there are no interactive commands, are you checking returns CORRECTLY on ALL system calls?
Are you using fast-CGI?
Are you using CGI-wrap?

ED	11-21-2002 07:56 AM

BUMP
:BangBang:

HQ	11-21-2002 09:40 AM

Thanks for the extra input. I do not have time right now to check into this, but within the hour I'll get back to you guys with more details. Thanks.

DarkJedi	11-21-2002 09:44 AM

huh

HQ	11-21-2002 11:20 AM

Quote:

Originally posted by SpaceAce
Are you running any background processes like Eggdrop, a proxy server, etc?

What version of which server are you using and are you loading any non-standard modules? Did you change anything in the config file or was it all set up by your host?

Maybe some logging processes or security features? There's about a billion possibilities. Can you pinpoint when one of them shows up?

I am not running any background processes that are not standard with the server except for my own CGI scripts that I coded myself in PERL. This is why there is a possibility that I am not exiting the PERL scripts properly (even though I am 90% sure that I coded the scripts properly.) These scripts have no parent processes as they are 'spawned' via SSIs inside my webpages. Therefore it is the httpd processes themselves that spawn (and are therefore parents of) my scripts.

My versions are Red Hat Linux 7.2 and Apache version 1.3.22. I did not change any config files myself except for adding in new virtual servers and enabling SSI and CGI.

My only guess is my own coding errors in my scripts as they are the only processes going defunct, but in the same breath they are the only scripts running (my own).

HQ	11-21-2002 11:24 AM

Quote:

Originally posted by fiveyes
Well, the problem with zombies is that they are resource hogs that give nothing back in return. Interprocess communication is done with shared memory and there's only so much of it set aside for this. A parent allocates this space for the child before forking it and is responsible for deallocating it after the child dies. Whenever the parent exits without telling the child, the child cannot die because this resource remains tied up- hence the "zombie" name. You can't even kill a zombie directly, you have to kill the parent (PPID) of the zombie. That may seem strange, to kill a process that has already exited, but what you actually end up doing is signalling its children that the parent is dead, so they can go ahead and exit themselves.

So if the parent is httpd, then does httpd have to 'die' before my scripts 'die'? Could it be as simple as that? If so, is there any way around this?

HQ	11-21-2002 11:44 AM

Quote:

Originally posted by fiveyes
When you ran "ps -el", the status line (S) for defunct processes was a "Z" (for Zombie). Correct? Do any processes show as a "T" (for sTopped)?

Yes the defuncts are "Z" and I have no "T"s at all.

HQ	11-21-2002 11:48 AM

Quote:

Originally posted by fiveyes
Do you have an oracle database (DBD 0.47 especially!) in use with PHP?
Do any of your perl or cron scripts contain interactive commands? This would be something along the lines of executing the "htpasswd" program, which expects a repeated password.
If there are no interactive commands, are you checking returns CORRECTLY on ALL system calls?
Are you using fast-CGI?
Are you using CGI-wrap?

No Oracle DB. The PERL scripts (run via httpd and via cronjobs) do not have interactive commands; in other words, they do not wait for input, if that is what you mean. No FastCGI and no CGIwrap.

Juge	11-22-2002 07:01 AM

* bump *

HQ	11-25-2002 08:08 AM

.: bump :.

HQ	11-25-2002 08:50 PM

No one has any other input on this? If you can help in any way possible, it would be greatly appreciated.

Jeffery	11-25-2002 09:27 PM

Ok, do this:

ps ax | grep defunct

concentrate on the pids - left side. Which of these two scenarios is the case:

1) The pids are staying the same.
2) The pid list is different from each listing

If the answer is 2... Do you get a ton of hits to your server? If that is the case, this is normal. I host a ton of sites that run ucj, rb4, etc and get 200k+ hits a day.

In this case, you are going to have a ton of scripts run every second and when you do a listing, there are going to be defunct scripts. If I do the above ps command on one of my 200k sites, there will be 20-30 defunct processes.
It is normal, and apache is in the process of closing them.
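The two scenarios above boil down to comparing the zombie PIDs from two listings taken some time apart. A small sketch of that comparison follows; the function name and the PIDs in the demo are made up, and since PIDs can eventually be reused, a match is only a strong hint that the same zombie is still around.

```python
def compare_snapshots(before, after):
    """Compare two sets of zombie PIDs taken at different times."""
    before, after = set(before), set(after)
    return {
        "persistent": before & after,  # scenario 1: same PIDs still defunct
        "cleared": before - after,     # reaped between the two listings
        "new": after - before,         # went defunct since the first listing
    }


if __name__ == "__main__":
    first = {1500, 1501, 1502}
    second = {1502, 1610, 1611}
    result = compare_snapshots(first, second)
    for label, pids in result.items():
        print(label, sorted(pids))
```

If "persistent" stays empty across listings and the total count is not growing, that matches scenario 2.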

Jeffery	11-25-2002 09:37 PM

heh... out of curiosity, I did this on a server that handles a site with 1.2 million raws and the rb4 trade script and there were 115 defunct processes at that moment.

Did a list immediately afterwards and there were 105 different defunct processes (as far as I could tell).

I think your problem is normal. You either just noticed or just got a bump in traffic or just started running scripts :thumbsup

Hope this helps...

fiveyes	11-25-2002 10:20 PM

XXXstorage Jeff is correct, if the PID's of the defunct processes are constantly changing and the size of the population is not increasing over time, there's nothing to sweat. There's a minor flaw in some bit of coding (possibly not even yours, but at a lower level- such as a missing/corrupted library file) but it's a low resource drain unless they begin to accumulate.

About how "killing a dead parent to free zombies", what you actually are doing is giving the lost child processes an opportunity to finally exit. As long as the zombie believes its parent is still active, it'll wait for it to deallocate its resources. It will intercept the kill command for the parent (since the parent is no longer there to field it), realize that it is orphaned and proceed to finally exit itself.

So, a (temporary) solution for a production server that has a growing population of zombies is a crontab run often enough to keep the zombies from reaching a critical mass, where they would begin to have a noticeable negative impact on the server. If that's needed, I'll look up the code for it for you. But you'd still need someone to go "under the hood" to apply a permanent solution.

HQ	11-26-2002 12:56 PM

Quote:

Originally posted by XXXstorage Jeff
Ok, do this:

ps ax | grep defunct

I have done this and I see a few processes that are staying the same number. So I think I should probably kill those...

[edit]

10 (or maybe 15 or 20) minutes later all (and I mean 100.000% of them, all) these defuncts are different. I guess I do not have to kill them.

Can someone please explain to me why they last so long? These scripts only run for fractions of a second.

HQ	11-27-2002 12:07 PM

bump again (topic not finished)

HQ	11-29-2002 11:11 AM

I have some more info:

You can raise the MaxClients (how many concurrent connections Apache can handle at one time) past the compiled-in value of 256, but it requires customization (which costs me money as my host will not do this for free.)

Apparently, hitting the MaxClients limit of 256 isn't uncommon on "very busy sites". Not sure how many hits/day that is but it is way more than 100k/day I think.

I *think* the reason so many of my scripts are defunct is this: when a client connects to my websites, it opens a connection to the parent process (the one "httpd" process running as root), which forks off a "child" process to handle the requests, and that child in turn forks off further "child" processes, which are my scripts (via SSI). If there are no more requests, it will wait for a period of time before dying. *** I did not know this. *** It waits before dying. This is why so many of my scripts are defunct.

Please tell me, does this make sense?

A solution for this is to turn off "KeepAlive". Has anyone done this? What type of strain on your CPU does this cause?

salsbury	11-29-2002 11:35 AM

EVERY process can and will be in a zombie state, eventually. the parent process can call wait or wait4() or similar to "reap" the zombies, and usually does nearly instantly.
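As a rough illustration of that reaping step, a parent that cannot afford to block can poll with waitpid() and WNOHANG. This is a sketch in Python, not what Apache actually does internally; `reap_finished_children` is a made-up helper name.

```python
import os
import time


def reap_finished_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    # Start three short-lived children, then reap them as they finish.
    pending = set()
    for _ in range(3):
        child = os.fork()
        if child == 0:
            os._exit(0)
        pending.add(child)
    while pending:
        pending -= set(reap_finished_children())
        time.sleep(0.01)
    print("all children reaped")
```

Between a child's exit and the next call to this function the child is a zombie, which is why a busy server shows a few of them at any given moment.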

they do not consume resources beyond a simple entry in the process table (i checked and double checked the 4.4BSD red book on this). no resources, unless you have like 100000 zombies on your system taking up all of the PIDs (that i'd like to see. :) )

fwiw HQ, it's pretty much a "known bug" that apache SSI leaves zombies around longer than it probably ought to. zombies are not a problem, however, and are not indicative of a real problem.

if you switch from SSI to PHP loading the CGIs or just straight CGIs you should see them disappear from 'top'.

regarding keepalive - ALWAYS set that to off.

because apache uses a pre-fork model (ie, every client connects to a separate server), there are a limited number of connections the server can handle (MaxClients). with keepalive on, you're basically reserving one of the servers for that one client for X period of time. with keepalive off, that server could be serving other clients during that X period. modern connections and servers do not gain a whole lot from keepalives - the TCP overhead just isn't as big a deal any more.
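A back-of-the-envelope way to see the effect is Little's law: the average number of busy servers is roughly the connection rate times how long each connection holds a server. The sketch below is a deliberately crude model, and every number in it is made up for illustration; it assumes each connection ties up one server for its service time plus the idle keepalive wait.

```python
def busy_servers(connections_per_sec, service_time_s, keepalive_idle_s=0.0):
    """Average busy server processes under Little's law (L = rate * hold time)."""
    return connections_per_sec * (service_time_s + keepalive_idle_s)


if __name__ == "__main__":
    rate = 20.0      # hypothetical new connections per second
    service = 0.1    # hypothetical seconds spent actually serving
    idle = 15.0      # hypothetical seconds a server sits in keepalive

    print("keepalive off:", busy_servers(rate, service))
    print("keepalive on: ", busy_servers(rate, service, idle))
```

Under these assumed figures the keepalive case needs far more servers than the no-keepalive case for the same traffic, which is the argument being made above; real traffic will differ because a kept-alive connection can also carry several requests.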

but give it a shot.

here's one thing to check while you're tweaking: server-status. add this to your httpd.conf:

ExtendedStatus on
<Location /HQ-server-status>
SetHandler server-status
</Location>

and restart. then go to http://server.name/HQ-server-status . you'll see a matrix of connections and servers. if you see a bunch of K's and few W's or R's or L's or S's, you may want to turn off keepalives. (this is especially true if the K count reaches the MaxClients count... at that point you are slowing down the experience for your visitors!)
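Counting the letters by eye gets tedious on a busy server, so here is a small sketch that tallies a scoreboard string copied from the status page. It assumes the usual mod_status legend, where "_" is an idle server and "." is an empty slot; the function name and the sample string are made up.

```python
from collections import Counter


def summarize_scoreboard(scoreboard):
    """Tally server states from a mod_status scoreboard string."""
    counts = Counter(ch for ch in scoreboard if not ch.isspace())
    servers = sum(n for ch, n in counts.items() if ch != ".")  # running processes
    keepalive = counts.get("K", 0)
    return {
        "servers": servers,
        "keepalive": keepalive,
        "idle": counts.get("_", 0),
        "working": sum(counts.get(ch, 0) for ch in "WRLS"),
        "keepalive_share": keepalive / servers if servers else 0.0,
    }


if __name__ == "__main__":
    sample = "KKKKWR__.."  # hypothetical scoreboard
    for key, value in summarize_scoreboard(sample).items():
        print(key, value)
```

A keepalive share close to 1.0 while the server count sits at MaxClients is the situation described above.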

free tip for apache performance tuning, btw.

lower MaxRequestsPerChild to 1000, and lower MinSpareServers to 1 and MaxSpareServers to 1. then raise MaxClients to whatever your server can support. this way, if there are memory leaks, your servers won't last long enough for it to matter much (MaxReqsPerChld), and your server will use a pretty standard amount of memory at all times (Min/MaxSpares and MaxClients). stability is good!

HQ	11-29-2002 01:43 PM

salsbury, thanks for the informative post.

Now I am glad to know with 95% certainty that the defuncts are not caused by my scripts, but instead are caused by my SSI executed CGI scripts that have "httpd" as their parent process which stays "alive" too long due to "KeepAlive" being enabled. Whew. I think that explains it.

I am glad that the entries in the process table of these defunct processes do not consume resources, but I did surpass my MaxClients (which I recently bumped up to the maximum value of 256.)

I also ensured that all my defunct processes do indeed disappear in 5 or 10 minutes or so (probably quicker). So I have no stagnant processes always taking up space.

I would like to shut off KeepAlive, as every hit to my websites runs *two* CGIs. Each of these goes defunct after executing quickly, then sits around waiting for "httpd" to finish up, which takes a while (how long by default?)

Your post loses me at this point... let me read up more and respond to it later.

All times are GMT -7.