Process Accounting

From DreamHost
Jump to: navigation, search

Process Accounting (to find and remove the cause of slowness)

Unix Process Accounting is a tool that literally takes account of every single process on your server, recording how much CPU (processor) and memory (RAM) each process consumes. When summarized, this information can be very useful for tracking down overzealous processes and hogs of resources so that you may restore your server to it's rightful speed and performance!

DreamHost keeps a daily accounting of your processes, which can be found in /var/log/sa/. For example, /var/log/sa/sa.itemized.0 is an itemized report of all the processes run from yesterday, sa.itemized.1 is the itemized report for the day before yesterday, sa.itemized.2 is the itemized report for two days before yesterday. Along those same lines is sa.summary.0, which is a summary of each user's consumption of resources.

Note: most web server users can only access files within their /home/username directory. VPS or dedicated servers can have admin users who can use sudo to access logs in /var/logs.

You may have the daily reports of your processes in your home directory, in the logs/resources/ directory, by enabling CPU reporting for the user. See Where can I find resource reports? for more information.

What does all of this mean? The best way to use this information is to examine the CPU-minutes. A CPU-minute can be considered a unit of work for a server. No, CPU-minutes are not "real time", that is, there are not 60minutesX24hours=1440 CPU-minutes in a day. If a machine is idle, doing nothing at all, that doesn't count towards CPU time. Only when the processor is actually doing something does it count as CPU-minutes.

So knowing how many CPU-minutes per day your users and processes are consuming can help you to find the user or process which is causing problems for the entire machine. The best way to understand all of this is to breeze through the examples below, and then try it out on your own machine.

One final confusing note: Yes, there are CPU-seconds as well: 1 CPU-minute = 60 CPU-seconds

In the examples below: "cp" = CPU-minute "cpu" = CPU-second

 

# Give me a summary of the 10 "biggest" users 
# (Note that the entire machine utilized 599 CPU-minutes for the entire day yesterday) yourserver:
yourserver:$ head /var/log/sa/sa.summary.0 
133573 100.00% 46089720.15re 100.00% 599.89cp 100.00% 0avio 78478595k*sec bob 84064 62.93% 618688.32re 1.34% 349.68cp 58.29% 0avio 35499807k*sec mysql 2158 1.62% 29884017.65re 64.84% 147.47cp 24.58% 0avio 413986887k*sec dhapache 9201 6.89% 14111172.13re 30.62% 87.92cp 14.66% 0avio 17167069k*sec root 35136 26.30% 860049.95re 1.87% 14.14cp 2.36% 0avio 395749k*sec postfix 1267 0.95% 612272.05re 1.33% 0.44cp 0.07% 0avio 21756k*sec mailuser1 271 0.20% 542.82re 0.00% 0.03cp 0.01% 0avio 728k*sec mailuser2 271 0.20% 612.53re 0.00% 0.03cp 0.01% 0avio 673k*sec louanne 271 0.20% 539.83re 0.00% 0.03cp 0.01% 0avio 659k*sec sarahlee 271 0.20% 610.13re 0.00% 0.03cp 0.01% 0avio 655k*sec 

# Give me an analysis of all the processes run by all the users. 
yourserver:$ /dh/misc/bin/sa.analysis.pl /var/log/sa/sa.itemized.0 | head Process CPU seconds user machine count average php.cgi 20980.6200 58.291% 87.419% 84063 0.250 mysqld 8786.3600 24.411% 36.610% 2217 3.963 libhttpd.ep 5290.4900 14.699% 22.044% 8650 0.612 ps 286.3300 0.796% 1.193% 726 0.394 sshd 266.5500 0.741% 1.111% 1459 0.183 analog 83.8400 0.233% 0.349% 204 0.411 mysqld 62.3700 0.173% 0.260% 12 5.197 analyze_log.pl 37.9200 0.105% 0.158% 5 7.584 top 26.7400 0.074% 0.111% 2 13.370 gzip 18.5000 0.051% 0.077% 85 0.218 
# That was the information that you really want to look at. 
# Notice that for this machine, php.cgi was run a whopping 
# 84,000 times in one day (about once a second), consuming 
# about 350 CPU-minutes that day (which is a lot!). But it only 
# consumed on average 0.25 CPU-minutes. It looks pretty efficient, 
# yet even a 15% increase in efficiency would drop the amount of 
# resources consumed by php.cgi to under 300. 
# Give me a summary of all the processes run by the user `bob' yesterday: 

yourserver:$ grep ^bob[[:blank:]] /var/log/sa/sa.itemized.0 > /tmp/asdf
yourserver:$ /dh/misc/bin/sa.analysis.pl /tmp/asdf |head -10 
Process CPU seconds user machine count average php.cgi 
20980.6200 99.999% 87.419% 84063 0.250 dothis.pl 0.1300 0.001% 0.001% 1 0.130 


# Give me some of the raw data for the fun of it yourserver:
yourserver:$ tail -10 /var/log/sa/sa.itemized.0 
root 0.01 cpu 311k mem 0 io chmod 
dhapache 0.04 cpu 3370k mem 0 io libhttpd.ep * 
root 0.00 cpu 314k mem 0 io date 
root 0.02 cpu 514k mem 0 io savelog 
bob 0.23 cpu 1723k mem 0 io php.cgi 
bob 0.11 cpu 1723k mem 0 io php.cgi 
dhapache 0.01 cpu 3370k mem 0 io libhttpd.ep * 
bob 0.03 cpu 1719k mem 0 io php.cgi 
root 0.01 cpu 310k mem 0 io uname 
dhapache 0.02 cpu 3370k mem 0 io libhttpd.ep * 
root 0.00 cpu 310k mem 0 io cut 

If you have admin access (create an admin user in the VPS > Manage Admin Users section of the Dreamhost panel) you can also check auxiliary processes. These can hang and chew up resources.


yourserver:$ sudo ps aux

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   6124   676 ?        Ss   Nov16   0:21 init [2]  
dhapache  1893  0.0  2.7 306280 11328 ?        S    09:11   0:00 /dh/apache2/apa
dhapache  2666  0.0  2.7 306132 11236 ?        S    09:12   0:00 /dh/apache2/apa
dhapache  2667  0.0  0.0      0     0 ?        Z    09:12   0:00 [apa] <defunct>
root      2712  0.0  0.1  18548   424 ?        Ss   Nov18   0:02 /usr/sbin/cron
root      2714  0.0  0.4  20184  2044 pts/3    S+   09:12   0:00 links http://as
root      2786  0.0  0.2  14728   988 pts/1    R+   09:12   0:00 ps aux
root      3068  1.6  6.3 305972 25916 ?        Ss   05:34   3:35 /dh/apache2/apa
root      3540  0.0  0.1  36840   736 ?        Ss   Nov18   0:06 /usr/lib/postfi
postfix   3606  0.0  0.2  38944   940 ?        S    Nov18   0:06 qmgr -l -t fifo
nobody    3675  0.0  0.1 106580   704 ?        Ss   Nov18   0:37 proftpd: (accep
root      9612  0.0  0.4  66076  1816 ?        Ss   08:48   0:00 sshd: root@pts/
root      9617  0.0  0.3 101556  1516 pts/3    Ss   08:48   0:00 -bash
postfix  11847  0.0  0.3  38896  1356 ?        S    08:51   0:00 pickup -l -t fi
root     12502  0.0  0.4  65928  1672 ?        Ss   08:52   0:00 sshd: pammy 
pammy	 12561  0.0  0.3  65928  1276 ?        S    08:52   0:00 sshd: pammy@
pammy	 12563  0.0  0.3  17996  1580 pts/1    Ss   08:52   0:00 -bash
dhapache 24934  0.0  2.4 293256  9892 ?        S    08:32   0:00 /dh/apache2/apa
dhapache 24943  0.0  2.4 305972 10144 ?        S    08:32   0:00 /dh/apache2/apa
bunny	 24984  0.0  4.9 134248 20460 ?        S    05:21   0:01 php5.cgi
bunny	 25025  0.0  4.9 134244 20460 ?        S    05:21   0:02 php5.cgi
bunny	 25092  0.0  6.7 141380 27696 ?        S    05:21   0:01 php5.cgi
bunny 	 25093  0.0  3.1 126432 12752 ?        S    05:21   0:00 php5.cgi
bunny	 25107  0.0  6.6 141024 27284 ?        S    05:21   0:00 php5.cgi
bunny 	 25170  0.0  6.7 141336 27656 ?        S    05:21   0:03 php5.cgi
bunny 	 25171  0.0  6.7 141380 27696 ?        S    05:21   0:02 php5.cgi
bunny 	 25212  0.0  6.6 141068 27416 ?        S    05:21   0:02 php5.cgi
bunny 	 25226  0.0  6.6 141064 27404 ?        S    05:21   0:00 php5.cgi
bunny 	 25229  0.0  6.9 141880 28276 ?        S    05:21   0:03 php5.cgi
root     28627  0.0  0.1  48864   716 ?        Ss   05:26   0:02 /usr/sbin/sshd
dhapache 30376  0.0  2.7 306280 11300 ?        S    09:10   0:00 /dh/apache2/apa
bunny 	 31309  2.0  9.0 147332 37028 ?        S    08:40   0:40 php5.cgi
bunny 	 31316  2.4  9.3 148712 38104 ?        S    08:40   0:48 php5.cgi
daemon   31529  0.0  0.0  16356   244 ?        Ss   Nov18   0:00 /usr/sbin/atd
daemon   31670  0.0  0.0  16356   208 ?        Ss   Nov18   0:00 /usr/sbin/atd

# Some of the CGI scripts in this example are using significant memory (%MEM column). 
yourserver:$ sudo pkill php5.cgi processes
# will kill those processes.

See Also