Finding Causes of Heavy Usage
From DreamHost
The main causes you will find are abuse from specific IPs or inefficient scripts. If the cause is abuse, you should be able to identify it by looking at the access logs for each of your domains, located in ~/logs/yourdomain.com/http.
Enter this command to see the IPs hitting the domain the most:
awk '{print $1}' access.log | sort | uniq -c | sort -n
This command is even more useful in some cases, as it looks only at the last 10,000 hits:
tail -n 10000 access.log | awk '{print $1}' | sort | uniq -c | sort -n
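The two commands above can be wrapped in a small helper so the number of recent hits and the number of addresses shown are adjustable. This is only a sketch: the function name top_ips and its defaults are made up for illustration, and it assumes the client IP is the first whitespace-separated field of each log line, as in the commands above.

```shell
# top_ips LOGFILE [LINES] [COUNT]   (hypothetical helper, not a DreamHost tool)
# Print the COUNT busiest client IPs among the last LINES requests in LOGFILE,
# busiest first. Assumes the IP is the first field of every log line.
top_ips() {
    logfile=$1
    lines=${2:-10000}   # how many recent hits to look at
    count=${3:-10}      # how many addresses to show
    tail -n "$lines" "$logfile" |
        awk '{print $1}' |
        sort |
        uniq -c |
        sort -rn |
        head -n "$count"
}
```

For example, top_ips ~/logs/yourdomain.com/http/access.log 5000 5 would show the five busiest addresses among the last 5,000 hits.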
Finally, if you have a ton of domains you may want to use this to aggregate them:
for k in $(ls ~/logs); do echo "Top visitors by IP for: $k"; awk '{print $1}' ~/logs/$k/http/access.log | sort | uniq -c | sort -n | tail; done
If you find IPs that are connecting a lot, first check who they are, replacing 1.1.1.1 with the IP address:
host 1.1.1.1
and then block it by editing or creating a file named .htaccess (placed in the domain folder like so: /home/user/domain.com/.htaccess):
<Limit GET HEAD POST>
order deny,allow
deny from 1.1.1.1
</Limit>
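Before adding a deny rule like the one above, it can help to see what a busy address was actually requesting. The following is a rough sketch, not an official tool: the function name top_paths_for_ip is invented here, and it assumes the log is in Apache's common or combined format, where the requested path is the seventh whitespace-separated field.

```shell
# top_paths_for_ip LOGFILE IP [COUNT]   (hypothetical helper)
# Print the COUNT paths most often requested by IP, most frequent first.
# Assumes Apache common/combined log format: field 1 is the client IP
# and field 7 is the requested path.
top_paths_for_ip() {
    logfile=$1
    ip=$2
    count=${3:-10}
    awk -v ip="$ip" '$1 == ip {print $7}' "$logfile" |
        sort |
        uniq -c |
        sort -rn |
        head -n "$count"
}
```

Running top_paths_for_ip access.log 1.1.1.1 from the log directory would show whether the address is hammering a single script or crawling the whole site.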
Blocking Robots
You might also find that it is Google or another bot:
yourserver: 04:41 PM# host 66.249.66.167
Name: crawl-66-249-66-167.googlebot.com
Address: 66.249.66.167
To take care of that, create a file named robots.txt in your domain folder with the following contents, which ask all compliant robots to stay out of the entire site:
# go away
User-agent: *
Disallow: /
Yahoo's crawling bots (identified by a user agent string containing either "Yahoo's Slurp" or "inktomisearch.com") comply with the Crawl-delay rule in robots.txt, which limits their fetching activity. For example, to tell Yahoo not to fetch pages more often than once every 10 seconds, you would add:
# slow down Yahoo
User-agent: Slurp
Crawl-delay: 10
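To decide which robots are worth a rule in robots.txt, it may help to count hits by user agent rather than by IP. This is a minimal sketch under one assumption: the access log uses Apache's combined format, in which the user agent is the sixth field when each line is split on double quotes. The function name top_user_agents is made up for this example.

```shell
# top_user_agents LOGFILE [COUNT]   (hypothetical helper)
# Print the COUNT most frequent user agent strings, most frequent first.
# Assumes Apache combined log format, where splitting a line on double
# quotes puts the request in field 2, the referer in field 4 and the
# user agent in field 6.
top_user_agents() {
    logfile=$1
    count=${2:-10}
    awk -F'"' '{print $6}' "$logfile" |
        sort |
        uniq -c |
        sort -rn |
        head -n "$count"
}
```

If the log is in the plain common format (no referer or user agent), every line yields an empty sixth field, so the output is a single count of empty strings.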
If you do not see any IP that is clearly the cause, and you have a lot of content that people might be hotlinking, try blocking hotlinking as well.
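One way to check whether hotlinking is a factor is to list the outside pages that refer the most requests to your site. The sketch below is an illustration only: the name top_external_referers is invented, the log is assumed to be in Apache's combined format (referer in the fourth double-quote-delimited field), and "external" is approximated crudely as any referer that does not contain your domain name.

```shell
# top_external_referers LOGFILE DOMAIN [COUNT]   (hypothetical helper)
# Print the COUNT most frequent referers that do not mention DOMAIN.
# Assumes Apache combined log format; empty and "-" referers are skipped.
# The substring match is a rough heuristic, not a strict hostname check.
top_external_referers() {
    logfile=$1
    domain=$2
    count=${3:-10}
    awk -F'"' -v d="$domain" '
        $4 != "" && $4 != "-" && index($4, d) == 0 {print $4}
    ' "$logfile" |
        sort |
        uniq -c |
        sort -rn |
        head -n "$count"
}
```

A referer from another site that shows up thousands of times, especially for image or media files, is a likely hotlinking source.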
Checking Processes
If all of that fails to help, it's quite likely that a script is causing the issue. In that case, check which processes are running most often under your user. Enter this from the command line to get details on the processes running as the logged-in user (you'll likely have to run it a few times):
pgrep -u $USER | xargs -I Q cat /proc/Q/environ | tr '\000' '\n'
Once you get a few lines of output, press Ctrl+C to stop it so you can look at what is coming up (note: it might not catch a process right away, but if you run it while your account is busy you should get a lot of information). The command shows the environment of each process your user is running.
PATH=/usr/local/bin:/usr/bin:/bin
DOCUMENT_ROOT=/home/username/userdomain.org
HTTP_ACCEPT=*/*
HTTP_CONNECTION=Keep-Alive
HTTP_HOST=userdomain.org
HTTP_REFERER=http://userdomain.org/weblog
HTTP_USER_AGENT=Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
REDIRECT_STATUS=200
REDIRECT_URL=/weblog/dotclear/rss.php
REMOTE_ADDR=76.170.16.89
REMOTE_PORT=47739
SCRIPT_FILENAME=/dh/cgi-system/php.cgi
SCRIPT_URI=http://userdomain.org/weblog/dotclear/rss.php
SCRIPT_URL=/weblog/dotclear/rss.php
SERVER_ADDR=208.97.191.135
SERVER_ADMIN=webmaster@userdomain.org
SERVER_NAME=userdomain.org
SERVER_PORT=80
SERVER_SOFTWARE=Apache/1.3.37 (Unix) mod_throttle/3.1.2 DAV/1.0.3 mod_fastcgi/2.4.2 mod_gzip/1.3.26.1a PHP/4.4.4 mod_ssl/2.8.22 OpenSSL/0.9.7e
GATEWAY_INTERFACE=CGI/1.1
SERVER_PROTOCOL=HTTP/1.1
REQUEST_METHOD=GET
QUERY_STRING=
REQUEST_URI=/weblog/dotclear/rss.php
SCRIPT_NAME=/cgi-system/php.cgi
PATH_INFO=/weblog/dotclear/rss.php
PATH_TRANSLATED=/home/username/userdomain.org/weblog/dotclear/rss.php
cat /proc/12741/environ
Your results won't look exactly like this, but with some patience you should be able to get useful information out of them (in the example above, the script running was rss.php in the user's /home/username/userdomain.org/weblog/dotclear folder). Even then, you might still need some trial and error to know exactly which processes are eating up the load, but this should help you narrow it down. These are the exact steps DreamHost takes when investigating a user who is crashing a machine or the Apache service, so that there is no need to disable the site or user; you can see why it's not that easy. Fortunately, you have the inside track, as you'll probably know from your statistics where people are going the most.
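Since a single run often misses short-lived processes, the sampling can be repeated and the results tallied. The following is a sketch rather than DreamHost's own procedure: the function names, the number of rounds and the delay are all made up for illustration, and it assumes a Linux system where /proc/PID/environ is readable for your own processes and where web scripts expose a REQUEST_URI variable as in the sample output above.

```shell
# count_request_uris   (hypothetical helper)
# Read environment dumps (one VAR=value per line) on standard input and
# print how often each REQUEST_URI value appears, most frequent first.
count_request_uris() {
    grep '^REQUEST_URI=' | cut -d= -f2- | sort | uniq -c | sort -rn
}

# sample_request_uris [ROUNDS] [DELAY]   (hypothetical helper)
# Dump the environment of every process owned by the current user ROUNDS
# times, DELAY seconds apart, then tally the REQUEST_URI values seen.
sample_request_uris() {
    rounds=${1:-10}
    delay=${2:-1}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for pid in $(pgrep -u "$(id -un)"); do
            # A process may exit between pgrep and cat, so ignore read errors.
            cat "/proc/$pid/environ" 2>/dev/null | tr '\000' '\n'
        done
        i=$((i + 1))
        sleep "$delay"
    done | count_request_uris
}
```

Running sample_request_uris 30 1 while the account is busy takes 30 samples one second apart; a script that appears in most of them is a good candidate for closer inspection.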
If that doesn't get you to the source of the usage, the wiki has other useful articles on the subject.
Finally, if all else fails, or if you have questions or need guidance, please contact support. While support may not be able to find the specific cause for you, they will do what they can; the goal is better service for everyone, including yourself!

