Finding Causes of Heavy Usage

From DreamHost


The main causes you will find are abuse from specific IPs or inefficient scripts. In the case of abuse, you should be able to determine the cause by looking at the access log for each of your domains, located in ~/logs/yourdomain.com/http (run the commands below from inside that directory).

Enter this command to see the IPs hitting the domain the most:

awk '{print $1}' access.log | sort | uniq -c | sort -n

This variant is often even more useful, since it looks only at the last 10,000 hits:

tail -n 10000 access.log | awk '{print $1}' | sort | uniq -c | sort -n
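The two commands above can be wrapped in a small shell function so the busiest IPs come out first and only the top few are shown. This is a minimal sketch, not a DreamHost tool: the name `top_ips` and its arguments are made up for illustration, and, like the commands above, it assumes the client IP is the first field of each log line.

```shell
# Hypothetical helper (not a DreamHost tool): rank client IPs in an access log.
# Usage: top_ips LOGFILE [LAST_N_LINES] [HOW_MANY]
#   LAST_N_LINES  only read the newest N lines; 0 (the default) reads the whole file
#   HOW_MANY      how many of the busiest IPs to print (default 10)
top_ips() {
    local log="$1" last="${2:-0}" how_many="${3:-10}"
    # Pick the input (whole log or its last N lines), then take field 1
    # (the client IP), count hits per IP, and list the busiest first.
    if [ "$last" -gt 0 ]; then
        tail -n "$last" "$log"
    else
        cat "$log"
    fi |
        awk '{print $1}' | sort | uniq -c | sort -rn | head -n "$how_many"
}
```

For example, `top_ips access.log 10000 5` shows the five busiest IPs among the last 10,000 hits.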

Finally, if you have many domains, you can use this loop to check all of them at once (it prints the top visitors for each domain in turn):

for d in ~/logs/*/; do k=$(basename "$d"); echo "Top visitors by IP for: $k"; awk '{print $1}' "${d}http/access.log" | sort | uniq -c | sort -n | tail; done
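The loop above reports each domain separately. If you'd rather see one combined ranking (handy when the same IP is spread across several of your sites), a sketch along these lines should work. It assumes the `~/logs/<domain>/http/access.log` layout described above, and `top_ips_all_domains` is a hypothetical name.

```shell
# Hypothetical helper: one IP ranking across all domains combined.
# Assumes the layout LOGROOT/<domain>/http/access.log (default LOGROOT: ~/logs).
# Usage: top_ips_all_domains [LOGROOT] [HOW_MANY]
top_ips_all_domains() {
    local root="${1:-$HOME/logs}" how_many="${2:-10}"
    # The glob expands to every domain's access.log; awk reads them in turn.
    awk '{print $1}' "$root"/*/http/access.log |
        sort | uniq -c | sort -rn | head -n "$how_many"
}
```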

This command shows which URLs are requested most often, with query strings stripped. A script that is called far more often than anything else on the site is often the one being abused:

awk '{print $7}' access.log | cut -d'?' -f1 | sort | uniq -c | sort -n | tail -n 10
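Once a particular IP stands out, it can help to see what it is actually requesting. The hypothetical `top_paths_for_ip` below is a sketch that keeps only that IP's lines and then ranks the requested paths (field 7, as in the command above) with query strings stripped.

```shell
# Hypothetical helper: which URLs a single IP requests, busiest first.
# Usage: top_paths_for_ip LOGFILE IP [HOW_MANY]
top_paths_for_ip() {
    local log="$1" ip="$2" how_many="${3:-10}"
    # Exact match on field 1 (client IP), then print field 7 (request path).
    # cut drops query strings so /a.php?x=1 and /a.php?x=2 are counted together.
    awk -v ip="$ip" '$1 == ip {print $7}' "$log" |
        cut -d'?' -f1 |
        sort | uniq -c | sort -rn | head -n "$how_many"
}
```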

If you find IPs that are connecting a lot, first check to see who they are, replacing 1.1.1.1 with the IP address:

host 1.1.1.1

and then block it by editing or creating a file named .htaccess in the domain folder (for example /home/username/domain.com/.htaccess):

<Limit GET HEAD POST>
 order deny,allow
 deny from 1.1.1.1
</Limit>
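If several IPs stand out, checking and blocking them one at a time gets tedious. The sketch below uses two hypothetical functions: one runs `host` on each of the busiest IPs, the other prints an .htaccess block in the same order/deny style as above. It assumes the `host` utility is installed (`-W 2` just caps how long each lookup waits). Review the generated block before adding it to your .htaccess, since appending it twice would leave duplicate sections.

```shell
# Hypothetical helper: busiest IPs plus their reverse-DNS name, if any.
# Usage: lookup_top_ips LOGFILE [HOW_MANY]
lookup_top_ips() {
    local log="$1" how_many="${2:-5}"
    awk '{print $1}' "$log" | sort | uniq -c | sort -rn | head -n "$how_many" |
        while read -r count ip; do
            # host prints "... domain name pointer NAME." when a PTR record exists;
            # a failed lookup (e.g. NXDOMAIN) simply leaves name empty.
            name=$(host -W 2 "$ip" 2>/dev/null | awk '/domain name pointer/ {print $NF; exit}') || true
            printf '%s %s %s\n' "$count" "$ip" "${name:-(no PTR record)}"
        done
}

# Hypothetical helper: print a <Limit> block denying every IP given.
# Usage: htaccess_deny_block 1.1.1.1 2.2.2.2
htaccess_deny_block() {
    local ip
    printf '%s\n' '<Limit GET HEAD POST>' ' order deny,allow'
    for ip in "$@"; do
        printf ' deny from %s\n' "$ip"
    done
    printf '%s\n' '</Limit>'
}
```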


Blocking Robots

You might also find that it is Google or another bot:

   yourserver: 04:41 PM# host 66.249.66.167
   Name: crawl-66-249-66-167.googlebot.com
   Address: 66.249.66.167

To take care of that, create a file named robots.txt in your domain folder with the following contents. Note that this asks every compliant crawler, Google included, to stay out of the entire site:

   # go away
   User-agent: *
   Disallow: /
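To see which crawlers are hitting you hardest without looking up IPs one at a time, you can rank the user-agent strings instead, since that is where bots such as Googlebot name themselves (keep in mind a user agent can be faked). This sketch assumes your logs use Apache's "combined" format, where the user agent is the last double-quoted field (field 6 when the line is split on `"`). `top_user_agents` is a hypothetical name.

```shell
# Hypothetical helper: rank user-agent strings, busiest first.
# ASSUMES combined log format: splitting on " makes field 6 the user agent.
# Usage: top_user_agents LOGFILE [HOW_MANY]
top_user_agents() {
    local log="$1" how_many="${2:-10}"
    awk -F'"' '{print $6}' "$log" | sort | uniq -c | sort -rn | head -n "$how_many"
}
```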

Yahoo's crawler (identified by a user-agent string containing "Yahoo! Slurp", or by a hostname ending in inktomisearch.com) honors the Crawl-delay directive in robots.txt, which limits how often it fetches pages. For example, to tell Yahoo not to fetch a page more than once every 10 seconds, you would add:

   # slow down Yahoo
   User-agent: Slurp
   Crawl-delay: 10
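To check whether a crawler is actually slowing down, you can count its requests per minute: with `Crawl-delay: 10`, a compliant crawler should make at most about 60 / 10 = 6 requests in any one minute. The hypothetical `requests_per_minute` below is a sketch that assumes the combined log format (user agent in field 6 when splitting on `"`, timestamp like `[10/Oct/2000:13:55:36 -0700]`), matches PATTERN as a plain substring of the user agent, and trims each timestamp to the minute.

```shell
# Hypothetical helper: busiest minutes for requests whose user agent
# contains PATTERN (a plain substring, e.g. "Slurp").
# ASSUMES combined log format; "[10/Oct/2000:13:55:36" becomes "10/Oct/2000:13:55".
# Usage: requests_per_minute LOGFILE PATTERN [HOW_MANY]
requests_per_minute() {
    local log="$1" pattern="$2" how_many="${3:-10}"
    awk -F'"' -v pat="$pattern" 'index($6, pat) {
            split($1, f, " ")           # f[4] is the "[day/mon/year:hh:mm:ss" part
            print substr(f[4], 2, 17)   # drop "[" and the seconds
        }' "$log" |
        sort | uniq -c | sort -rn | head -n "$how_many"
}
```

If the top counts stay well above 6 for Slurp after adding the rule, the delay is not being honored.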

If you do not see any IP that is clearly the cause, and you have a lot of content that people might be hotlinking, be sure to try blocking that:

Preventing hotlinking
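Before setting up hotlink protection, it can help to see who is embedding your files. This sketch counts the referers of image requests that don't come from your own domain. It assumes the combined log format (request line in field 2, referer in field 4 when the line is split on `"`), treats only .jpg/.jpeg/.gif/.png as images, and uses a simple substring test for your domain; `hotlink_referers` is a hypothetical name.

```shell
# Hypothetical helper: outside referers for image requests, busiest first.
# ASSUMES combined log format: splitting on " gives $2 = request line, $4 = referer.
# Usage: hotlink_referers LOGFILE yourdomain.com [HOW_MANY]
hotlink_referers() {
    local log="$1" domain="$2" how_many="${3:-10}"
    awk -F'"' -v dom="$domain" '
        # Image request, with a real referer, that does not mention your domain.
        tolower($2) ~ /\.(jpe?g|gif|png)[ ?]/ && $4 != "-" && $4 != "" && index($4, dom) == 0 {
            print $4
        }' "$log" |
        sort | uniq -c | sort -rn | head -n "$how_many"
}
```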

Checking Processes

If all that fails to help, it's quite likely that a script is causing the issue. In that case, check which processes are running under your user the most. Enter this from the command line to get details on the processes running as the logged-in user (you'll likely have to run it a few times):

   for pid in $(pgrep -u "$USER"); do echo "== PID $pid =="; cat /proc/$pid/environ 2>/dev/null | tr '\000' '\n'; done

Once you get a few lines of output, hit Ctrl+C to stop it so you can actually look at what is coming up (it might not catch a process right away, but if you run it when your account is busy you should get a lot of information). What this does is show you the environment of each process your user is running, for example:

PATH=/usr/local/bin:/usr/bin:/bin
DOCUMENT_ROOT=/home/username/userdomain.org
HTTP_ACCEPT=*/*
HTTP_CONNECTION=Keep-Alive
HTTP_HOST=userdomain.org
HTTP_REFERER=http://userdomain.org/weblog
HTTP_USER_AGENT=Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
REDIRECT_STATUS=200
REDIRECT_URL=/weblog/dotclear/rss.php
REMOTE_ADDR=76.170.16.89
REMOTE_PORT=47739
SCRIPT_FILENAME=/dh/cgi-system/php.cgi
SCRIPT_URI=http://userdomain.org/weblog/dotclear/rss.php
SCRIPT_URL=/weblog/dotclear/rss.php
SERVER_ADDR=208.97.191.135
SERVER_ADMIN=webmaster@userdomain.org
SERVER_NAME=userdomain.org
SERVER_PORT=80
SERVER_SOFTWARE=Apache/1.3.37 (Unix) mod_throttle/3.1.2 DAV/1.0.3 mod_fastcgi/2.4.2 mod_gzip/1.3.26.1a PHP/4.4.4 mod_ssl/2.8.22 OpenSSL/0.9.7e
GATEWAY_INTERFACE=CGI/1.1
SERVER_PROTOCOL=HTTP/1.1
REQUEST_METHOD=GET
QUERY_STRING=
REQUEST_URI=/weblog/dotclear/rss.php
SCRIPT_NAME=/cgi-system/php.cgi
PATH_INFO=/weblog/dotclear/rss.php
PATH_TRANSLATED=/home/username/userdomain.org/weblog/dotclear/rss.php

Your results won't look exactly like this, but with some patience you should be able to get useful information out of them (in the example above, the script running was rss.php in the user's /home/username/userdomain.org/weblog/dotclear folder). Even then, you might need some trial and error to pin down exactly which processes are driving the load, but this should help you narrow it down. These are the exact steps DreamHost takes when investigating a user who is crashing a machine or the Apache service, so that the site or user does not have to be disabled; as you can see, it's not that easy. Fortunately, you have the inside track, since you'll probably know from your statistics where people are going the most.
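Running the command by hand over and over is the trial-and-error part. A sketch like the one below automates it: it samples your processes several times and counts which script each one was serving, so the busiest scripts float to the top. It is Linux-only (it reads /proc), and both function names are hypothetical. It prefers PATH_TRANSLATED because, as in the sample output above, SCRIPT_FILENAME may only name the PHP wrapper (/dh/cgi-system/php.cgi).

```shell
# Hypothetical helper: print the script a CGI/PHP process is serving, given
# the path to its environ file. Prefers PATH_TRANSLATED, falls back to
# SCRIPT_FILENAME, and prints nothing if neither is set.
script_from_environ() {
    [ -r "$1" ] || return 0     # process may have exited, or is not ours
    tr '\000' '\n' < "$1" |
        awk '
            /^PATH_TRANSLATED=/ { sub(/^[^=]*=/, ""); pt = $0 }
            /^SCRIPT_FILENAME=/ { sub(/^[^=]*=/, ""); sf = $0 }
            END { if (pt != "") print pt; else if (sf != "") print sf }'
}

# Hypothetical helper: sample the current user's processes SAMPLES times,
# INTERVAL seconds apart, and count how often each script was seen.
# Usage: tally_running_scripts [SAMPLES] [INTERVAL]
tally_running_scripts() {
    local samples="${1:-20}" interval="${2:-1}" user="${USER:-$(id -un)}" i pid
    for i in $(seq "$samples"); do
        for pid in $(pgrep -u "$user"); do
            script_from_environ "/proc/$pid/environ"
        done
        sleep "$interval"
    done | sort | uniq -c | sort -rn
}
```

For example, `tally_running_scripts 30 1` takes 30 samples one second apart; run it while your account is busy.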

If that doesn't get you to the source of the usage, here are links to some other useful articles in the wiki:

Finally, if all else fails, or if you have questions or need guidance, please contact support. While support may not be able to find the specific cause for you, they will do what they can; the goal is better service for everyone, including yourself!
