Advanced Troubleshooting Techniques
Overview
In a shared hosting environment, many factors can affect your site's performance, but most problems fall into a handful of major types:
- Database-related issues causing your site to hang while loading.
- Misconfigurations keeping your site from loading.
- Software compromises (hacked sites).
- Memory limits causing your processes to be killed.
Any of the above can contribute to another major cause of poor performance: server load. These issues can also interact, so that one of them triggers others. For instance, if a database-related issue is causing your PHP processes to hang, those processes will start building up. That pushes your user toward its memory limit, which only compounds the problem. Hacked sites often start up large numbers of processes to take part in a DDoS attack on some third-party site; these can use up all of your user's available memory (and in some cases even saturate the host machine's network connection). This article will help you diagnose problems like these.
Feeling Out the Problem
Okay, your site isn't loading. There are many ways in which it can fail to load:
- Immediate 500 Internal Server Error.
- Site spins trying to load for a while and then 500 Internal Server Error.
- Site spins trying to load forever.
- Site loads immediately, but only a blank page is displayed.
- Site loads immediately, but displays a database connection error.
- Site loads immediately, but displays a 403 Forbidden error.
- Site loads, but shows a 404 error rather than what you expected.
Those are some of the most common cases. Let's take a look at each in turn.
Immediate 500 Internal Server Error
There are two likely causes. If the error appears instantly, the most likely cause is something in your site's .htaccess file. Some things to look for:
- Syntax errors in the .htaccess file
- Custom PHP setup that isn't working
To see whether this is the cause, try renaming the .htaccess file in your site's directory to something like ".htaccess.disabled". If the problem was in that file, this will fix it immediately. Keep in mind that .htaccess rules apply to all subdirectories, so a .htaccess file higher up than your site's web directory can also affect your site. Look further up the directory structure for any .htaccess files that might be involved, and try renaming those as well to see if it helps.
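Checking parent directories by hand is easy to get wrong, so here is a minimal shell sketch that lists every .htaccess file from a site's directory up to your home directory. The function name list_htaccess_upward is hypothetical. It assumes your sites live somewhere under $HOME, and it only prints paths; it does not rename anything.

```shell
# Hypothetical helper (not a DreamHost tool): print every .htaccess file
# found in DIR and each of its parent directories, stopping at STOP
# (default: $HOME) or at the filesystem root.
list_htaccess_upward() {
  local dir stop
  dir=$(CDPATH= cd -- "$1" && pwd -P) || return 1
  stop=$(CDPATH= cd -- "${2:-$HOME}" && pwd -P) || return 1
  while true; do
    if [ -f "$dir/.htaccess" ]; then
      printf '%s\n' "$dir/.htaccess"
    fi
    # Stop at the stop directory, or at / if STOP is not an ancestor.
    if [ "$dir" = "$stop" ] || [ "$dir" = "/" ]; then
      break
    fi
    dir=$(dirname "$dir")
  done
}
```

For example, `list_htaccess_upward ~/somesite.com/blog` lists the candidates. Rename them one at a time (e.g. `mv /path/to/.htaccess /path/to/.htaccess.disabled`) and reload the site after each change.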
If that solves the problem, great! If not, another possibility is that your processes are being killed for exceeding your user's memory limit. One quick way to check is to look at which processes are running as your user. To do that, log into your server using SSH like this:
~$ ssh youruser@server.dreamhost.com
Password:
youruser@server:~$
Once you're in, run the top -c command like this:
youruser@server:~$ top -c
top - 14:37:35 up 10 days, 17:35, 3 users, load average: 0.83, 0.89, 1.11
Tasks: 16 total, 1 running, 15 sleeping, 0 stopped, 0 zombie
Cpu(s): 34.7%us, 4.8%sy, 1.7%ni, 56.5%id, 0.9%wa, 0.2%hi, 1.1%si, 0.0%st
Mem: 32966092k total, 32546460k used, 419632k free, 6369232k buffers
Swap: 8000328k total, 228972k used, 7771356k free, 12650516k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8384 youruser  20   0 66984  11m 6852 S    1  0.0   0:03.62 php5.cgi
 8385 youruser  20   0 66044  10m 6700 S    0  0.0   0:00.24 php5.cgi
10895 youruser  20   0 65940  10m 6848 S    0  0.0   0:00.92 php5.cgi
10917 youruser  20   0 65980  10m 6848 S    0  0.0   0:00.79 php5.cgi
 7542 youruser  20   0 65956  10m 6860 S    0  0.0   0:00.51 php5.cgi
 7818 youruser  20   0 65980  10m 6860 S    0  0.0   0:00.35 php5.cgi
 7828 youruser  20   0 65988  10m 6860 S    0  0.0   0:00.33 php5.cgi
 7917 youruser  20   0 66016  10m 6860 S    0  0.0   0:00.43 php5.cgi
 8152 youruser  20   0 65976  10m 6856 S    0  0.0   0:04.21 php5.cgi
 8380 youruser  20   0 65932  10m 6848 S    0  0.0   0:04.03 php5.cgi
 8386 youruser  20   0 66020  10m 6860 S    0  0.0   0:00.32 php5.cgi
10896 youruser  20   0 65908  10m 6848 S    0  0.0   0:00.66 php5.cgi
10919 youruser  20   0 65948  10m 6848 S    0  0.0   0:00.24 php5.cgi
If it looks something like the above, then you're very likely running into this problem! Generally, if you're running more than 10 PHP processes at once and they hold pretty steady like that, then there are good odds this is a problem. Details on what to do will follow later.
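Eyeballing top works, but you can also count PHP processes directly. The sketch below reads command names from ps on standard input and compares the count with a threshold. The default of 10 mirrors the rule of thumb above. The function name check_php_procs is hypothetical, and the usage line assumes your server's ps accepts the -u and -o options (procps on Linux does).

```shell
# Hypothetical helper: read one command name per line (e.g. from
# `ps -u youruser -o comm=`) and report how many contain "php".
check_php_procs() {
  local threshold count
  threshold=${1:-10}   # rule-of-thumb limit from this article
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
  count=$(grep -c 'php' || true)
  if [ "$count" -gt "$threshold" ]; then
    echo "WARNING: $count PHP processes (threshold $threshold)"
  else
    echo "OK: $count PHP processes"
  fi
}
```

Usage: `ps -u youruser -o comm= | check_php_procs`. Run it a few times over a minute or so. A count that stays above the threshold is the pattern described above.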
Site spins trying to load for a while and then 500 Internal Server Error
This can have a few different causes. Sometimes it's the memory limit problem described in the Immediate 500 Internal Server Error section above, but the most common cause is PHP timing out. If this happens only on some pages (in particular, the admin pages of the software you're using), a timeout is very likely the cause. By default, PHP's timeout (the max_execution_time setting) is 30 seconds. You can test this by creating a custom php.ini for your site and raising max_execution_time to 2-3 times its current value. Even if this happens on all of your pages, it can still be a PHP timeout. Check your site's error log, logs/yourdomain.com/http/error.log inside your user's home directory, for helpful error messages. A "Premature end of script headers" error on its own is generally not very helpful, as it only means the script exited before it finished.
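To see quickly whether timeouts are showing up, you could tally the relevant messages in the log. This sketch rests on two assumptions. First, it assumes PHP writes its errors to this same error.log; depending on your php.ini settings, it may log elsewhere or not at all. Second, it assumes a timeout appears as PHP's standard "Maximum execution time ... exceeded" fatal error, next to Apache's "Premature end of script headers" message. The function name summarize_timeouts is hypothetical.

```shell
# Hypothetical helper: count timeout-related lines in an Apache error log
# read from standard input.
summarize_timeouts() {
  awk '
    /Maximum execution time/          { timeouts++ }  # PHP fatal error text
    /Premature end of script headers/ { premature++ } # Apache CGI message
    END { printf "timeouts=%d premature_headers=%d\n", timeouts, premature }
  '
}
```

Usage: `summarize_timeouts < ~/logs/yourdomain.com/http/error.log`. If timeouts is nonzero, raising max_execution_time in your custom php.ini (for example from 30 to 90) is a reasonable next test.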
Site spins trying to load forever
This is perhaps the most generic symptom. Most often it means something is causing your PHP processes to hang. If you run top -c on the server, you might also notice <defunct> next to some of those processes. Many things can cause this. With WordPress, it is often related to overhead in the database tables your site uses. Overhead shouldn't cause problems, but for some reason WordPress can get into a bad state when there is any (particularly in the wp_options table), and this symptom often results. Other major causes are software misconfigurations and third-party addons that have compatibility issues or are poorly coded.
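Rather than scanning top for <defunct> by eye, you could count zombie processes from ps output. Zombie processes have a STAT value beginning with Z, which is what top marks as <defunct>. This is a minimal sketch; the usage line assumes ps supports `-o stat=,comm=` (procps on Linux does). The function name count_defunct is hypothetical.

```shell
# Hypothetical helper: count zombie (<defunct>) processes in lines of
# "STAT COMMAND" read from standard input.
count_defunct() {
  # A STAT field beginning with Z marks a zombie process.
  awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

Usage: `ps -u youruser -o stat=,comm= | count_defunct`. A number that keeps growing while your site hangs fits the hung-process pattern described above.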
Site loads immediately, but only a blank page is displayed
This issue is most often caused by a misbehaving theme or caching addon. Depending on exactly what's going on, it can be a little tricky to solve and will likely take some experimenting to get things working again.
Site loads immediately, but displays a database connection error
This happens either because the database server is unavailable or because the database connection information is incorrect. Occasionally, servers have trouble contacting MySQL servers because of networking issues, but this is fairly rare. More often, the database login information was changed without updating the site's configuration, or the MySQL hostname isn't working (e.g., the domain it uses expired, or the hostname was removed from the web panel).
The first thing to do in this case is log into your server via SSH and try connecting to the MySQL server using the connection information you're using in your site's configuration file.
~$ ssh youruser@server.dreamhost.com
Password:
Once you're logged in, change to the directory of the site you're having problems with:
youruser@server:~$ cd blog.somesite.com
Once you're there, get the database connection information from the configuration file your site uses (this varies from one application to another; in this case we're looking at a WordPress site):
youruser@server:~/blog.somesite.com$ grep "DB_" wp-config.php
define('DB_NAME', 'your_dbname');
define('DB_USER', 'your_dbuser');
define('DB_PASSWORD', 'your_dbpass');
define('DB_HOST', 'mysql.yourhostname.com');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
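If you'd rather not copy each value by hand, you can pull a single setting out of wp-config.php with sed. This sketch assumes each define() sits on its own line, with a quoted name and a quoted value that contains no quote characters of its own, as in the output above. The function name wp_db_setting is hypothetical. Other software keeps its settings in different files and formats.

```shell
# Hypothetical helper: print the quoted value of one define() in a
# WordPress config file. $1 = path to wp-config.php, $2 = constant name.
wp_db_setting() {
  sed -n "s/^[[:space:]]*define([[:space:]]*['\"]$2['\"][[:space:]]*,[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1"
}
```

Usage: `mysql -u "$(wp_db_setting wp-config.php DB_USER)" -p -h "$(wp_db_setting wp-config.php DB_HOST)" "$(wp_db_setting wp-config.php DB_NAME)"`. This still prompts for the password rather than putting it on the command line.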
Now try connecting to the database directly using that information:
youruser@server:~/blog.somesite.com$ mysql -u your_dbuser -p -h mysql.yourhostname.com your_dbname
Enter password:
If you get output like this:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 343144 to server version: 5.0.67-userstats-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>
Then that means the connection information is good and something else is going on that's keeping things from working properly. At that point, you should contact support and mention what you did to check the connection information.
If you get output like this:
ERROR 1045 (28000): Access denied for user 'your_dbuser'@'randomdomain.com' (using password: YES)
Then double-check that all your connection credentials are correct. Another way to confirm your hostname is working is to browse to it in your web browser (http://mysql.yourhostname.com). If it's configured properly, you should get a login prompt for phpMyAdmin. If it doesn't load, or loads something other than phpMyAdmin, then something is likely wrong with the hostname.
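The error code the mysql client prints usually tells you which piece is wrong. The sketch below maps a few standard MySQL error codes to the checks described in this section:
- 1045 (access denied): wrong user or password.
- 1044: the user is not allowed to use that database.
- 2005 (unknown MySQL server host): the hostname doesn't resolve.
- 2003 (can't connect): the host resolves, but the server isn't reachable.

The function name classify_mysql_error and its messages are hypothetical.

```shell
# Hypothetical helper: classify the error line printed by the mysql client.
classify_mysql_error() {
  case "$1" in
    *"ERROR 1045"*) echo "credentials: check DB_USER and DB_PASSWORD" ;;
    *"ERROR 1044"*) echo "database: this user cannot access DB_NAME" ;;
    *"ERROR 2005"*) echo "hostname: check that DB_HOST resolves" ;;
    *"ERROR 2003"*) echo "server: host resolves but MySQL is unreachable" ;;
    *)              echo "unrecognized: contact support with the full error" ;;
  esac
}
```

Usage: paste the error line in quotes. For example, `classify_mysql_error "ERROR 1045 (28000): Access denied for user 'your_dbuser'@'randomdomain.com' (using password: YES)"` prints the credentials hint.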
These troubleshooting steps apply to any web application, though how you find the database connection information will vary.
Site loads immediately, but displays a 403 Forbidden error
403 Forbidden errors appear when a deny rule for an IP address is set in a site's .htaccess file, or when file permissions prevent the web server from serving a page. In most cases the cause is file permissions. To check your site's permissions, log in via SSH like this:
~$ ssh youruser@server.dreamhost.com
Password:
Then get a directory listing:
youruser@server:~$ ls -la
drwxr-x--x 16 youruser pg123456 4096 2009-12-10 04:25 ./
...
The entry for ./ (your home directory) should look like the one above. Notice the permission string "drwxr-x--x". The first letter, "d", means it's a directory; it's followed by three sets of three permissions. The first set is the owner's permissions, here read/write/execute. The second set is the group's permissions, here read/execute. The third set is the "other" permissions (what all other users get), here execute only. If you have Enhanced Security enabled for your user, it would look like this instead:
youruser@server:~$ ls -la
drwxr-x--- 16 youruser adm 4096 2009-12-10 04:25 ./
...
The above are correct permission settings. If instead they look like this:
drw-r----- 16 youruser pg123456 4096 2009-12-10 04:25 ./
or this:
d--------- 16 youruser pg123456 4096 2009-12-10 04:25 ./
This means your user has been disabled, and you should contact DreamHost. In this case, you'll likely see errors like these when you try to log in, and you won't be able to get the directory listing described above:
Could not chdir to home directory /home/youruser: Permission denied
-bash: /home/youruser/.bash_profile: Permission denied
If you get this, first wait about 5-8 minutes to see whether it's fixed automatically. If it isn't, write in to support for help.
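If you can get a shell, you can check the owner permissions on your home directory in one step. This minimal sketch looks only at whether the owner still has full read/write/execute access, which is what differs between the correct and disabled examples above. The usage line assumes GNU stat, whose `-c %A` format prints a permission string like the ones above. The function name check_home_perms is hypothetical.

```shell
# Hypothetical helper: classify a permission string such as "drwxr-x--x".
check_home_perms() {
  case "$1" in
    drwx*) echo "ok: owner has rwx" ;;
    d*)    echo "owner lacks rwx: user may be disabled, contact support" ;;
    *)     echo "not a directory" ;;
  esac
}
```

Usage: `check_home_perms "$(stat -c %A ~)"`.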
If permissions look fine, but you're still getting a 403, then try renaming the .htaccess file for your affected domain from ".htaccess" to ".htaccess.disabled" like this:
mv .htaccess .htaccess.disabled
Then try loading your site. If the 403 is gone, open .htaccess.disabled and look for lines starting with "deny" (e.g., "Deny from ..."). Comment out any you find by putting a "#" at the start of the line, and save the file. Then re-enable the .htaccess file you disabled above like this:
mv .htaccess.disabled .htaccess
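You can also comment out deny lines with sed. This sketch puts "#" in front of every line that starts with "deny" (in any letter case, optionally indented) and keeps the untouched original as a .bak file. The function name comment_out_deny is hypothetical. Point it at whichever file currently holds your rules, for example .htaccess.disabled before you rename it back. It assumes a sed that accepts an attached backup suffix (`-i.bak`), as both GNU and BSD sed do.

```shell
# Hypothetical helper: comment out Apache "deny" lines in the given file,
# saving the original as FILE.bak. Lines such as "Order deny,allow" that
# merely contain the word are left alone.
comment_out_deny() {
  sed -i.bak 's/^\([[:space:]]*[Dd][Ee][Nn][Yy][[:space:]]\)/#\1/' "$1"
}
```

Usage: `comment_out_deny .htaccess.disabled`, review the result, then rename the file back.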
Site loads, but shows a 404 error rather than what you expected
This happens most often with software like WordPress that uses .htaccess rules for its permalinks (pretty URLs). If those rules are removed or changed, a 404 appears instead of the content you expect. The easiest fix is to download a fresh copy of the software you're using from its website (e.g., http://www.joomla.org/download.html), then copy the contents of its default .htaccess file into your existing one. Keep in mind that .htaccess files are hidden files, so you might need to enable viewing of hidden files on your operating system to find the file. Not all software ships with a default .htaccess file; WordPress, for instance, generates one when you change your permalink settings. If the default .htaccess rules don't resolve the 404, write in to support and ask for additional help.
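For WordPress specifically, you can quickly check whether the permalink rules are still present. WordPress wraps the rules it writes between "# BEGIN WordPress" and "# END WordPress" marker comments. The sketch below only checks for those markers, not whether the rules between them are correct. The function name has_wp_rewrite_block is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if FILE contains the marker
# comments WordPress writes around its own .htaccess rules.
has_wp_rewrite_block() {
  grep -q '^# BEGIN WordPress' "$1" && grep -q '^# END WordPress' "$1"
}
```

Usage: `has_wp_rewrite_block ~/somesite.com/blog/.htaccess || echo "WordPress rules missing"`.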
When All Sites Are Affected
If the problem isn't related to server load but all your sites are affected, this is almost always because one of your sites is causing problems for all the others. In cases like this, it can be hard to tell which site is causing the trouble. Here are some tips on how to proceed.
Checking for Active Processes
This is the first step. Many times you can tell which site is causing trouble by simply checking your active processes. Log into your server via SSH and take a look at which processes are running. Let's say you see something like this:
youruser@server.dreamhost.com:~$ top -c
top - 14:37:35 up 10 days, 17:35, 3 users, load average: 0.83, 0.89, 1.11
Tasks: 16 total, 1 running, 15 sleeping, 0 stopped, 0 zombie
Cpu(s): 34.7%us, 4.8%sy, 1.7%ni, 56.5%id, 0.9%wa, 0.2%hi, 1.1%si, 0.0%st
Mem: 32966092k total, 32546460k used, 419632k free, 6369232k buffers
Swap: 8000328k total, 228972k used, 7771356k free, 12650516k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8384 youruser  20   0 66984  11m 6852 S    1  0.0   0:03.62 php5.cgi
 8385 youruser  20   0 66044  10m 6700 S    0  0.0   0:00.24 php5.cgi
10895 youruser  20   0 65940  10m 6848 S    0  0.0   0:00.92 php5.cgi
10917 youruser  20   0 65980  10m 6848 S    0  0.0   0:00.79 php5.cgi
 7542 youruser  20   0 65956  10m 6860 S    0  0.0   0:00.51 php5.cgi
 7818 youruser  20   0 65980  10m 6860 S    0  0.0   0:00.35 php5.cgi
 7828 youruser  20   0 65988  10m 6860 S    0  0.0   0:00.33 php5.cgi
 7917 youruser  20   0 66016  10m 6860 S    0  0.0   0:00.43 php5.cgi
 8152 youruser  20   0 65976  10m 6856 S    0  0.0   0:04.21 php5.cgi
 8380 youruser  20   0 65932  10m 6848 S    0  0.0   0:04.03 php5.cgi
 8386 youruser  20   0 66020  10m 6860 S    0  0.0   0:00.32 php5.cgi
10896 youruser  20   0 65908  10m 6848 S    0  0.0   0:00.66 php5.cgi
10919 youruser  20   0 65948  10m 6848 S    0  0.0   0:00.24 php5.cgi
In the above case, you can see that there are a lot of php5.cgi processes running under your user, but if you have more than one site that's less than helpful. To find out which sites those processes are serving, run this command:
youruser@server.dreamhost.com:~$ lsof -u youruser | grep php | grep home
lsof: WARNING: can't stat() sysfs file system /mnt/root_base/sys
Output information may be incomplete.
lsof: WARNING: can't stat() proc file system /mnt/root_base/proc
Output information may be incomplete.
lsof: WARNING: can't stat() tmpfs file system /mnt/root_base/dev
Output information may be incomplete.
lsof: WARNING: can't stat() nfs file system /mnt/root_base/dev/.static/dev
Output information may be incomplete.
lsof: WARNING: can't stat() aufs file system /dev/.static/dev
Output information may be incomplete.
php5.cgi 15082 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 15317 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16812 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16938 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16939 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16942 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16943 youruser cwd DIR 8,17 4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 17050 youruser cwd DIR 8,17 4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 17199 youruser cwd DIR 8,17 4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 18713 youruser cwd DIR 8,17 4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 18717 youruser cwd DIR 8,17 4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 20267 youruser cwd DIR 8,17 4096 210748300 /home/youruser/somesite.com/blog
You may or may not get the warning messages above; if you do, you can ignore them. The information you want comes after them. The command lists the files your php5.cgi processes have open, and the cwd (current working directory) entries conveniently show the directory of the site each process is serving. You might see a few other sites mixed in, but usually you'll find that it's mostly one site. In this example, it's a WordPress blog.
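When the list is long, counting processes per directory makes the culprit stand out. This sketch assumes lsof's default column layout as in the output above (command, PID, user, FD, ..., with the path last) and paths without spaces. The function name php_procs_by_dir is hypothetical.

```shell
# Hypothetical helper: from `lsof` output on stdin, count PHP processes by
# their current working directory (the "cwd" entries), busiest first.
php_procs_by_dir() {
  awk '$1 ~ /php/ && $4 == "cwd" { print $NF }' | sort | uniq -c | sort -rn
}
```

Usage: `lsof -u youruser 2>/dev/null | php_procs_by_dir`. Redirecting standard error should hide warnings like the ones shown above.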
Finding Your Busiest Sites
Sometimes the above method doesn't work well, in which case you might need to investigate your busiest sites, since those are the likeliest to be causing the most trouble. The easiest way to find your busiest sites is to check the access.log file for each site: the larger it is, the more traffic that site is getting (the logs are rotated daily). To do this, log into your account via SSH and run this command from your user's home directory:
youruser@server.dreamhost.com:~$ ls -laSh logs/*/http/access.log | grep "[KGM] " | awk '{split($8,d,"[/]"); print $5 "\t" d[2]}'
102M somesite.com
41M example.com
24M subdomain.example.com
83K acme-example.org
38K somewhere.info
10K nowhere.com
8.9K test.somesite.com
1.2K test.somewhere.info
In this case, you can see that somesite.com has the largest log by far, followed by example.com and subdomain.example.com. Those are the domains you'll want to focus on looking at.
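The ls/awk pipeline above relies on the filename being the 8th field. That only holds when ls prints dates in the YYYY-MM-DD HH:MM style shown; with a different date format, the awk field numbers shift. Here is a sketch that avoids parsing ls and prints exact byte counts instead. It assumes the logs/DOMAIN/http/access.log layout used throughout this article, and the function name rank_access_logs is hypothetical.

```shell
# Hypothetical helper: print "BYTES<TAB>DOMAIN" for each
# BASE/DOMAIN/http/access.log (BASE defaults to ~/logs), largest first.
rank_access_logs() {
  local base f bytes domain
  base=${1:-$HOME/logs}
  for f in "$base"/*/http/access.log; do
    [ -f "$f" ] || continue          # skip the literal pattern if nothing matched
    bytes=$(( $(wc -c < "$f") ))     # arithmetic strips any padding from wc
    domain=$(basename "$(dirname "$(dirname "$f")")")
    printf '%s\t%s\n' "$bytes" "$domain"
  done | sort -rn
}
```

Usage: `rank_access_logs | head -5` shows your five busiest sites by log size.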
Investigating the Site
Checking the Version of Your Software
At this point, you've identified the site in question. The first thing you always want to do is check to make sure it's running the most recent version of whatever software it's running. In WordPress' case, you would do that by running this command inside the directory WordPress is installed in:
youruser@server.dreamhost.com:~/somesite.com/blog$ cat wp-includes/version.php
<?php
/**
 * This holds the version number in a separate file so we can bump it without cluttering the SVN
 */
/**
 * The WordPress version string
 *
 * @global string $wp_version
 */
$wp_version = '2.6.1';
/**
 * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
 *
 * @global int $wp_db_version
 */
$wp_db_version = 8204;
?>
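To print just the version string, which is handy when you're checking several installs, you could extract it with sed. This minimal sketch assumes the `$wp_version = '...';` line format shown above. The function name wp_version is hypothetical, and other software stores its version elsewhere.

```shell
# Hypothetical helper: print the WordPress version of the install in DIR
# (default: the current directory) by reading wp-includes/version.php.
wp_version() {
  sed -n "s/^[[:space:]]*\\\$wp_version[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" \
    "${1:-.}/wp-includes/version.php"
}
```

Usage: `wp_version ~/somesite.com/blog` prints the version string from that install's version.php, such as 2.6.1 for the file above.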
Checking for Database Table Overhead
Many web applications seem to have trouble when their database tables develop overhead. Table overhead shouldn't cause problems, but for whatever reason it has been shown to in many cases. In fact, WordPress 2.9 introduced a new option to help with this. To check manually, browse to the MySQL hostname your site uses to reach the phpMyAdmin interface. Select your site's database from the dropdown menu at the top left of the page, and you'll see a list of its tables.
Notice the column on the far right titled "Overhead". Rows with a value in that column are tables that have overhead. Click the "Check tables having overhead" link at the bottom (it only appears if some tables actually have overhead), then select "Optimize table" from the dropdown to the right of that link.
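You can get similar information from the command line. The information_schema.TABLES view has a DATA_FREE column holding the number of allocated but unused bytes in each table. As far as we know, this is what phpMyAdmin reports as overhead. This sketch turns that output into OPTIMIZE TABLE statements for you to review before running them. The function name optimize_statements is hypothetical, and the query in the usage line assumes the example credentials from earlier in this article.

```shell
# Hypothetical helper: read tab-separated "table<TAB>data_free" rows and
# print an OPTIMIZE TABLE statement for each table with data_free > 0.
optimize_statements() {
  # "+ 0" forces a numeric comparison, so NULL (e.g. for views) counts as 0.
  awk -F '\t' '$2 + 0 > 0 { printf "OPTIMIZE TABLE `%s`;\n", $1 }'
}
```

Usage: `mysql -u your_dbuser -p -h mysql.yourhostname.com -B -N -e "SELECT table_name, data_free FROM information_schema.TABLES WHERE table_schema = 'your_dbname'" | optimize_statements`. The -B and -N options make mysql print plain tab-separated rows without a header line.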
Checking Your Addons
Next, make sure you have some kind of caching addon installed, and from there you'll want to check …
Taking Action
Disabling an Old Potentially Hacked Site
If the site's WordPress version is very old, the site has likely been compromised (hacked). As a stopgap until you can upgrade the site to fix it permanently, rename its web directory to disable it:
youruser@server.dreamhost.com:~$ mv somesite.com somesite.com_disabled
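If you disable sites this way often, a small wrapper can refuse to clobber an earlier _disabled copy. This is a hypothetical sketch, not a DreamHost tool. It uses the same _disabled suffix as the command above, and enable_site simply reverses the rename once the site has been upgraded and cleaned.

```shell
# Hypothetical helpers: rename a site directory to DIR_disabled and back,
# refusing to overwrite anything that already exists.
disable_site() {
  local dir=${1%/}
  if [ ! -d "$dir" ]; then echo "no such directory: $dir" >&2; return 1; fi
  if [ -e "${dir}_disabled" ]; then echo "already exists: ${dir}_disabled" >&2; return 1; fi
  mv "$dir" "${dir}_disabled"
}

enable_site() {
  local dir=${1%/}
  if [ ! -d "${dir}_disabled" ]; then echo "no such directory: ${dir}_disabled" >&2; return 1; fi
  if [ -e "$dir" ]; then echo "already exists: $dir" >&2; return 1; fi
  mv "${dir}_disabled" "$dir"
}
```

Usage, from your home directory: `disable_site somesite.com` now, and `enable_site somesite.com` after the upgrade.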
Then kill all open PHP processes to clear out the hung processes:
youruser@server.dreamhost.com:~$ pkill -u youruser -f php
You might need to run that a few times to clear everything out. Check top -c to confirm the processes are gone:
youruser@server.dreamhost.com:~$ top -c
top - 14:37:35 up 10 days, 17:35, 3 users, load average: 0.83, 0.89, 1.11
Tasks: 16 total, 1 running, 15 sleeping, 0 stopped, 0 zombie
Cpu(s): 34.7%us, 4.8%sy, 1.7%ni, 56.5%id, 0.9%wa, 0.2%hi, 1.1%si, 0.0%st
Mem: 32966092k total, 32546460k used, 419632k free, 6369232k buffers
Swap: 8000328k total, 228972k used, 7771356k free, 12650516k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
Now check to see if your sites are loading properly and monitor top to make sure PHP processes aren't building back up again.
