Advanced Troubleshooting Techniques

From DreamHost
Revision as of 02:39, 10 March 2012 by Jwwicks (Talk | contribs)

Overview

In a shared hosting environment, a large number of factors can affect your site's performance. Most problems fall into a handful of major types:

  • Database-related issues causing your site to hang while loading.
  • Misconfigurations keeping your site from loading.
  • Software compromises (hacked sites).
  • Memory limits causing your processes to be killed.

Any of the above can contribute to another major cause of poor performance: server load. They can also interact, so that one issue triggers others. For instance, if a database-related issue causes your PHP processes to hang, those processes will start building up. That will cause your user to hit its memory limit, which just compounds the problem. Hacked sites often start up an excessive number of processes to participate in a DDoS attack on some third-party site, which uses up all your available user memory (and in some cases can even saturate the network on the host machine). This article is meant to help you diagnose problems like these.

Feeling Out the Problem

Okay, your site isn't loading. There are many ways in which a site can fail to load:

  • Immediate 500 Internal Server Error.
  • Site spins trying to load for a while and then 500 Internal Server Error.
  • Site spins trying to load forever.
  • Site loads immediately, but only a blank page is displayed.
  • Site loads immediately, but displays a database connection error.
  • Site loads immediately, but displays a 403 Forbidden error.
  • Site loads, but shows a 404 error rather than what you expected.

Those are some of the most common cases. Let's take a look at each in turn.

Immediate 500 Internal Server Error

An immediate 500 error usually comes down to one of two things. If the error appears instantaneously, the most likely cause is your site's .htaccess file. Some things to look for:

  • Syntax errors in the .htaccess file
  • Custom PHP setup that isn't working

To see if this is the cause, try renaming the .htaccess file in your site's home directory to something like ".htaccess.disabled". If the problem was in your .htaccess file, the site should start loading again immediately. Keep in mind that .htaccess rules apply to all subdirectories, so a .htaccess file outside of your site's web directory can also affect your site. Look higher up in the directory structure for .htaccess files as well, and try renaming any you find to see if it helps.
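
The search up the directory structure can be scripted. The following is a minimal sketch, not a DreamHost tool: find_htaccess_upward is a hypothetical helper name, and it assumes a shell that supports "local" (bash, for example). It starts in the directory you give it and prints every .htaccess file it finds on the way up to the filesystem root.

```shell
# Hypothetical helper: print each .htaccess file found between a starting
# directory and the filesystem root, nearest first.
find_htaccess_upward() {
    local dir
    # Resolve the starting directory to an absolute path.
    dir=$(cd -- "${1:-.}" >/dev/null && pwd) || return 1
    while :; do
        if [ -f "${dir%/}/.htaccess" ]; then
            printf '%s\n' "${dir%/}/.htaccess"
        fi
        # Stop once the root directory has been checked.
        if [ "$dir" = "/" ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
}

# Example use on the server (the directory name is a placeholder):
#   find_htaccess_upward ~/somesite.com/blog
```

Any file this prints applies to the starting directory, so each one is a candidate for the renaming test described above.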

If that solves the problem, great! If not, then another potential thing that might be going on is you may be having processes killed due to exceeding your user's memory limit. One quick way to see if this might be affecting you is to simply check to see what processes you have running as your user. To do that, log into your server using SSH like this:

~$ ssh youruser@server.dreamhost.com
Password:
youruser@server:~$

Once you're in, run the top -c command like this:

youruser@server:~$ top -c
top - 14:37:35 up 10 days, 17:35,  3 users,  load average: 0.83, 0.89, 1.11
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.7%us,  4.8%sy,  1.7%ni, 56.5%id,  0.9%wa,  0.2%hi,  1.1%si,  0.0%st
Mem:  32966092k total, 32546460k used,   419632k free,  6369232k buffers
Swap:  8000328k total,   228972k used,  7771356k free, 12650516k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8384 youruser  20   0 66984  11m 6852 S    1  0.0   0:03.62 php5.cgi
 8385 youruser  20   0 66044  10m 6700 S    0  0.0   0:00.24 php5.cgi
10895 youruser  20   0 65940  10m 6848 S    0  0.0   0:00.92 php5.cgi
10917 youruser  20   0 65980  10m 6848 S    0  0.0   0:00.79 php5.cgi
 7542 youruser  20   0 65956  10m 6860 S    0  0.0   0:00.51 php5.cgi
 7818 youruser  20   0 65980  10m 6860 S    0  0.0   0:00.35 php5.cgi
 7828 youruser  20   0 65988  10m 6860 S    0  0.0   0:00.33 php5.cgi
 7917 youruser  20   0 66016  10m 6860 S    0  0.0   0:00.43 php5.cgi
 8152 youruser  20   0 65976  10m 6856 S    0  0.0   0:04.21 php5.cgi
 8380 youruser  20   0 65932  10m 6848 S    0  0.0   0:04.03 php5.cgi
 8386 youruser  20   0 66020  10m 6860 S    0  0.0   0:00.32 php5.cgi
10896 youruser  20   0 65908  10m 6848 S    0  0.0   0:00.66 php5.cgi
10919 youruser  20   0 65948  10m 6848 S    0  0.0   0:00.24 php5.cgi

If it looks something like the above, then you're very likely running into this problem! Generally, if you're running more than 10 PHP processes at once and they hold pretty steady like that, then there are good odds this is a problem. Details on what to do will follow later.
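
If you would rather get a number than read the top display, the count can be taken from ps output. This is only a sketch: count_php_procs is a hypothetical helper name, and it assumes the PHP processes have "php" in their command name, as php5.cgi does in the listing above.

```shell
# Hypothetical helper: count the lines of "PID COMMAND" input whose
# command name contains "php". Prints 0 when there are none.
count_php_procs() {
    awk '$2 ~ /php/ { n++ } END { print n + 0 }'
}

# Example use on the server (replace youruser with your own user):
#   ps -u youruser -o pid= -o comm= | count_php_procs
```

Running it a few times over a minute or two shows whether the count holds steady above 10, which is the pattern described above.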

Site spins trying to load for a while and then 500 Internal Server Error

This can be caused by a few different things. One possible cause is hitting your user's memory limit, as described in the Immediate 500 Internal Server Error section above. The most common cause, however, is PHP timing out. If this only happens on a subset of pages (in particular, admin pages for the software you're using), a PHP timeout is very likely the culprit. By default, the PHP timeout is 30 seconds. You can test this by creating a custom php.ini for your site and raising the max_execution_time setting to 2-3 times its current value. If every page is affected, a PHP timeout is still possible.

You can also check your site's error log, the logs/yourdomain.com/http/error.log file inside your user's home directory, for helpful error messages. If all you see is a "Premature end of headers" error, that is generally not much help, as it simply means the script exited before completing.
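
When the error log is long, it helps to see which messages repeat most often. The sketch below assumes Apache-style log lines that begin with bracketed fields such as a timestamp, a level and a client address; that layout is an assumption, so check a few lines of your own log first. top_errors is a hypothetical helper name.

```shell
# Hypothetical helper: tally the most frequent messages in an error log
# read from stdin. Leading "[...]" fields are stripped so that identical
# messages from different times and clients are grouped together.
# $1 = how many messages to show (default 10).
top_errors() {
    sed -E 's/^(\[[^]]*\] *)+//' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Example use on the server (yourdomain.com is a placeholder):
#   top_errors 5 < ~/logs/yourdomain.com/http/error.log
```

Each output line is a count followed by the message text, most frequent first.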

Site spins trying to load forever

This is perhaps the most generic symptom. Most often it means that something is causing your PHP processes to hang. If you run top -c on the server, you might also notice <defunct> next to some of those processes. There are many possible causes. With WordPress, it often has to do with overhead in the database tables your site uses. Overhead shouldn't cause problems, but WordPress can get itself into a bad state when tables have overhead (particularly the wp_options table), and it will often exhibit this symptom in those cases. Other major causes are software misconfigurations and third-party addons that are poorly coded or incompatible with the software you're using.
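
The <defunct> processes can also be listed directly instead of being picked out of the top display. A rough sketch follows, assuming a ps that reports a process state column in which zombies start with the letter Z (as procps on Linux does). list_defunct is a hypothetical helper name.

```shell
# Hypothetical helper: from "PID STAT COMMAND..." lines on stdin, print
# the PID and command name of every zombie (<defunct>) process.
list_defunct() {
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# Example use on the server (replace youruser with your own user):
#   ps -u youruser -o pid= -o stat= -o comm= | list_defunct
```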

Site loads immediately, but only a blank page is displayed

This issue is most often caused by a broken theme or a misbehaving caching addon. Depending on what exactly is going on, this one can be a little tricky to solve and will likely require some fiddling to get things working properly again.

Site loads immediately, but displays a database connection error

This can happen either because the database server is unavailable or because the database connection information is incorrect. Under some conditions servers may have trouble contacting MySQL servers due to networking issues (fairly rare). In many cases this is simply because the database login information was changed without updating the connection information or the MySQL hostname isn't working properly (e.g., domain being used expired, hostname was removed from the webpanel, etc).

The first thing to do in this case is log into your server via SSH and try connecting to the MySQL server using the connection information you're using in your site's configuration file.

~$ ssh youruser@server.dreamhost.com
Password:

Once you're logged in, change to the directory of the site you're having problems with:

youruser@server:~$ cd blog.somesite.com

Once you're there, get the database connection information you need from the configuration file your site is using (this will vary from software to software -- in this case we're looking at a WordPress site):

youruser@server:~/blog.somesite.com$ grep "DB_" wp-config.php
define('DB_NAME', 'your_dbname');
define('DB_USER', 'your_dbuser');
define('DB_PASSWORD', 'your_dbpass');
define('DB_HOST', 'mysql.yourhostname.com');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');

Now try connecting to the database directly using that information:

youruser@server:~/blog.somesite.com$ mysql -u your_dbuser -p -h mysql.yourhostname.com your_dbname
Enter password:

If you get output like this:

Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 343144 to server version: 5.0.67-userstats-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>

Then that means the connection information is good and something else is going on that's keeping things from working properly. At that point, you should contact support and mention what you did to check the connection information.

If you get output like this:

ERROR 1045 (28000): Access denied for user 'your_dbuser'@'randomdomain.com' (using password: YES)

Then double-check to make sure all your connection credentials are correct. Another way to make sure your hostname is working properly is to try browsing to it in your web browser (http://mysql.yourhostname.com). If it's configured properly that should yield a password prompt for phpMyAdmin. If that doesn't load or loads something other than phpMyAdmin, then there's likely something wrong with the hostname.

These troubleshooting steps apply to any web application, though the way you obtain the database information will vary.
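
For WordPress specifically, the copy-and-paste step can be avoided by reading the values straight out of wp-config.php. This is a sketch under assumptions: wp_config_value is a hypothetical helper name, and it only understands simple define('NAME', 'value'); lines like the ones shown earlier, each on a single line.

```shell
# Hypothetical helper: print the value of one define() constant.
# $1 = constant name (for example DB_HOST), $2 = path to wp-config.php.
wp_config_value() {
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]$1['\"][[:space:]]*,[[:space:]]*['\"]\(.*\)['\"][[:space:]]*);.*/\1/p" "$2"
}

# Example use from the site's directory (you are still prompted for the
# password, which you can compare against DB_PASSWORD):
#   mysql -u "$(wp_config_value DB_USER wp-config.php)" -p \
#         -h "$(wp_config_value DB_HOST wp-config.php)" \
#         "$(wp_config_value DB_NAME wp-config.php)"
```

Connecting with exactly the values the site uses rules out typing mistakes as the reason for an "Access denied" error.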

Site loads immediately, but displays a 403 Forbidden error

403 Forbidden errors are displayed when a deny rule is set for an IP in a site's .htaccess file or when file permissions keep the web server from serving a page. In most cases the cause is file permissions. To check permissions for your site, log in via SSH like this:

~$ ssh youruser@server.dreamhost.com
Password:

Then get a directory listing:

youruser@server:~$ ls -la
drwxr-x--x   16 youruser pg123456    4096 2009-12-10 04:25 ./
...

The first line should look like the above. Notice the permission string that reads "drwxr-x--x". The first letter stands for "directory", followed by three sets of three permissions. The first set is the owner's permissions, which are read/write/execute. The second set is the group's permissions, which are read/execute. The third set is the "other" permissions (what all other users have), which is execute only. If you have Enhanced Security enabled for your user, it would look like this instead:

youruser@server:~$ ls -la
drwxr-x---   16 youruser  adm    4096 2009-12-10 04:25 ./
...

The above are correct permission settings. If instead they look like this:

drw-r-----   16 youruser pg123456    4096 2009-12-10 04:25 ./

or this:

d---------   16 youruser pg123456    4096 2009-12-10 04:25 ./

Then that means your user has been disabled and you should contact DreamHost. In that case you likely won't be able to get the directory listing described above, and you'll see errors like this when attempting to log in:

Could not chdir to home directory /home/youruser: Permission denied
-bash: /home/youruser/.bash_profile: Permission denied

If you get this, first wait for approximately 5-8 minutes to see if it gets fixed automatically. If it doesn't, write in to support asking for help.
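
The comparison against the two correct permission strings can be automated. The sketch below assumes GNU stat (the "-c %A" option), which is what Linux servers normally provide; check_home_perms is a hypothetical helper name, and the two accepted strings are simply the ones listed above.

```shell
# Hypothetical helper: compare a directory's permission string with the
# two strings this article lists as correct. Assumes GNU stat.
# $1 = directory to check (defaults to your home directory).
check_home_perms() {
    local mode
    mode=$(stat -c %A "${1:-$HOME}") || return 2
    case "$mode" in
        drwxr-x--x|drwxr-x---)
            echo "ok: $mode"
            ;;
        *)
            echo "unexpected: $mode"
            return 1
            ;;
    esac
}

# Example use on the server:
#   check_home_perms
```

An "unexpected" result only tells you the string differs from the two examples. It does not say why, so the steps above still apply.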

If permissions look fine, but you're still getting a 403, then try renaming the .htaccess file for your affected domain from ".htaccess" to ".htaccess.disabled" like this:

mv .htaccess .htaccess.disabled

Then try loading your site. If the 403 is gone, open the renamed file (.htaccess.disabled) and look for lines starting with "deny". If you find any, comment them out by putting a "#" at the start of the line, then save the file. You can re-enable the .htaccess file you disabled above like this:

mv .htaccess.disabled .htaccess
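
Looking for "deny" lines by eye is easy to get wrong in a long file, so here is a small sketch that lists them with line numbers. find_deny_rules is a hypothetical helper name. It matches lines that begin with "deny" in any letter case, optionally indented, and ignores lines that are already commented out.

```shell
# Hypothetical helper: print the numbered lines of an .htaccess-style
# file that start with "deny" (case-insensitive, indentation allowed).
# $1 = file to search (defaults to .htaccess in the current directory).
find_deny_rules() {
    grep -niE '^[[:space:]]*deny([[:space:]]|$)' "${1:-.htaccess}"
}

# Example use from the site's directory:
#   find_deny_rules .htaccess
```

A line such as "Order allow,deny" is not reported, because it does not start with "deny".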

Site loads, but shows a 404 error rather than what you expected

This happens most often with sites running software like WordPress that uses .htaccess rules for its permalinks/pretty URLs. If those rules are removed or changed somehow, a 404 will appear rather than the content you expect. The easiest way to fix this is to download a fresh copy of the software you're using from its website (e.g., http://www.joomla.org/download.html), then copy the contents of its default .htaccess file and paste them into your existing one. Keep in mind that ".htaccess" files are hidden, so you might need to enable viewing of hidden files on your operating system to find the file. Not all software comes with a .htaccess file by default. For instance, WordPress generates one when you change your permalinks settings. If putting in the default .htaccess rules doesn't resolve the 404 issue, write in to support and ask for additional help.

When All Sites Are Affected

If the problem isn't related to server load issues, but all your sites are affected, this is almost always because one of your sites is causing problems affecting all the others. In cases like this, it can be hard to know which site is causing the trouble. Here are some tips on how to proceed in those cases.

Checking for Active Processes

This is the first step. Many times you can tell which site is causing trouble by simply checking your active processes. Log into your server via SSH and take a look at which processes are running. Let's say you see something like this:

youruser@server.dreamhost.com:~$ top -c
top - 14:37:35 up 10 days, 17:35,  3 users,  load average: 0.83, 0.89, 1.11
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.7%us,  4.8%sy,  1.7%ni, 56.5%id,  0.9%wa,  0.2%hi,  1.1%si,  0.0%st
Mem:  32966092k total, 32546460k used,   419632k free,  6369232k buffers
Swap:  8000328k total,   228972k used,  7771356k free, 12650516k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8384 youruser  20   0 66984  11m 6852 S    1  0.0   0:03.62 php5.cgi
 8385 youruser  20   0 66044  10m 6700 S    0  0.0   0:00.24 php5.cgi
10895 youruser  20   0 65940  10m 6848 S    0  0.0   0:00.92 php5.cgi
10917 youruser  20   0 65980  10m 6848 S    0  0.0   0:00.79 php5.cgi
 7542 youruser  20   0 65956  10m 6860 S    0  0.0   0:00.51 php5.cgi
 7818 youruser  20   0 65980  10m 6860 S    0  0.0   0:00.35 php5.cgi
 7828 youruser  20   0 65988  10m 6860 S    0  0.0   0:00.33 php5.cgi
 7917 youruser  20   0 66016  10m 6860 S    0  0.0   0:00.43 php5.cgi
 8152 youruser  20   0 65976  10m 6856 S    0  0.0   0:04.21 php5.cgi
 8380 youruser  20   0 65932  10m 6848 S    0  0.0   0:04.03 php5.cgi
 8386 youruser  20   0 66020  10m 6860 S    0  0.0   0:00.32 php5.cgi
10896 youruser  20   0 65908  10m 6848 S    0  0.0   0:00.66 php5.cgi
10919 youruser  20   0 65948  10m 6848 S    0  0.0   0:00.24 php5.cgi

In the above case, you can see that there are a lot of php5.cgi processes running under your user, but if you have more than one site that's less than helpful. To find out which sites those processes are serving, run this command:

youruser@server.dreamhost.com:~$ lsof -u youruser | grep php | grep home
lsof: WARNING: can't stat() sysfs file system /mnt/root_base/sys
      Output information may be incomplete.
lsof: WARNING: can't stat() proc file system /mnt/root_base/proc
      Output information may be incomplete.
lsof: WARNING: can't stat() tmpfs file system /mnt/root_base/dev
      Output information may be incomplete.
lsof: WARNING: can't stat() nfs file system /mnt/root_base/dev/.static/dev
      Output information may be incomplete.
lsof: WARNING: can't stat() aufs file system /dev/.static/dev
      Output information may be incomplete.
php5.cgi 15082 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 15317 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16812 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16938 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16939 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16942 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 16943 youruser  cwd       DIR   8,17      4096 211839297 /home/youruser/somesite.com/blog
php5.cgi 17050 youruser  cwd       DIR   8,17      4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 17199 youruser  cwd       DIR   8,17      4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 18713 youruser  cwd       DIR   8,17      4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 18717 youruser  cwd       DIR   8,17      4096 210748300 /home/youruser/somesite.com/blog
php5.cgi 20267 youruser  cwd       DIR   8,17      4096 210748300 /home/youruser/somesite.com/blog

You may or may not get the warning messages above. If you do, you can just ignore them. The information you want is below that: each remaining line shows the working directory (cwd) of one of your php5.cgi processes, which conveniently is the directory of the site it's serving. You might also see some other sites interspersed there, but usually you'll find that it's mostly one site. In this example, it's a WordPress blog.
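
With many processes, it is quicker to count them per directory than to read the list. The sketch below assumes the lsof column layout shown above, where the fourth column is "cwd" and the last column is the directory, and it assumes directory names contain no spaces. count_procs_by_dir is a hypothetical helper name.

```shell
# Hypothetical helper: from lsof output on stdin, count the "cwd" entries
# per directory and print "count<TAB>directory", busiest first.
count_procs_by_dir() {
    awk '$4 == "cwd" { n[$NF]++ } END { for (d in n) print n[d] "\t" d }' | sort -rn
}

# Example use on the server (2>/dev/null hides the harmless warnings):
#   lsof -u youruser 2>/dev/null | grep php | count_procs_by_dir
```

The directory at the top of the output is the one with the most PHP processes, which makes it the first site to investigate.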

Finding Your Busiest Sites

Sometimes the above method doesn't work well, which means you might need to investigate your busiest sites since those will likely be the ones causing the most trouble. The easiest way to find out which sites are your busiest is by checking the access.log file for each site -- the larger it is, the more traffic it's getting (they're rotated daily). To do this, log into your account via SSH and run this command from your user's home directory:

youruser@server.dreamhost.com:~$ ls -laSh logs/*/http/access.log | grep "[KGM] " | awk '{split($8,d,"[/]"); print $5 "\t" d[2]}'
102M	somesite.com
41M	example.com
24M	subdomain.example.com
83K	acme-example.org
38K	somewhere.info
10K	nowhere.com
8.9K	test.somesite.com
1.2K	test.somewhere.info

In this case, you can see that somesite.com has the largest log by far, followed by example.com and subdomain.example.com. Those are the domains you'll want to focus on looking at.
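
The one-liner above takes the path from the eighth field of the ls output, which works when ls prints dates in the two-field form shown in the listings in this article. Where ls prints dates in three fields instead, the path moves to the ninth field. As a hedged alternative that does not parse ls output at all, the sketch below measures each log with wc. rank_access_logs is a hypothetical helper name, and it reports sizes in bytes rather than in human-readable units.

```shell
# Hypothetical helper: print "bytes<TAB>domain" for every site's
# access.log, largest first.
# $1 = logs directory (defaults to logs/ under your home directory).
rank_access_logs() {
    local logdir=${1:-$HOME/logs} f domain
    for f in "$logdir"/*/http/access.log; do
        # Skip the unexpanded pattern when no logs exist.
        [ -f "$f" ] || continue
        # Reduce logs/<domain>/http/access.log to just <domain>.
        domain=${f#"$logdir"/}
        domain=${domain%%/*}
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$domain"
    done | sort -rn
}

# Example use from your home directory on the server:
#   rank_access_logs logs
```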

Investigating the Site

Checking the Version of Your Software

At this point, you've identified the site in question. The first thing you always want to do is check to make sure it's running the most recent version of whatever software it's running. In WordPress' case, you would do that by running this command inside the directory WordPress is installed in:

youruser@server.dreamhost.com:~/somesite.com/blog$ cat wp-includes/version.php
<?php
/**
 * This holds the version number in a separate file so we can bump it without cluttering the SVN
 */

/**
 * The WordPress version string
 *
 * @global string $wp_version
 */
$wp_version = '2.6.1';

/**
 * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.
 *
 * @global int $wp_db_version
 */
$wp_db_version = 8204;

?>
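
If you only want the version number rather than the whole file, it can be pulled out with sed. This is a small sketch that assumes the "$wp_version = '...';" line looks like the one in the listing above. wp_version is a hypothetical helper name.

```shell
# Hypothetical helper: print only the WordPress version string.
# $1 = path to version.php (defaults to wp-includes/version.php).
wp_version() {
    sed -n "s/^[\$]wp_version = '\([^']*\)';.*/\1/p" "${1:-wp-includes/version.php}"
}

# Example use from the directory WordPress is installed in:
#   wp_version
```

Compare the result with the current release listed on the WordPress website.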


Checking for Database Table Overhead

Many web applications seem to have trouble if your database tables develop overhead. Database table overhead shouldn't cause problems, but it's been demonstrated that it does in many cases. In fact, WordPress 2.9 introduced a new option to make your site automatically check for this. To check manually, browse to the MySQL hostname your site uses to get to the phpMyAdmin interface. Select your site's database from the dropdown menu at the top left of the page and you should see something like this:

[Screenshot: phpMyAdmin table list for the selected database]

Notice the column on the far right titled "overhead". Rows with values are tables that have overhead. Just click the "Check tables having overhead" link at the bottom (this will only show up if some tables actually have overhead), then select "Optimize table" from the dropdown to the right of that link.
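
The same check can be made from the command line. The sketch below builds a query against MySQL's information_schema, on the assumption that the "overhead" figure in phpMyAdmin corresponds to the data_free column there. overhead_query is a hypothetical helper name, and the database name is inserted as given, so only pass it a name you trust.

```shell
# Hypothetical helper: print a SQL query that lists the tables in one
# database that have overhead (data_free greater than zero).
# $1 = database name.
overhead_query() {
    printf "SELECT table_name, data_free FROM information_schema.tables WHERE table_schema = '%s' AND data_free > 0;\n" "$1"
}

# Example use on the server, with the placeholder names used earlier:
#   overhead_query your_dbname | mysql -u your_dbuser -p -h mysql.yourhostname.com
```

Any table it lists can then be optimized in phpMyAdmin as described above, or with MySQL's OPTIMIZE TABLE statement.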

Checking Your Addons

Next you'll want to make sure you have some kind of caching addon installed and from there you'll want to check

Taking Action

Disabling an Old Potentially Hacked Site

If the version of WordPress is really old, the site has likely been compromised (hacked). As a stopgap until you can upgrade the site and fix it permanently, rename the web directory to disable it:

youruser@server.dreamhost.com:~$ mv somesite.com somesite.com_disabled

Then kill all open PHP processes to clear out the hung processes:

youruser@server.dreamhost.com:~$ pkill -u youruser -f php

You might need to run that a few times to clear everything out. Check top -c to confirm that no PHP processes remain:

youruser@server.dreamhost.com:~$ top -c
top - 14:37:35 up 10 days, 17:35,  3 users,  load average: 0.83, 0.89, 1.11
Tasks:  16 total,   1 running,  15 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.7%us,  4.8%sy,  1.7%ni, 56.5%id,  0.9%wa,  0.2%hi,  1.1%si,  0.0%st
Mem:  32966092k total, 32546460k used,   419632k free,  6369232k buffers
Swap:  8000328k total,   228972k used,  7771356k free, 12650516k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

Now check to see if your sites are loading properly and monitor top to make sure PHP processes aren't building back up again.