Character sets in MySQL version 4.1

From DreamHost
Jump to: navigation, search

We've had a few problems with character encodings when we upgraded from MySQL version 4.0 to MySQL 4.1. It appears that sometimes the character encodings don't show up correctly for some applications.

The reason for this is because our MySQL 4.0 only supported the `latin1' character set, while our MySQL 4.1 supports a large variety of character sets. In 4.0, if our customers were storing a non-english (multi-byte character) language in their database (say for example, chinese), the data was still being stored as a latin1 (single-byte character) entry.

However, with the advent of 4.1, it appears that multi-byte characters stored in latin1 tables are sometimes being displayed incorrectly. Additionally, sometimes it will be displayed correctly in one application (say, your web application), but then not be displayed properly in another (say phpmyadmin).

The solutions for this vary from customer to customer, and hence it's best to try these out on your own as much as possible.

Please also be sure to read the official MySQL documentation linked to below.

Additionally, if you are working with non-english character sets, it should already be obvious that you will need to change the character encoding in your browser to the appropriate encoding.

Troubleshooting Steps

1) First, do *not* perform any of the following tests on live data! It can corrupt your data permanently! Make sure that you perform all your tests on backup copies.

To make a copy of one of your tables (named tablename in this example), run a query similar to the following:
(WARNING: This is just for testing purposes! The copied table will not be a legitimate replacement since PRIMARY KEYs, AUTO_INCREMENTS, and sometimes column types or column names are not guaranteed to be the same.)
CREATE TABLE tablename_copy SELECT * FROM tablename;

2) Try altering your data to BINARY. Say you have a column `columnname` of type VARCHAR(50) inside a table `tablename`. Then you would do the following:

ALTER TABLE tablename MODIFY columnname BINARY(50);

Load your test site and see if this fixes the problem. This approach has fixed some of our customer's data. Note that this unfortunately (from what I can tell) does not work on "text" or "blob" fields.

3) After altering your data to BINARY, alter it to the correct character set.

ALTER TABLE tablename MODIFY columnname BINARY(50);
ALTER TABLE tablename MODIFY columnname VARCHAR(50) CHARACTER SET big5;

Load your test site and see if this fixes the problem. This approach has fixed some of our customer's data.

4) Do *not* attempt to directly convert your data from latin1 directly to your new character set. This will more than likely corrupt your data.

5) Dump your data our with mysqldump. Create new tables with the proper character sets. Upload your dumped data.

6) If none of these approaches works, contact us and we can downgrade you temporarily to 4.0 to allow you some time to retrieve your data and convert it to a new character set. Please contact us quickly, as we no longer support 4.0 on the shared environment.

Additional Information