Most people don’t start looking for “how to convert my WordPress character encoding from Latin1 to UTF8″. Nope, not at first, but... continue reading
Most people don’t start looking for “how to convert my WordPress character encoding from Latin1 to UTF8″. Nope, not at first, but they often end up here. Usually it begins something like…
- “what are all these weird characters suddenly showing up all over my blog: Â â €™ ¢?”
- “I just upgraded my WordPress and now I see funny characters all over my posts – what gives?”
- “I just moved my blog to a new web host and now I see funny characters all over my posts – help”
I know, because we’ve been there… searching everywhere for a way to convert the data from one character set to another.
And most of the solutions we’ve found online have not been very helpful… stabs in the dark on average. The notable exceptions were these 2 very helpful posts in a sea of really scary guesses… MySQL latin1 → utf8 (WordPress upgrade) and Fixing mixed-encoding MySQL dumpfiles with WordPress.
The real problem is, I was asking the wrong question, so I kept finding solutions that didn’t work. Until now.
So let’s take a step back and better understand the problem…
What causes those funny characters in your WordPress posts upon upgrade / migration / whatever?
It’s a character encoding thing, and I won’t go into too much detail on it, but suffice to say: WordPress stores your content in a MySQL database, encoded in a character set called UTF8.
But if you installed your WordPress site before WordPress 2.3 (yeah, a while back), at that time WordPress didn’t specify that default character set when creating its storage tables (posts, etc.)… so the tables were created with whatever the default character set was in the MySQL installation. As it happens, back then it was Latin1.
“Yeah, so what” you say. Well, WordPress is still expecting the content to be in UTF8, which is why you’ll occasionally see those funny characters on your blog… Â â €™ ¢?
Sure, there’s a “fix” for that (read: duct tape patch)… you can just tell WordPress in it’s wp-config.php file that your data is stored in Latin1, and those funny characters won’t show any more.
But you’ve still got a database created with Latin1 encoding, storing your UTF8 data. Which in some cases will start causing you problems – such as when we tried to upgrade to WordPress 3 and enable WordPress MultiSite, and it will fail over and over again with no discernible error messages.
So how do you convert that data from Latin1 to UTF8?
Trick is, the content is (and still was back then) stored/encoded in UTF8… but MySQL “thinks” it’s in Latin1 because that’s how the tables were created.
So the better question is… how can you extract that UTF8 data from your MySQL database without corrupting it when MySQL dutifully attempts to convert it to UTF8 for you on export… essentially attempting to convert something that doesn’t need converting.
So how do you fix the Latin1 encoding, a.k.a. convert it to UTF8?
This situation can be fixed quite easily with just a few Linux commands performed on the server. Yes, it could technically be done through CPANEL & phpMyAdmin and using Windows tools to edit your data, but that’s far more likely to corrupt it. Your data is much less likely do get corrupted in this way.
Note: the fix below assumes at least a very basic familiarity with Linux (UNIX) and the ability to connect to your server command line (Telnet or ideally SSH) and run a few basic command.
“Whoa, what’s that?”, you say. Never fear, if you are not comfortable doing this, it’s likely that your web host will do it for you – the procedure is very quick and simple for someone familiar with these commands.
Also, if done correctly the approach below will create a whole new “clean” database, it will NOT change/fix/mangle your current database. That means you’ll be able to literally just edit the WordPress wp-config.php and “point” your WordPress from the old database to the fresh, clean new one. And you should be good to go.
That’s the tricky part.
Disclaimers: The following assumes at least a basic understanding of MySQL and WordPress config files and export functions. There is absolutely no warranty or guarantee of accuracy with anything you read here (or pretty much anywhere, lol). You and only you are responsible for any damage to your database, WordPress installation, mental health, ego, etc. as a results of following these or any other suggested directions. Also, look both ways before crossing the street, and be advised the beverage you are about to enjoy may be hot, and never spit up-wind.
The actual step-by-step WordPress / MySQL fix…
1. Back up all your stuff first (likely using phpMyAdmin / CPANEL)
Before doing any of the following it strongly encouraged to back up all your data and files. Just to be safe. At the very least, your full database and the WordPress config file: wp-config.php
2. Note the settings that your WordPress is currently using (from wp-config.php)
Specifically, your MySQL database configuration, including DB_NAME, DB_USER, DB_PASSWORD, and also DB_CHARSET and DB_COLLATE
// ** MySQL settings - You can get this info from your web host ** // /** The name of the database for WordPress */ define('DB_NAME', 'database_name_here'); /** MySQL database username */ define('DB_USER', 'username_here'); /** MySQL database password */ define('DB_PASSWORD', 'password_here');
3. Create a new MySQL database and create a new user with full privileges to that new database (both using phpMyAdmin)
After you export and “clean” you current data, you’re going to import it into this new database. Then you’ll be using this new database going forward.
4. Export your current WordPress database
The following exports all your data into a single text file (exp-my-database-latin1.sql).
mysqldump --opt --default-character-set=latin1 --skip-extended-insert --user my_db_user_here --password my_db_name_here -r exp-my-database-latin1.sql --log-error=log-mysqldump-my-database.txt
5. Change the character set in the output file (to the correct one)
replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < exp-my-database-latin1.sql > exp-my-database-utf8.sql
6. Run a compare script to sanity check your changes
diff exp-my-database-latin1.sql exp-my-database-utf8.sql
7. Import the cleaned data into the NEW database
mysql --default-character-set=utf8 --user=my_db_user_here --password='my_password_goes_here' my_db_name_here < exp-my-database-utf8.sql
8. Update your WordPress config to point at the shiny, clean, new database
Just update your wp-config.php file with the database name, login, and password of the clean, shiny new database.
Also, update your encoding settings in wp-config.php to reflect the new UTF character set:
/** Database Charset to use in creating database tables. */ define('DB_CHARSET', 'utf8'); /** The Database Collate type. Don't change this if in doubt. */ define('DB_COLLATE', '');
9. Test your changes… browse your website thoroughly and look for those funny characters
If it didn’t “work”, you’ll have funny characters all over your posts; they’ll be quite obvious as you browse 10 to 20 posts looking for them.
That’s it, you should now be able to pull up your website and enjoy the yummy UTF8 goodness, no more Latin1.
IF for whatever reason you need to back out this change, that’s very easy to do… just point your WordPress back to the original database (that you didn’t mess with).
To do that, just edit your wp-config.php file and restore the information you noted in step 1: DB_NAME, DB_USER, DB_PASSWORD, and also DB_CHARSET and DB_COLLATE. Then bring up your website and all should be good again.