i18n and L10n: 7 - International MySQL Databases
How to configure your MySQL databases to handle Unicode data points
Author: Matthew Wittering | Published: 29th July 2009
MySQL is an Open Source project freely available under the GNU General Public Licence. The application is regularly used by developers using PHP, Perl, Python and Ruby projects as a persistent data store for user generated data. On completion of this weblog I will have explained how to create tables which handle the Unicode character set encoded as UTF-8.
MySQL
MySQL is the darling of the Open Source movement after the community based projects was acquired by Sun Microsystems in Q1 2009. The relational database management system has much potential for developing internationalized (i10n) and localized (L10n) projects. However it does not do this out of the box. We must be aware that the defaults do not automatically satisfy these needs. Luckily this can be rectified quickly.
By default, MySQL comes preconfigured to use the latin1 (cp1252 West European) character set and the latin1_swedish_ci collation for sorting. These choices make no impact on Western Latin based languages however are not suitable for others. The character set and collation should be configured to use uft8 and utf8_general_ci respectively to support all scripts.
The example MySQL statement below to create a new table called "names" will add two columns. The character set will be configured as utf8 allowing all Unicode characters to be represented in the database table.
Example MySQL Statement
CREATE TABLE names ( firstname VARCHAR(50), lastname VARCHAR(50) ) CHARACTER SET utf8 COLLATE utf8_general_ci;
When setting the collation for uft-8 you have the options between utf8_general_ci and utf8_unicode_ci. I suggest using utf8_general_ci for these reasons. For example when comparing Latin glyphs such as À, Á, Å, å, ā or ă all compared as equal to with the A or a. This offers some fuzzy logic to expose more records. Such Andalucia is equal to Andalucía.
Summary
I hope that this post has brought the opportunity to configure the database to accept all Unicode characters to your attention and you will find it useful in your next project.
Links
- http://dev.mysql.com/doc/refman/6.0/en/internationalization-localization.html
- http://forums.mysql.com/read.php?103,187048,188748#msg-188748
This work is licenced under a Creative Commons Licence
I am a graduate of Lougborough University where I read Computing and Management BSc (Hons) earning a 2:1 classification.