Character Set Support
What is Character Set
A character set is a set of symbols and encodings. Suppose that we have an alphabet with four letters: A, B, a, b
. We give each letter a number: A = 0, B = 1, a = 2, b = 3
. The letter A
is a symbol, the number 0
is the encoding
for A
, and the combination of all four letters and their encodings is a character set
.
What is a Collation A collation is a set of rules for comparing characters in a character set.
MySQL includes character set support that enables you to store data using a variety of character sets and perfrom comparisons according to a variety of collations. You can specifiy character sets at the server, database, table and column level. MySQL supports the use of character sets for the MyISAM, MEMORY,a dn InnoDB storage engines.
Performance Issues
MySQL will need tobe compiled from the source to remove unnecessary character sets.
Show available character sets
mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
.................................other rows removed to save space
Notice the length of the character set. utf8
requires 3 bytes
per character. So a column with CHAR(32)
will require 96 bytes
using utf8
Show available collations
mysql> show collation;
+--------------------------+----------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| big5_bin | big5 | 84 | | Yes | 1 |
............others removed to save space
or query from the information_schema database
mysql> select * from information_schema.CHARACTER_SETS;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 |
| dec8 | dec8_swedish_ci | DEC West European | 1 |
| cp850 | cp850_general_ci | DOS West European | 1 |
| hp8 | hp8_english_ci | HP West European | 1 |
| koi8r | koi8r_general_ci | KOI8-R Relcom Russian | 1 |
........additional row removed
A given character set has at least one collation
mysql> show collation where charset = 'latin1';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
+-------------------+---------+----+---------+----------+---------+
or query it from the information_schema database
mysql> select * from information_schema.collations where character_set_name = 'utf8';
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
Specifying a Character Set
Use the SET NAMES statement
mysql> SET NAMES 'utf8';
Show Server Character Set mysql> show variables like 'character_set_server'; +----------------------+--------+ | Variable_name | Value | +----------------------+--------+ | character_set_server | latin1 | +----------------------+--------+
Show Server Collation
mysql> show variables like 'collation_server';
+------------------+-------------------+
| Variable_name | Value |
+------------------+-------------------+
| collation_server | latin1_swedish_ci |
+------------------+-------------------+
Set Server Character Set You can specifiy the character set and collation during startup
mysql> set @@global.character_set_server='latin2';
Query OK, 0 rows affected (0.00 sec)
and the character set have been changed
mysql> select @@character_set_server;
+------------------------+
| @@character_set_server |
+------------------------+
| latin2 |
+------------------------+
Database Character Set and Collation Every database has a character set and a collation.
The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:
CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
Example for the test database
mysql> show create database test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
Show Database Collation and Character Set
mysql> select @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8 | utf8_general_ci |
+--------------------------+----------------------+
Table Character Set and Collation
Column Character and Collation