Character Set Support What is Character Set A character set is a set of symbols and encodings. Suppose that we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.

What is a Collation A collation is a set of rules for comparing characters in a character set.

MySQL includes character set support that enables you to store data using a variety of character sets and perfrom comparisons according to a variety of collations. You can specifiy character sets at the server, database, table and column level. MySQL supports the use of character sets for the MyISAM, MEMORY,a dn InnoDB storage engines.

Performance Issues

MySQL will need tobe compiled from the source to remove unnecessary character sets.

Show available character sets

mysql> show character set;
+----------+---------------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
.................................other rows removed to save space

Notice the length of the character set. utf8 requires 3 bytes per character. So a column with CHAR(32) will require 96 bytes using utf8

Show available collations

mysql> show collation;
+--------------------------+----------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci | big5 | 1 | Yes | Yes | 1 |
| big5_bin | big5 | 84 | | Yes | 1 |
............others removed to save space

or query from the information_schema database

mysql> select * from information_schema.CHARACTER_SETS;
+--------------------+----------------------+---------------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+---------------------------------+--------+
| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 |
| dec8 | dec8_swedish_ci | DEC West European | 1 |
| cp850 | cp850_general_ci | DOS West European | 1 |
| hp8 | hp8_english_ci | HP West European | 1 |
| koi8r | koi8r_general_ci | KOI8-R Relcom Russian | 1 |
........additional row removed

A given character set has at least one collation

mysql> show collation where charset = 'latin1';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
+-------------------+---------+----+---------+----------+---------+

or query it from the information_schema database

mysql> select * from information_schema.collations where character_set_name = 'utf8';
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |

Specifying a Character Set

Use the SET NAMES statement

mysql> SET NAMES 'utf8';

Show Server Character Set mysql> show variables like 'character_set_server'; +----------------------+--------+ | Variable_name | Value | +----------------------+--------+ | character_set_server | latin1 | +----------------------+--------+

Show Server Collation

mysql> show variables like 'collation_server';
+------------------+-------------------+
| Variable_name | Value |
+------------------+-------------------+
| collation_server | latin1_swedish_ci |
+------------------+-------------------+

Set Server Character Set You can specifiy the character set and collation during startup

mysql> set @@global.character_set_server='latin2';
Query OK, 0 rows affected (0.00 sec)

and the character set have been changed

mysql> select @@character_set_server;
+------------------------+
| @@character_set_server |
+------------------------+
| latin2 |
+------------------------+

Database Character Set and Collation Every database has a character set and a collation.

The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name 
 [[DEFAULT] CHARACTER SET charset_name] 
 [[DEFAULT] COLLATE collation_name] 

ALTER DATABASE db_name 
 [[DEFAULT] CHARACTER SET charset_name] 
 [[DEFAULT] COLLATE collation_name]

Example for the test database

mysql> show create database test;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| test | CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)

Show Database Collation and Character Set

mysql> select @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8 | utf8_general_ci |
+--------------------------+----------------------+

Table Character Set and Collation

Column Character and Collation

results matching ""

    No results matching ""