Character sets, encoding, and collation

Concepts

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Suppose we have the following alphabet:

A,B,C,a,b,c

We then assign a value to each symbol in the alphabet.

A=1, B=2, C=3, a=4, b=5, c=6

The numbers 1..6 are encodings for the symbols A..c. When we compare two strings, we compare encodings. For example, we can define that A is less than B, C, b, c but equal to a. This rule is an application of collation. Collation is a set of rules to compare encodings.

Character sets and collation in MySQL

To see the character set in MySQL, use

mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 | 
| dec8     | DEC West European           | dec8_swedish_ci     |      1 | 
| cp850    | DOS West European           | cp850_general_ci    |      1 | 
| hp8      | HP West European            | hp8_english_ci      |      1 | 
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 | 
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 | 

Too see collation in MySQL, use

mysql> show collation;
+----------------------+----------+-----+---------+----------+---------+
| Collation            | Charset  | Id  | Default | Compiled | Sortlen |
+----------------------+----------+-----+---------+----------+---------+
| big5_chinese_ci      | big5     |   1 | Yes     | Yes      |       1 | 
| big5_bin             | big5     |  84 |         | Yes      |       1 | 
| dec8_swedish_ci      | dec8     |   3 | Yes     | Yes      |       1 | 
| dec8_bin             | dec8     |  69 |         | Yes      |       1 | 
| cp850_general_ci     | cp850    |   4 | Yes     | Yes      |       1 | 
| cp850_bin            | cp850    |  80 |         | Yes      |       1 | 

Default settings for character sets and collations

There are default settings for character sets and collations at four levels: server, database, table, and column. Defaults can be set and modified at each level.

Server level character sets and collations
MySQL Server has a server character set and a server collation. These can be set at server startup on the command line or in an option file and changed at runtime. They can be configured as follows:

shell> ./configure --with-charset=latin1 --with-collation=latin1_german1_ci

The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. The current server character set and collation can be determined from the values of the character_set_server and collation_server system variables. These variables can be changed at runtime.

To see server variables,

mysql> show variables like '%server%';

Database level character sets and collations
Every database has a database character set and a database collation. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name [[DEFAULT] CHARACTER SET charset_name] [[DEFAULT] COLLATE collation_name]
ALTER DATABASE db_name [[DEFAULT] CHARACTER SET charset_name] [[DEFAULT] COLLATE collation_name]

Example:

CREATE DATABASE db_name CHARACTER SET latin1 COLLATE latin1_swedish_ci;

To view database level character set and collation,

USE dbname;
show variables like "character_set_database";
show variables like "collation_database";

Alternately,

mysql> show create database orthodb;
+----------+--------------------------------------------------------------------+
| Database | Create Database |
+----------+--------------------------------------------------------------------+
| orthodb | CREATE DATABASE `orthodb` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+--------------------------------------------------------------------+
1 row in set (0.00 sec)

Don't worry, this will not create a new database replacing your existing one.

Table level character sets and collations
Every table has a table character set and a table collation. The CREATE TABLE and ALTER TABLE statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list) [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]]
ALTER TABLE tbl_name [[DEFAULT] CHARACTER SET charset_name] [COLLATE collation_name]

To view character set and collation, use

describe tablename;

Column level character sets and collations
Every “character” column (that is, a column of type CHAR, VARCHAR, or TEXT) has a column character set and a column collation. Column definition syntax has optional clauses for specifying the column character set and collation:

col_name {CHAR | VARCHAR | TEXT} (col_length) [CHARACTER SET charset_name] [COLLATE collation_name]

To view character set and collation, use

describe tablename;

To change the character set (and collation) for all columns in an existing table, use:

ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name [COLLATE collation_name];

Source

dev.mysql.com