MySQLでutf8mb4を扱う場合は必要なカラムと接続時だけ明示的に指定する

April 7, 2014 - mysql

結論はタイトルの通り。

まとめ

  • systemのエンコードがcharacter_set_systemに反映される. clientの接続時のデフォルトはシステムのエンコードが使用される(未検証)
  • clientの接続時の指定が優先される。ただし、character_set_database, character_set_filesystem, character_set_serverについては上書きされない。
  • my.cnfのmysql.default-character-setは意味ある?(未検証)
  • my.cnfのmysqld.character-set-serverはfilesystemとsystem以外のデフォルトに影響する。clientの接続時に無指定の場合にもデフォルトとして利用される

以下の理由によって、デフォルトはutf8で、必要なカラムとクライアント接続時にだけutf8mb4を適用することとした。

  • インデックスに使える文字数がutf8だと255文字までに対してutf8mb4だと191文字まで。
    • サーバー側の設定はデフォルトutf8.クライアントの設定はデフォルトutf8mb4.が良い

検証

server/clientともにremiのMySQL5.5を使用

$ mysql --version
mysql  Ver 14.14 Distrib 5.5.37, for Linux (x86_64) using readline 5.1

utf8なカラムname1と、utf8mb4なカラムname2を用意.

$ mysql -uroot test -e 'show create table user2\G'
*************************** 1. row ***************************
       Table: user2
Create Table: CREATE TABLE `user2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name1` varchar(255) DEFAULT NULL,
  `name2` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8

デフォルトの状態

$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

clientが--default-character-set=utf8mb4をつけて接続した場合

$ mysql --default-character-set=utf8 -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

my.cnfにmysql.default-character-setを指定した場合

[mysql]
default-character-set=utf8
$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

my.cnfにmysqd.character-set-server=utf8を指定した場合

[mysqld]
character-set-server=utf8
$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Rails(ActiveRecord)から接続する場合 

db:
  encoding: "utf8mb4" # SET NAMES "utf8mb4"と等価

さらに、SET NAMES utf8mb4は以下全てを実行した場合と等価

SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;

utf8なカラムの文字列検索

  • utf8mb4なカラムには、utf8mb4で接続してinsertすること
  • utf8なカラムには、utf8でもutf8mb4でもどちらでinsertしてもselectしても問題ない
$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ%" limit 3'
+----+----------------+------------------+
| id | name1          | name2            |
+----+----------------+------------------+
|  1 | firstだよ♬     | firstだよ♬       |
|  2 | secondだよ     | secondだよ????   |
|  3 | secondだよ?    | secondだよ♨      |
+----+----------------+------------------+
$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ♨" limit 3'
+----+--------------------+---------------------+
| id | name1              | name2               |
+----+--------------------+---------------------+
|  3 | secondだよ?        | secondだよ♨          |
|  8 | テストだよ♬        | テストだよ♨          |
| 13 | テストだよ♬        | テストだよ♨          |
+----+--------------------+---------------------+

ただし、utf8mb4なカラムに対して4バイト文字列を投げると、utf8で接続した時にエラーになる

$ mysql -uroot test -e 'set names "utf8"; select * from user2 where name2 like "%だよ♨" limit 3'
ERROR 1267 (HY000) at line 1: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'

なので、データとしてutf8mb4が格納される可能性があるDBに対しては、utf8mb4で必ず接続すること。

$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ♨" limit 3'
+----+--------------------+---------------------+
| id | name1              | name2               |
+----+--------------------+---------------------+
|  3 | secondだよ?        | secondだよ♨          |
|  8 | テストだよ♬        | テストだよ♨          |
| 13 | テストだよ♬        | テストだよ♨          |
+----+--------------------+---------------------+

クライアントが5.1の場合

my.cnfやmysqlコマンドのオプションにclient.default_character_set=utf8mb4は指定できない

$ mysql --default-character-set=utf8mb4 -h 192.168.1.164 -uroot test -e 'select * from user'
mysql: Character set 'utf8mb4' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file

ただし、SQLでのset names utf8mb4の指定は有効。

$ mysql -h 192.168.1.164 -uroot test -e 'set names utf8mb4; select * from user'
+----+------------+---------------+
| id | name1      | name2         |
+----+------------+---------------+
| 14 | テスト  | テスト???? |
| 15 | テスト? | テスト♨ |
+----+------------+---------------+

ActiveRecord等でmysql2を使用している場合、encodingにutf8mb4を指定すると例外が起きる

Character set 'utf8mb4' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file
/home/yoshida/hoge/vendor/bundle/ruby/2.0.0/gems/mysql2-0.3.13/lib/mysql2/client.rb:58:in `connect': Can't initialize character set utf8mb4 (path: /usr/share/mysql/charsets/) (Mysql2::Error)
        from /home/yoshida/hoge/vendor/bundle/ruby/2.0.0/gems/mysql2-0.3.13/lib/mysql2/client.rb:58:in `initialize'
        from test_mysql.rb:3:in `new'
        from test_mysql.rb:3:in `<main>'

なので、接続時にset namesすれば良い

require 'mysql2'
client = Mysql2::Client.new(:host => "localhost", :username => "root", db: 'test')
client.query('SET NAMES utf8mb4')
puts client.query('SELECT * FROM user').map { |r| r['name2'] }
$ ruby test.rb
["テスト????", "テスト🌸"]

参考