MySQLでutf8mb4を扱う場合は必要なカラムと接続時だけ明示的に指定する
April 7, 2014 - mysql
結論はタイトルの通り。
まとめ
- systemのエンコードが
character_set_system
に反映される. clientの接続時のデフォルトはシステムのエンコードが使用される(未検証) - clientの接続時の指定が優先される。ただし、
character_set_database
,character_set_filesystem
,character_set_server
については上書きされない。 - my.cnfの
mysql.default-character-set
は意味ある?(未検証) - my.cnfの
mysqld.character-set-server
はfilesystemとsystem以外のデフォルトに影響する。clientの接続時に無指定の場合にもデフォルトとして利用される
以下の理由によって、デフォルトはutf8で、必要なカラムとクライアント接続時にだけutf8mb4を適用することとした。
- インデックスに使える文字数がutf8だと255文字までに対してutf8mb4だと191文字まで。
- サーバー側の設定はデフォルトutf8.クライアントの設定はデフォルトutf8mb4.が良い
検証
server/clientともにremiのMySQL5.5を使用
$ mysql --version
mysql Ver 14.14 Distrib 5.5.37, for Linux (x86_64) using readline 5.1
utf8なカラムname1
と、utf8mb4なカラムname2
を用意.
$ mysql -uroot test -e 'show create table user2\G'
*************************** 1. row ***************************
Table: user2
Create Table: CREATE TABLE `user2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name1` varchar(255) DEFAULT NULL,
`name2` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8
デフォルトの状態
$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
clientが--default-character-set=utf8mb4
をつけて接続した場合
$ mysql --default-character-set=utf8 -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
my.cnfにmysql.default-character-set
を指定した場合
[mysql]
default-character-set=utf8
$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
my.cnfにmysqd.character-set-server=utf8
を指定した場合
[mysqld]
character-set-server=utf8
$ mysql -uroot test -e 'show global variables like "character_set%"; show variables like "character_set%"'
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Rails(ActiveRecord)から接続する場合
db:
encoding: "utf8mb4" # SET NAMES "utf8mb4"と等価
さらに、SET NAMES utf8mb4
は以下全てを実行した場合と等価
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;
utf8なカラムの文字列検索
- utf8mb4なカラムには、utf8mb4で接続してinsertすること
- utf8なカラムには、utf8でもutf8mb4でもどちらでinsertしてもselectしても問題ない
$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ%" limit 3'
+----+----------------+------------------+
| id | name1 | name2 |
+----+----------------+------------------+
| 1 | firstだよ♬ | firstだよ♬ |
| 2 | secondだよ | secondだよ???? |
| 3 | secondだよ? | secondだよ♨ |
+----+----------------+------------------+
$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ♨" limit 3'
+----+--------------------+---------------------+
| id | name1 | name2 |
+----+--------------------+---------------------+
| 3 | secondだよ? | secondだよ♨ |
| 8 | テストだよ♬ | テストだよ♨ |
| 13 | テストだよ♬ | テストだよ♨ |
+----+--------------------+---------------------+
ただし、utf8mb4なカラムに対して4バイト文字列を投げると、utf8で接続した時にエラーになる
$ mysql -uroot test -e 'set names "utf8"; select * from user2 where name2 like "%だよ♨" limit 3'
ERROR 1267 (HY000) at line 1: Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'like'
なので、データとしてutf8mb4が格納される可能性があるDBに対しては、utf8mb4で必ず接続すること。
$ mysql -uroot test -e 'set names "utf8mb4"; select * from user2 where name2 like "%だよ♨" limit 3'
+----+--------------------+---------------------+
| id | name1 | name2 |
+----+--------------------+---------------------+
| 3 | secondだよ? | secondだよ♨ |
| 8 | テストだよ♬ | テストだよ♨ |
| 13 | テストだよ♬ | テストだよ♨ |
+----+--------------------+---------------------+
クライアントが5.1の場合
my.cnfやmysqlコマンドのオプションにclient.default_character_set=utf8mb4
は指定できない
$ mysql --default-character-set=utf8mb4 -h 192.168.1.164 -uroot test -e 'select * from user'
mysql: Character set 'utf8mb4' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file
ただし、SQLでのset names utf8mb4
の指定は有効。
$ mysql -h 192.168.1.164 -uroot test -e 'set names utf8mb4; select * from user'
+----+------------+---------------+
| id | name1 | name2 |
+----+------------+---------------+
| 14 | テスト | テスト???? |
| 15 | テスト? | テスト♨ |
+----+------------+---------------+
ActiveRecord等でmysql2を使用している場合、encodingにutf8mb4を指定すると例外が起きる
Character set 'utf8mb4' is not a compiled character set and is not specified in the '/usr/share/mysql/charsets/Index.xml' file
/home/yoshida/hoge/vendor/bundle/ruby/2.0.0/gems/mysql2-0.3.13/lib/mysql2/client.rb:58:in `connect': Can't initialize character set utf8mb4 (path: /usr/share/mysql/charsets/) (Mysql2::Error)
from /home/yoshida/hoge/vendor/bundle/ruby/2.0.0/gems/mysql2-0.3.13/lib/mysql2/client.rb:58:in `initialize'
from test_mysql.rb:3:in `new'
from test_mysql.rb:3:in `<main>'
なので、接続時にset namesすれば良い
require 'mysql2'
client = Mysql2::Client.new(:host => "localhost", :username => "root", db: 'test')
client.query('SET NAMES utf8mb4')
puts client.query('SELECT * FROM user').map { |r| r['name2'] }
$ ruby test.rb
["テスト????", "テスト🌸"]