GTID 和 START SLAVE
- START SLAVE 语法
START SLAVE [thread_types] [until_option] [connection_options]
thread_types:
[thread_type [, thread_type] ... ]
thread_type:
IO_THREAD | SQL_THREAD
until_option:
UNTIL { {SQL_BEFORE_GTIDS | SQL_AFTER_GTIDS} = gtid_set
| MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos
| RELAY_LOG_FILE = 'log_name', RELAY_LOG_POS = log_pos
| SQL_AFTER_MTS_GAPS }
* SQL_BEFORE_GTIDS = $gitd_set : $gtid_set之前的gtid都会被执行
eg. START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56
表示,当SQL_thread 执行到3E11FA47-71CA-11E1-9E33-C80AA9429562:10 的时候停止,下一个事务是11
* SQL_AFTER_GTIDS = $gitd_set : $gtid_set之前,以及$gtid_set包含的gtid都会被执行
eg. START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56
表示,当SQL_thread 执行到3E11FA47-71CA-11E1-9E33-C80AA9429562:56 的时候停止,56是最后一个提交的事务。
- 如何从multi-threaded slave 转化成 single-threaded mode
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
SET @@GLOBAL.slave_parallel_workers = 0;
START SLAVE SQL_THREAD;
GTID 和 upgrade
如果 --gtid-mode=ON ,那么在使用upgrade时候,不推荐使用--write-binlog 选项。
因为,mysql_upgrade 会更新Myisam引擎的系统表. 而同时更新transction table 和 non-trasaction table 是gtid所不允许的
GTID 和 mysql.gtid_executed
- gtid_mode = (ON|ON_PERMISSIVE), bin_log = off
gtid 会实时的写入到mysql.gtid_executed表中,且根据executed_gtids_compression_period=N来压缩
- gtid_mode = (ON|ON_PERMISSIVE), bin_log = on
gtid 不会实时的写入到mysql.gtid_executed,executed_gtids_compression_period会失效。
只有当binlog rotate或者mysql shutdown的时候才会写入mysql.gtid_executed
如果master 异常shutdown,gtid还没有写入到mysql.gtid_executed怎么办呢?
这种场景,一般通过mysql recover机制写入到mysql.gtid_executed中
GTID 和 gtid_next
http://dev.mysql.com/doc/refman/5.7/en/replication-options-gtids.html#sysvar_gtid_next
- 三种取值
* AUTOMATIC: Use the next automatically-generated global transaction ID.
* ANONYMOUS: Transactions do not have global identifiers, and are identified by file and position only.
* A global transaction ID in UUID:NUMBER format.
- QA: GTID 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 对应的事务顺序,从小到大,一定是顺序执行的吗?
答案:错,一般情况下事务是从小到大,顺序执行的。 但是如果再MTS场景,或者是人工设置gtid_next的情况下,就可能不是顺序执行了
dba:(none)> show master status;
+--------------------+----------+--------------+------------------+-------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+--------------------+----------+--------------+------------------+-------------------------------------------+
| xx.000009 | 1719 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46 |
+--------------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> set gtid_next='0923e916-3c36-11e6-82a5-ecf4bbf1f518:50';
Query OK, 0 rows affected (0.00 sec)
dba:lc> insert into gtid_1 values(5);
Query OK, 1 row affected (0.00 sec)
dba:lc> set gtid_next=AUTOMATIC;
Query OK, 0 rows affected (0.00 sec)
dba:lc> flush logs;
Query OK, 0 rows affected (0.01 sec)
dba:lc> show master status;
+--------------------+----------+--------------+------------------+----------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+--------------------+----------+--------------+------------------+----------------------------------------------+
| xx.000010 | 210 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46:50 |
+--------------------+----------+--------------+------------------+----------------------------------------------+
1 row in set (0.00 sec)
dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)
dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)
dba:lc> insert into gtid_1 values(6);
Query OK, 1 row affected (0.00 sec)
dba:lc> show master status;
+--------------------+----------+--------------+------------------+-------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+--------------------+----------+--------------+------------------+-------------------------------------------+
| xx.000010 | 1125 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 |
+--------------------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)
在这里面,很明显0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 事务执行顺序为: 1-46(最先执行) , 50(其次执行) , 47-49(最后执行)
GTID 和 MHA
请参考MHA源码解析
- GTID模式下,需要relay-log吗?purge_relay_log设置为on可以吗?
* replication 架构
host_1(host_1:3306) (current master)
+--host_2(host_2:3306 candidate master)
+--host_3(host_3:3306 no candidate)
* 模拟:
1. 大量并发的写入,一直持续的往host_1写数据,造成并发写入很大的样子
2. host_2: stop slave , 造成host_2 延迟master很多的样子
3. host_1: purge binary logs, 造成master删掉了日志,导致host_2 修复的时候拿不到master的最新binlog
4. host_3: 一直正常同步master,拥有最新的binlog
5. host_3: flush logs; purge_relay_log=on; flush logs;一直循环flush logs,造成host_3已经将最新的relay log删掉了,host_2 是肯定拿不到host_3的relay 来修复自己了
6. 好了,一切条件均已经准备完毕,这个时候让master 宕机,这样就能模拟出在relay log没有的情况下,是否可以正常完成mha 切换了
...............
7. 结果完成了正常切换,那mha是怎么再gtid模式下,在没有relay log的情况下,正常切换的恩?
8. 原理:host_2发现自己不是最新的slave,所以就去change master到host_3,通过host_3的binlog来恢复
9. 最后,当host_2和host_3都一致的情况下,再让host_3 重新指向host_2,完毕...
*结论: gtid模式下,mha恢复切换的原理是不需要relay log的,只需要binlog
GTID 和 备份(物理备份+逻辑备份)
物理备份:xtrabackup,其他等
逻辑备份:mysqldump,mydumper,mysqlpump等
- 物理备份
备份的时候,只要在备份的时候记录下Executed_Gtid_Set($gtid_dump)即可,这个可以用于重新change master;
reset master;
SET @@GLOBAL.GTID_PURGED='$gtid_dump';
change master to master_auto_position=1;
- 逻辑备份
* mysqldump 中 sql_log_bin 默认是关闭的。 SET @@SESSION.SQL_LOG_BIN= 0; 所以这里用途非常重要
* 如果dump文件,你要在master上执行,那么必须这样备份: mysqldump xx --set-gtid-purged=OFF , 这样dump文件不会有SET @@SESSION.SQL_LOG_BIN= 0存在
* 如果dump文件,你要在slave上执行,想重新搭建一套slave环境。那么必须这样备份: mysqldump xx --set-gtid-purged=ON
GTID 和 crash safe slave
slave relay log 不完整怎么办?(relay-log-recover=0)
relay-log-recover=1 不考虑,因为它会舍弃掉relay log
- 为何要讨论这个
* 官方解释:
1) 非GTID模式下,如何保证slave crash safe 呢?
relay_log_recovery=1,relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1
2) GTID模式下,如何保证slave crash safe呢?
relay_log_recovery=(1|0),relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1
以上两种情况配置,可以保证crash safe
这里看到区别就是relay_log_recovery了,gtid可以是any,这就需要讨论下了。
当relay_log_recovery=1时,当mysql crash的时候,会丢弃掉之前获取的relay,所以这个不会产生一致性问题。
当relay_log_recovery=0时
如果是非GTID模式,因为没办法保证写master_info.log和relay log file之间的原子性,会导致slave有可能多拉取一个事务,这样就有一致性问题。
如果是GTID模式,因为binlog-dump协议变了,master_info.log已经不用,slave会将已经exected_GTID与retrieve_gtid的并集发送给master,以此来获取没有执行过的gtid,所以没问题。
这里面的retrieve_gtid就是IO_thread从master获取的gtid,会写入到relay log。
- 模拟relay log不完整的情况
从上面可以知道,relay log的记录非常重要,那么relay log 不完整,会怎么样呢?
1) master 创建一张10G的表,然后执行全表更新操作。
2)这时候,slave就在狂写relay log了
3)此时,去slave kill掉mysql进程
4)这时候,relay log就不完整了
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set.
This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement.
The event(s) from the partial statement have not been written to output.
总结: relay log不完整,mysql起来后,会重新获取不完整的这个events,sql_thread在回放的时候,如果发现events不完整,会跳过,不会影响到同步。
GTID 和 MTS
MTS_GAPS
- 如果MTS遇到Gap transction怎么办?
1. 先解决问题
START SLAVE UNTIL SQL_AFTER_MTS_GAPS
2. 考虑设置slave_preserve_commit_order=1
GTID 生产环境中必须考虑的问题
Migration to GTID replication
Non transactionally safe statement will raise errors now
MySQL Performance in GTID
mysql_upgrade script
Errant transactions
Filtration on the slave
Injecting empty transactions
以上问题请参考 GTID原理与实战
GTID 和 online升级
online升级丢数据?
online升级会报错吗?
online升级步骤请参考 GTID原理与实战
- 故障案例一
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate anonymous transaction when @@GLOBAL.GTID_MODE = ON ...'
两种情况:
1)slave的gtid_mode=on时,却还接受着来自master的non-gtid transaction的时候,会报以上错误。
2)事实上,不管slave的gtid_mode是on,还是off,只要master的gtid_mode=on,那么整个replication slave,都必须是gtid的事务
解决方案:在master上从gtid_mode=ON_PERMISSIVE 设置到 gtid_mode=ON之前,如何保证现在所有non-gtid事务都已经在slave执行完毕了?
很简单,两种方法:
第一种方案:
1) 在master上,当设置gtid_mode=ON_PERMISSIVE的时候,其实就已经产生gtid事务了,这个时候show master status;记下这个位置 $pos
2)然后再每个slave上,执行 SELECT MASTER_POS_WAIT(file, position);
第二种更加直接方案:
0)默认情况下,slave的gtid_mode都是off,所以去slave上show master status 都应该是file,position
1) 先在master上,设置gtid_mode=ON_PERMISSIVE
2)然后再每台slave上再次执行show master status,如果发现结果由file,position 变成 GTID_EXECUTED,那么说明slave已经将non-gtid全部执行完毕了
- 故障案例二
Last_IO_Error: The replication receiver thread cannot start because the master has GTID_MODE = ON and this server has GTID_MODE = OFF.
slave的gtid_mode=off时,却还接受着来自master的gtid transaction的时候,会报以上错误。
GTID 和 mysqlbinlog
mysqlbinlog 参数:
* --exclude-gtids : 排除这些gtid
* --include-gtids : 只打印这些gtid
* --skip-gtids : 所有gtid都不打印
可以用--skip-gtids 做传统模式的恢复。但是这个是官方不推荐的。
mysqlbinlog --skip-gtids binlog.000001 > /tmp/dump.sql
GTID 和 重要函数
gtid_set 用引号扩起来
Name | Description |
---|---|
GTID_SUBSET(subset,set) | returns true (1) if all GTIDs in subset are also in set |
GTID_SUBTRACT(set,subset) | returns only those GTIDs from set that are not in subset |
WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout]) | Wait until the given GTIDs have executed on slave. |
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout) | Wait until the given GTIDs have executed on slave |
- GTID_SUBSET(subset,set)
subset 是否是 set 的子集,如果是返回1,不是返回0
dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57');
+-----------------------------------------------------------------------------------------------------+
| GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') |
+-----------------------------------------------------------------------------------------------------+
| 1 |
+-----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57');
+--------------------------------------------------------------------------------------------------------+
| GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') |
+--------------------------------------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23');
+--------------------------------------------------------------------------------------------------+
| GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23') |
+--------------------------------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57');
+--------------------------------------------------------------------------------------------------------+
| GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') |
+--------------------------------------------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
- GTID_SUBTRACT(set,subset)
哪些gtids仅仅是set独有的,subset没有的
dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21');
+-------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21') |
+-------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:22-57 |
+-------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25');
+----------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25') |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:26-57 |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24');
+----------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24') |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:21-22:25-57 |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
以上两个函数可以用来干嘛呢?
通过GTID_SUBSET,master可以知道slave是否是自己的子集,可以很方便的检查数据一致性
通过GTID_SUBTRACT,假设slave是master的子集,那么可以很轻松的将slave没有,master有的gtid发送给slave,以便达到最终一致性
- WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout)
timeout 默认为0,表示无限等待slave gtid_set全部执行完毕
如果全部执行完毕,会返回执行的gtid的数量。如果没有执行完,会等待timeout秒。如果slave没有起来,或者没有开启gtid,会返回NULL
dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3');
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) |
+---------------------------------------------------------------------------------+
| 0 |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
stop slave;
dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3');
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) |
+---------------------------------------------------------------------------------+
| NULL | ## 如果slave的IO,SQL thread 没有running,返回NULL,不管gtid set 有木有执行完毕
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
- WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout])
含义跟WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS一样,唯一一个区别就是:如果slave 的replication 线程没有起来,不会返回NULL。
stop slave;
dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3');
+------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3') |
+------------------------------------------------------------------------+
| 0 | ## 如果都执行了,返回0 , 跟slave的IO,SQL thread 起没起来无关
+------------------------------------------------------------------------+
1 row in set (0.00 sec)
dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1);
+--------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1) |
+--------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------+
1 row in set (1.00 sec)
GTID 的限制和缺点
- 同事更新nontransactional和transactional的表,会导致gtid问题
- CREATE TABLE ... SELECT statements 语法对GTID来说是不安全的
- CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE 对GTID也是不安全的
- enforce-gtid-consistency 必须设置on,可以避免以上2,3 不安全的statement
- sql_slave_skip_counter 不允许执行,可以通过 Injecting empty transactions 来解决
- GTID 和 mysqldump的问题,mysqldump 中 sql_log_bin 默认是关闭的.会导致导入master后,不会写入gtid到binlog. ( 可以通过 --set-gtid-purged=OFF 避免 )
- GTID and mysql_upgrade, 因为部分系统表是myisam引擎的,会有问题。 (可以通过--write-binlog=off来避免 )
参考文档
- 官方资料:
1.5 Server and Status Variables and Options Added, Deprecated, or Removed in MySQL 5.7
5.5.4 mysqldump — A Database Backup Program
5.6.7 mysqlbinlog — Utility for Processing Binary Log Files
13.17 Functions Used with Global Transaction IDs
14.4.2.1 CHANGE MASTER TO Syntax
14.4.2.6 START SLAVE Syntax
14.7.5.34 SHOW SLAVE STATUS Syntax
18.1.3 Replication with Global Transaction Identifiers
18.1.3.1 GTID Concepts
18.1.3.2 Setting Up Replication Using GTIDs
18.1.3.3 Using GTIDs for Failover and Scaleout
18.1.3.4 Restrictions on Replication with GTIDs
18.1.5.1 Replication Mode Concepts
18.1.5.2 Enabling GTID Transactions Online
18.1.5.3 Disabling GTID Transactions Online
18.1.6.1 Replication and Binary Logging Option and Variable Reference
18.1.6.5 Global Transaction ID Options and Variables
18.3.2 Handling an Unexpected Halt of a Replication Slave
18.4.1.34 Replication and Transaction Inconsistencies
18.4.3 Upgrading a Replication Setup
19.2.1.5 Adding Instances to the Group
24.10.7.1 The events_transactions_current Table
24.10.11.6 The replication_applier_status_by_worker Table
- 第三方资料
> http://www.fromdual.ch/things-you-should-consider-before-using-gtid
> http://www.fromdual.ch/gtid_in_action
> http://www.fromdual.ch/replication-troubleshooting-classic-vs-gtid
> http://www.fromdual.ch/replication-in-a-star
> http://www.fromdual.com/controlling-worldwide-manufacturing-plants-with-mysql
> https://www.percona.com/blog/2014/05/19/errant-transactions-major-hurdle-for-gtid-based-failover-in-mysql-5-6/
> https://www.percona.com/blog/2016/12/01/database-daily-ops-series-gtid-replication-binary-logs-purge/
> https://www.percona.com/blog/2016/11/10/database-daily-ops-series-gtid-replication/
> https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/
> https://www.percona.com/blog/2014/05/09/gtids-in-mysql-5-6-new-replication-protocol-new-ways-to-break-replication/