一、背景
MySQL 8.0已经成为最受瞩目的版本,增加了许多新特性,如Clone Plugin、Hash Join等等。由于新的MySQL重做日志和数据字典格式,所以推出新的XtraBackup 8.0版本,本文基于MySQL产品在自动搭建备库时遇到问题做出解析。
1.1 测试环境
- MySQL 8.0.18
- xtrabackup 8.0.8
- binlog_format=raw,gtid_mode=on,log-bin=1
- sysbench 1.0.17
1.2 问题来源
某天,QFusion测试人员遇到了MySQL 8.0在备库搭建时报错,需要DBA同学跟进问题。首先通过show slave status查看报错信息,发现从库的sql thread出现异常。
root@localhost : (none) 08:56:40> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: test-58-aa-f8aa500-headless
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000085
Read_Master_Log_Pos: 1751589
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 416
Relay_Master_Log_File: mysql-bin.000050
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758' at master log , end_log_pos 16052967. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 15520378
Relay_Log_Space: 19285319437
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758' at master log , end_log_pos 16052967. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 41796
Master_UUID: 4a8f2de4-1fe1-11ea-a7d3-42549a90e353
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 191216 18:56:49
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758-48217
Executed_Gtid_Set: 4a8f2de4-1fe1-11ea-a7d3-42549a90e353:1-25757
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set (0.00 sec)
通过报错信息中的提示,发现是出现了主键冲突。
root@localhost : (none) 08:57:05> select * from performance_schema.replication_applier_status_by_worker limit 1\G
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758' at master log , end_log_pos 16052967; Could not execute Write_rows event on table test.sbtest24; Duplicate entry '1529529' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 16052967
LAST_ERROR_TIMESTAMP: 2019-12-16 18:56:49.326384
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION: 4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2019-12-16 17:31:13.653629
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2019-12-16 17:31:13.653629
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2019-12-16 18:56:49.325745
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)
查看当前slave最新的gtid信息
root@localhost : (none) 12:05:14> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000008
Position: 155
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 4a8f2de4-1fe1-11ea-a7d3-42549a90e353:1-25757
1 row in set (0.00 sec)
查看xtrabackup备份时的binlog的pos和gtid信息
[root@test-58-aa-f8aa501-0 mydata]# cat xtrabackup_info
...
binlog_pos = filename 'mysql-bin.000050', position '33110447', GTID of the last change '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:1-25757'
...
此时对比发现,slave的gtid信息和备份完成时的gtid是一致的,那为什么还是出现了主键冲突的情况呢?
此时我们根据pos位点解析master的binlog信息
[root@test-58-aa-f8aa500-0 archive]# mysqlbinlog -vv mysql-bin.000050 --start-position=33110447 | head -20
mysqlbinlog: [Warning] unknown variable 'loose_default-character-set=utf8mb4'.
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 195
#191216 17:31:12 server id 41796 end_log_pos 124 CRC32 0xc9ad2c11 Start: binlog v 4, server v 8.0.18 created 191216 17:31:12
BINLOG '
4E73XQ9EowAAeAAAAHwAAAAAAAQAOC4wLjE4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEwANAAgAAAAABAAEAAAAYAAEGggAAAAICAgCAAAACgoKKioAEjQA
CgERLK3J
'/*!*/;
# at 33110447
#191216 17:31:13 server id 41796 end_log_pos 33110527 CRC32 0x27442dd4 GTID last_committed=15 sequence_number=34 rbr_only=yes original_committed_timestamp=1576488674113763 immediate_commit_timestamp=1576488674113763 transaction_length=1034680
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1576488674113763 (2019-12-16 17:31:14.113763 CST)
# immediate_commit_timestamp=1576488674113763 (2019-12-16 17:31:14.113763 CST)
/*!80001 SET @@session.original_commit_timestamp=1576488674113763*//*!*/;
/*!80014 SET @@session.original_server_version=80018*//*!*/;
/*!80014 SET @@session.immediate_server_version=80018*//*!*/;
SET @@SESSION.GTID_NEXT= '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25776'/*!*/;
# at 33110527
根据binlog的pos位点,发现下一个事务GTID为4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25776;而备份结束的GTID为4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25757。
此时我们根据pos位点解析master的binlog信息
### 主库查看4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758的执行语句
[root@test-58-aa-f8aa500-0 archive]# mysqlbinlog -vv mysql-bin.000050 | grep -B 50 4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25759
...
### INSERT INTO `test`.`sbtest24`
### SET
### @1=1532200 /* INT meta=0 nullable=0 is_null=0 */
### @2=2512693 /* INT meta=0 nullable=0 is_null=0 */
### @3='89203990588-69103074818-52114821100-93439894362-23847143576-07484173808-05428369926-02975913993-41574550041-26938280355' /* STRING(360) meta=61032 nullable=0 is_null=0 */
### @4='45342050513-61911498937-97819152099-08587625932-60582159210' /* STRING(180) meta=65204 nullable=0 is_null=0 */
### INSERT INTO `test`.`sbtest24`
### SET
### @1=1532201 /* INT meta=0 nullable=0 is_null=0 */
### @2=2516379 /* INT meta=0 nullable=0 is_null=0 */
### @3='16016112465-66688149957-19003543589-12752783330-10140834595-17946818527-54058037200-83940103252-07788820899-59081899782' /* STRING(360) meta=61032 nullable=0 is_null=0 */
### @4='02966014962-81012143426-29059376957-22789496327-25833494842' /* STRING(180) meta=65204 nullable=0 is_null=0 */
### INSERT INTO `test`.`sbtest24`
### SET
### @1=1532202 /* INT meta=0 nullable=0 is_null=0 */
### @2=2707896 /* INT meta=0 nullable=0 is_null=0 */
### @3='16925828637-58090054889-95780407302-10293804842-12632614860-03811213734-46471823942-17333962963-31790381481-07019946586' /* STRING(360) meta=61032 nullable=0 is_null=0 */
### @4='77178758790-70128854370-00264407205-64007208401-22279618099' /* STRING(180) meta=65204 nullable=0 is_null=0 */
# at 16555027
#191216 17:31:11 server id 41796 end_log_pos 16555058 CRC32 0x6f8135d4 Xid = 36324
COMMIT/*!*/;
# at 16555058
#191216 17:31:11 server id 41796 end_log_pos 16555138 CRC32 0x7fcb97ac GTID last_committed=0 sequence_number=17 rbr_only=yes original_committed_timestamp=1576488673657448 immediate_commit_timestamp=1576488673657448 transaction_length=1034680
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1576488673657448 (2019-12-16 17:31:13.657448 CST)
# immediate_commit_timestamp=1576488673657448 (2019-12-16 17:31:13.657448 CST)
/*!80001 SET @@session.original_commit_timestamp=1576488673657448*//*!*/;
/*!80014 SET @@session.original_server_version=80018*//*!*/;
/*!80014 SET @@session.immediate_server_version=80018*//*!*/;
SET @@SESSION.GTID_NEXT= '4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25759'/*!*/;
### 从库查看事务4a8f2de4-1fe1-11ea-a7d3-42549a90e353:25758的数据是否存在
root@localhost : (none) 09:23:04> select * from test.sbtest24 order by id desc limit 10;
+---------+---------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k | c | pad |
+---------+---------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| 1532202 | 2707896 | 16925828637-58090054889-95780407302-10293804842-12632614860-03811213734-46471823942-17333962963-31790381481-07019946586 | 77178758790-70128854370-00264407205-64007208401-22279618099 |
| 1532201 | 2516379 | 16016112465-66688149957-19003543589-12752783330-10140834595-17946818527-54058037200-83940103252-07788820899-59081899782 | 02966014962-81012143426-29059376957-22789496327-25833494842 |
| 1532200 | 2512693 | 89203990588-69103074818-52114821100-93439894362-23847143576-07484173808-05428369926-02975913993-41574550041-26938280355 | 45342050513-61911498937-97819152099-08587625932-60582159210 |
| 1532199 | 2502576 | 09802368960-91917733453-96239669598-54904648916-77296852719-98016730243-90741158106-18793912721-49864482275-01438682979 | 97383280864-09719424624-52657831677-16997497487-42888128592 |
| 1532198 | 2492769 | 50767886676-93929990978-69338677932-87282029008-76889813973-90185628217-96733941319-68335970853-94711333694-25795217601 | 19336837263-95804875039-06595487043-70523328561-09413324879 |
| 1532197 | 1955396 | 00397406795-74747838176-71394165168-41273836833-69479591787-42154528121-51771653744-67255388362-49886388719-58697178839 | 02673409369-19296574832-28003465943-79301421425-80480578998 |
| 1532196 | 2493231 | 86498404871-42108054116-40071340969-11350042446-06300939909-05201025379-06411857335-03954766664-39537185462-91490507327 | 84818317688-20764928816-31090759998-91731449425-90118225230 |
| 1532195 | 2499593 | 77660688834-45252238334-24137839414-65367466774-19337774130-53076983367-01390519320-71314042514-11226819773-51096244448 | 21997720631-75948312175-91821894792-80556960963-06584097138 |
| 1532194 | 2515138 | 66646860538-72521674543-39777861094-01574256160-18755135370-00487912454-64289722053-52102961561-41410923742-00667455284 | 84070066757-97543385536-80074118200-18008298945-61299807560 |
| 1532193 | 2517655 | 50560105798-09418525134-35753621841-55990140866-27080595470-36899245149-55668180470-33559752487-83646593848-17843254144 | 79647081405-62007589765-85313418965-19437322653-89655822041 |
+---------+---------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
10 rows in set (0.00 sec)
此时我们根据pos位点解析master的binlog信息确认下一个事务已经执行。
二、问题排查
2.1 前期准备
MySQL 8.0的备份取消了FLUSH TABLES WITH READ LOCK的全局加锁限制,具体备份流程后续会写文章对比;使用performance_schema.log_status表来获取binlog的pos和GTID信息。
首先我们介绍下8.0新增的performance_schema.log_status表的具体内容。
root@localhost : (none):47: > select * from performance_schema.log_status\G
*************************** 1. row ***************************
SERVER_UUID: 9eed5dd0-213c-11ea-a06f-fa9c144d7000
LOCAL: {"gtid_executed": "d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2344524", "binary_log_file": "mysql-bin.000036", "binary_log_position": 8237993}
REPLICATION: {"channels": [{"channel_name": "", "relay_log_file": "mysql-relay-bin.000002", "relay_log_position": 130357303}]}
STORAGE_ENGINES: {"InnoDB": {"LSN": 13536450538, "LSN_checkpoint": 13522598241}}
1 row in set (0.00 sec)
- SERVER_UUID:此服务器实例的服务器UUID。这是只读系统变量server_uuid的唯一值。
-
LOCAL:来自主服务器的日志位置状态信息,以单个JSON对象的形式提供,并具有以下信息:
- binary_log_file:当前二进制日志文件的名称。
- binary_log_position:访问log_status表时的当前二进制日志位置。
- gtid_executed:在访问log_status表时,全局变量gtid_execute的当前值。此信息与binary_log_file和binary_log_position键一致。
-
REPLICATION:一个通道的JSON数组,每个通道具有以下信息:
- channel_name:复制通道的名称。
- relay_log_file:复制通道的当前中继日志文件的名称。
- relay_log_pos:访问log_status表时的当前中继日志位置。
- STORAGE_ENGINES:来自各个存储引擎的相关信息,以JSON对象的形式提供,每个适用的存储引擎都有一个密钥。
2.2 验证show master status的输出信息
evernotecid://232882CB-0E86-4D89-B07A-EA1AE2F94E20/appyinxiangcom/15365034/ENResource/p17613
这里可以看到,在并发场景下,show master status输出的GTID和binlog的pos位点不是对应的。查看代码后发现:binlog的file和pos位点的更新在group commit的flush阶段后,即可对其他会话可见。而GTID是在group commit的commit阶段后可对其他会话可见,所以也就造成了show master status输出的binlog信息和GTID信息非原子性。
三、复现问题
3.1 开始备份
[root@qfusion1 ~]# xtrabackup --defaults-file=/etc/mysql/my.cnf --user=root --host=127.0.0.1 --password=xxx --port=3306 --extra-lsndir=/opt/full_lsn --stream=xbstream --backup --target-dir=./ | sshpass -p'letsg0' ssh root@qfusion2 'cat -> /data/backup/full_`date +%F_%H-%M-%S`.tar'
...
191217 19:02:45 [01] ...done
191217 19:02:45 Finished backing up non-InnoDB tables and files
191217 19:02:45 Executing FLUSH NO_WRITE_TO_BINLOG BINARY LOGS
191217 19:02:46 Selecting LSN and binary log position from p_s.log_status
191217 19:02:46 [00] Streaming /var/lib/mysql/archive/mysql-bin.000035 to <STDOUT> up to position 290601
191217 19:02:46 [00] ...done
191217 19:02:46 [00] Streaming <STDOUT>
191217 19:02:46 [00] ...done
191217 19:02:46 [00] Streaming <STDOUT>
191217 19:02:46 [00] ...done
191217 19:02:46 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
xtrabackup: The latest check point (for incremental): '13470381978'
xtrabackup: Stopping log copying thread at LSN 13522426897.
191217 19:02:46 >> log scanned up to (13522598105)
191217 19:02:48 All tables unlocked
191217 19:02:48 [00] Streaming ib_buffer_pool to <STDOUT>
191217 19:02:48 [00] ...done
191217 19:02:48 Backup created in directory '/root/'
MySQL binlog position: filename 'mysql-bin.000035', position '290601', GTID of the last change 'd3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340959'
191217 19:02:48 [00] Streaming <STDOUT>
191217 19:02:48 [00] ...done
191217 19:02:48 [00] Streaming <STDOUT>
191217 19:02:48 [00] ...done
xtrabackup: Transaction log of lsn (12972859512) to (13526909324) was copied.
191217 19:02:50 completed OK!
3.2 查看xtrabackup的信息
[root@qfusion2 full]# ls
backup-my.cnf ibdata1 mysql-bin.000035 mysql.ibd sbtest test undo_002 xtrabackup_checkpoints xtrabackup_logfile
ib_buffer_pool mysql mysql-bin.index performance_schema sys undo_001 xtrabackup_binlog_info xtrabackup_info xtrabackup_tablespaces
[root@qfusion2 full]# cat xtrabackup_binlog_info
mysql-bin.000035 290601 d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340959
3.3 查看对应的binlog信息
[root@qfusion2 full]# mysqlbinlog -vv mysql-bin.000035 | grep d3eff916-0e03-11ea-a98d-fa84dc05a100 | tail -10
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340970'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340971'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340972'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340973'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340974'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340975'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340976'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340977'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340978'/*!*/;
SET @@SESSION.GTID_NEXT= 'd3eff916-0e03-11ea-a98d-fa84dc05a100:2340979'/*!*/;
[root@qfusion2 full]# mysqlbinlog -vv mysql-bin.000035 --start-position=290601
mysqlbinlog: [Warning] unknown variable 'loose_default-character-set=utf8'.
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 195
#191217 19:02:45 server id 2433306 end_log_pos 124 CRC32 0x6d85a9e8 Start: binlog v 4, server v 8.0.18 created 191217 19:02:45
# Warning: this binlog is either in use or was not closed properly.
BINLOG '
1bX4XQ8aISUAeAAAAHwAAAABAAQAOC4wLjE4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEwANAAgAAAAABAAEAAAAYAAEGggAAAAICAgCAAAACgoKKioAEjQA
CgHoqYVt
'/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
3.4 开始恢复
[root@qfusion2 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --prepare --use-memory=5G --target-dir=/data/backup/full/
[root@qfusion2 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --move-back --target-dir=/data/backup/full/
[root@qfusion2 full]# chown -R mysql:mysql /var/lib/mysql/
[root@qfusion2 full]# mysqld_safe --defaults-file=/etc/mysql/my.cnf &
[root@qfusion2 full]# mysql -uroot -pxxx
root@localhost : (none):19: > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000036
Position: 195
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340979
1 row in set (0.00 sec)
[root@qfusion2 full]# cat xtrabackup_binlog_info
mysql-bin.000035 290601 d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340959
备份恢复完成后,我们可以看到,show master status输出的GTID信息与xtrabackup_binlog_info文件的GTID是不一致的,对比产品中遇到主键冲突的问题中的show master status的GTID和与xtrabackup_binlog_info文件的GTID是一致的;可以确认的事情是恢复之后是有过对数据库的操作了。此时应该找下原型预研文档和产品开发人员确认下是否在MySQL恢复之后有其他额外的操作,得到反馈是执行了如下的语句操作:
reset master;
...
set global gtid_purged='xxx'
这些步骤是MySQL 5.7下的搭建备库必要的操作(下面内容有验证),因为在5.7备份是没有binlog的备份,且GTID的持久化是在mysql.gtid_executed表,但是该表不是实时更新的,在主库的更新策略如下:
- 主库开启binlog时,mysql.gtid_executed表会记录GTID信息,且在binlog rotation或者server重启时进行更新。
- 主库未开启binlog时,mysql.gtid_executed表不会记录任何信息。
那么MySQL 8.0下的mysql.gtid_executed表的更新策略是什么样的呢?此时我们可以查询官方文档寻找答案:
简单翻译下:如果启用了二进制日志(log_bin为ON),则从MySQL 8.0.17起对InnoDB存储引擎,mysqld以禁用binlog记录或slave更新日志记录的相同方式更新mysql.gtid_executed表,并存储GTID用于事务提交时的每个事务。但是,在MySQL 8.0.17之前的版本以及其他存储引擎中,仅在切换binlog或重启mysqld时,mysqld将所有写入上一个binlog的事务的GTID写入mysql.gtid_executed表。
我们也可以测试验证下:
root@localhost : test:12: > select * from mysql.gtid_executed;show master status;
+--------------------------------------+----------------+--------------+
| source_uuid | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 1 | 2764676 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764677 | 2764783 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764784 | 2764893 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764894 | 2765008 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765009 | 2765074 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765075 | 2765143 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765144 | 2765185 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765186 | 2765283 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765284 | 2765384 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765385 | 2765485 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765486 | 2765571 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765572 | 2765652 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765653 | 2765746 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765747 | 2765865 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765866 | 2765948 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765949 | 2766028 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766029 | 2766052 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766053 | 2766053 |
+--------------------------------------+----------------+--------------+
18 rows in set (0.00 sec)
+------------------+-----------+--------------+------------------+------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000036 | 450959566 | | | d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2766053 |
+------------------+-----------+--------------+------------------+------------------------------------------------+
1 row in set (0.01 sec)
root@localhost : test:12: > insert into test values(1,1,'1');
Query OK, 1 row affected (0.00 sec)
root@localhost : test:12: > insert into test values(2,2,'2');
Query OK, 1 row affected (0.01 sec)
root@localhost : test:12: > select * from mysql.gtid_executed;show master status;
+--------------------------------------+----------------+--------------+
| source_uuid | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 1 | 2764676 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764677 | 2764783 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764784 | 2764893 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2764894 | 2765008 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765009 | 2765074 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765075 | 2765143 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765144 | 2765185 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765186 | 2765283 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765284 | 2765384 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765385 | 2765485 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765486 | 2765571 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765572 | 2765652 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765653 | 2765746 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765747 | 2765865 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765866 | 2765948 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2765949 | 2766028 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766029 | 2766052 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766053 | 2766053 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766054 | 2766054 |
| d3eff916-0e03-11ea-a98d-fa84dc05a100 | 2766055 | 2766055 |
+--------------------------------------+----------------+--------------+
20 rows in set (0.01 sec)
+------------------+-----------+--------------+------------------+------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+------------------------------------------------+
| mysql-bin.000036 | 450960258 | | | d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2766055 |
+------------------+-----------+--------------+------------------+------------------------------------------------+
1 row in set (0.00 sec)
3.5 手动删除备份的binlog文件
[root@qfusion2 full]# ls
backup-my.cnf mysql-bin.000035 sbtest undo_002 xtrabackup_logfile
ib_buffer_pool mysql-bin.index sys xtrabackup_binlog_info xtrabackup_tablespaces
ibdata1 mysql.ibd test xtrabackup_checkpoints
mysql performance_schema undo_001 xtrabackup_info
[root@qfusion2 full]# mv mysql-bin.* ../
[root@qfusion2 full]# ls
backup-my.cnf mysql.ibd test xtrabackup_checkpoints
ib_buffer_pool performance_schema undo_001 xtrabackup_info
ibdata1 sbtest undo_002 xtrabackup_logfile
mysql sys xtrabackup_binlog_info xtrabackup_tablespaces
[root@qfusion2 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --prepare --use-memory=5G --target-dir=/data/backup/full/
[root@qfusion2 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --move-back --target-dir=/data/backup/full/
[root@qfusion2 full]# chown -R mysql:mysql /var/lib/mysql/
[root@qfusion2 full]# mysqld_safe --defaults-file=/etc/mysql/my.cnf &
[root@qfusion2 full]# mysql -uroot -pxxx
root@localhost : (none):46: > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000003
Position: 155
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340959
1 row in set (0.00 sec)
[root@qfusion2 full]# cat xtrabackup_binlog_info
mysql-bin.000035 290601 d3eff916-0e03-11ea-a98d-fa84dc05a100:1-2340959
在手动移除备份生成的相关binlog文件后,发现事务是回滚到xtrabackup_binlog_info记录的GTID的位置。与MySQL 5.7下的备份恢复的情况一致;具体可以查看下面的验证。
四、MySQL 5.7下的物理备份
4.1 测试环境
- MySQL 5.7.28
- xtrabackup 2.4.17
- binlog_format=raw,gtid_mode=on,log-bin=1
- sysbench 1.0.17
4.2 开始备份
[root@qfusion3 archive]# xtrabackup --defaults-file=/etc/mysql/my.cnf --user=root --host=127.0.0.1 --password=abc123 --port=3306 --extra-lsndir=/opt/full_lsn --stream=xbstream --backup --target-dir=./ | sshpass -p'letsg0' ssh root@qfusion4 'cat -> /data/backup/full_`date +%F_%H-%M-%S`.tar'
4.3 开始恢复
[root@qfusion4 full]# cd /data/backup
[root@qfusion4 full]# xbstream -x < full_2019-12-18_14-06-10.tar -C full
[root@qfusion4 full]# cd full
[root@qfusion4 full]# ls
backup-my.cnf ibdata1 performance_schema sys undo002 undo004 undo006 undo008 undo010 undo012 undo014 undo016 xtrabackup_checkpoints xtrabackup_logfile
ib_buffer_pool mysql sbtest undo001 undo003 undo005 undo007 undo009 undo011 undo013 undo015 xtrabackup_binlog_info xtrabackup_info
[root@qfusion4 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --prepare --use-memory=5G --target-dir=/data/backup/full/
[root@qfusion4 full]# xtrabackup --defaults-file=/etc/mysql/my.cnf --move-back --target-dir=/data/backup/full/
[root@qfusion4 full]# chown -R mysql:mysql /var/lib/mysql/
[root@qfusion4 full]# mysqld_safe --defaults-file=/etc/mysql/my.cnf &
[root@qfusion4 full]# mysql -uroot -pxxx
root@localhost : (none):13: > select * from mysql.gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| 189fb8ac-1fe8-11ea-9b92-faf16b07cf00 | 1 | 535 |
+--------------------------------------+----------------+--------------+
1 row in set (0.01 sec)
root@localhost : (none):13: > show master status\G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 189fb8ac-1fe8-11ea-9b92-faf16b07cf00:1-535
1 row in set (0.00 sec)
[root@qfusion4 full]# cat xtrabackup_binlog_info
mysql-bin.000003 175325364 189fb8ac-1fe8-11ea-9b92-faf16b07cf00:1-736
可以看到MySQL 5.7版本的备份恢复后的GTID是依赖于mysql.gtid_executed表,但是该表不是实时更新,此时的GTID其实是滞后的,所以也需要额外执行set global gtid_purged=‘xxx’。
五、总结
- MySQL 8.0下的物理备份生成的GTID值是不可靠,因为备份了最后一个binlog文件,恢复时会recovery到最新的事务。
- MySQL 8.0下的物理备份会执行FLUSH NO_WRITE_TO_BINLOG BINARY LOGS操作,在有大事务对数据进行修改时,一不小心可能就会出现数据库hang死,所以建议使用小事务。