背景
- MySQL5.7.21
- Centos 7.4
- innodb_force_recovery=0
信息收集
- MySQL物理备份报错
190425 11:58:54 >> log scanned up to (174805994673)
190425 11:58:55 >> log scanned up to (174805995101)
524288000 of 524288000 100% in 9s 55.32 MB/s done
190425 11:58:56 >> log scanned up to (174805995156)
190425 11:58:57 >> log scanned up to (174805995156)
190425 11:58:58 >> log scanned up to (174805995165)
190425 11:58:59 >> log scanned up to (174805995485)
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
190425 11:59:00 >> log scanned up to (174805995593)
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Error: failed to read page after 10 retries. File ./sbtest/sbtest4.ibd seems to be corrupted.
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.
upload: '<stdin>' -> 's3://mysql-test-87771/1556164635.tar' [part 10, 331MB]
347810779 of 347810779 100% in 6s 54.92 MB/s done
Connection to mysql-test-877710-0.mysql-test-877710-headless.default.svc.clust closed.
- 通过xtrabackup日志可以发现一个重要的报错信息:
File ./sbtest/sbtest4.ibd seems to be corrupted.
- 手动到MySQL查询检查表sbtest4的,发现MySQL会重启
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 0 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> use sbtest
Database changed
mysql> check table sbtest.sbtest4;
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
- 查看error.log,发现MySQL异常崩溃,并重新启动
2019-04-25T15:41:51.843625+08:00 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=135, page number=143665]. You may have to recover from a backup.
2019-04-25T15:41:51.843680+08:00 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex
...
InnoDB: End of page dump
2019-04-25T15:49:21.251992+08:00 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3293893214, calculated checksums for field1: crc32 3293893214/4282163675, innodb 271698160, none 3735928559, stored checksum in field2 3013555396, calculated checksums for field2: crc32 3293893214/4282163675, innodb 1359747787, none 3735928559, page LSN 40 340032547, low 4 bytes of LSN at page end 2116856341, page number (if stored to page already) 143665, space id (if created with >= MySQL-4.1.1 and stored already) 135
InnoDB: Page may be an index page where index id is 140
2019-04-25T15:49:21.252039+08:00 0 [Note] InnoDB: Index 140 is `PRIMARY` in table `sbtest`.`sbtest4`
2019-04-25T15:49:21.252048+08:00 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
...
- 通过一系列日志收集,可以发现innodb_force_recovery=0时,index page对MySQL的启动无影响,但是有SQL查询到坏页的时候,MySQL会出现崩溃的情况
原因分析
导致page损坏的原因
- 硬件问题
- 驱动程序错误
- 内核错误
- 电源故障
- 罕见的MySQL错误
参数解析
- innodb_force_recovery默认为0(没有强制恢复的正常启动)。
- 仅在紧急情况下将innodb_force_recovery设置为大于0的值,以便您可以启动InnoDB并转储表。在此之前,请确保您拥有数据库备份,以备需要重新创建时使用。值为4或更高可能会永久损坏数据文件。强制InnoDB恢复时,应始终以innodb_force_recovery = 1开始,并且只在必要时逐步增加值。
- 如果您能够以innodb_force_recovery值为3或更低的值转储表,那么您相对安全,只有损坏的单个页面上的某些数据会丢失。值为4或更高被认为是危险的,因为数据文件可能会永久损坏。值6被认为是激烈的,因为数据库页面处于过时状态,这反过来可能会在B树和其他数据库结构中引入更多损坏。
-
innodb_force_recovery可以设置为6个非零值:1~6,大的数字包含了前面所有小数字的所有功能,具体情况如下:
- 1(SRV_FORCE_IGNORE_CORRUPT):忽略检查到的corrupt页。即使检测到损坏的页面,也允许服务器运行。尝试使SELECT * FROM tbl_name跳过损坏的索引记录和页面,这有助于转储表。
- 2(SRV_FORCE_NO_BACKGROUND):防止主线程和任何清除线程运行。如果在清除操作期间发生崩溃,则此恢复值会阻止它。
- 3(SRV_FORCE_NO_TRX_UNDO):崩溃恢复后不运行事务回滚。
- 4(SRV_FORCE_NO_IBUF_MERGE):阻止插入缓冲区合并操作。如果它们会导致崩溃,则不会这样做。不计算表统计信息。此值可能会永久损坏数据文件。使用此值后,请准备删除并重新创建所有二级索引。将InnoDB设置为只读。
- 5(SRV_FORCE_NO_UNDO_LOG_SCAN):启动数据库时不查看撤消日志(undo log):InnoDB甚至将未完成的事务视为已提交。此值可能会永久损坏数据文件。将InnoDB设置为只读。
- 6(SRV_FORCE_NO_LOG_REDO):不执行与恢复相关的重做日志前滚。此值可能会永久损坏数据文件。使数据库页面处于过时状态,这反过来可能会在B树和其他数据库结构中引入更多损坏。将InnoDB设置为只读。
需要注意的是,当设置参数innodb_force_recovery大于0后,可以对表进行select、create、drop操作,但insert、update或者delete这类操作是不允许的。如果innodb_force_recovery设置为4或更高,则将InnoDB置于只读模式。
您可以从表中进行SELECT以转储它们。如果innodb_force_recovery值为3或更小,您可以DROP或CREATE表。 Innodb_force_recovery值也大于3,支持DROP TABLE,最高可达MySQL 5.7.17。从MySQL 5.7.18开始,不允许在innodb_force_recovery值大于4的情况下使用DROP TABLE。
如果您知道给定的表导致回滚崩溃,则可以删除它。如果遇到由大量导入失败或ALTER TABLE导致的失控回滚,则可以终止mysqld进程并将innodb_force_recovery设置为3以使数据库无需回滚,然后DROP导致失控回滚的表。
如果表数据中的损坏阻止您转储整个表内容,则具有ORDER BY primary_key DESC子句的查询可能能够在损坏的部分之后转储表的一部分。
如果启动InnoDB需要高innodb_force_recovery值,则可能存在可能导致复杂查询(包含WHERE,ORDER BY或其他子句的查询)失败的数据结构损坏。在这种情况下,您可能只能运行基本的SELECT * FROM t查询。
- 设置innodb_force_recovery=1时,对表操作
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 1 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> check table sbtest4;
+----------------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+-------+----------+----------+
| sbtest.sbtest4 | check | status | OK |
+----------------+-------+----------+----------+
1 row in set (1 min 4.94 sec)
mysql> select count(*) from sbtest4;
+----------+
| count(*) |
+----------+
| 10997913 |
+----------+
1 row in set (3.58 sec)
mysql> drop table sbtest4_bak;
Query OK, 0 rows affected (0.21 sec)
mysql,> create table sbtest4_bak like sbtest4;
Query OK, 0 rows affected (0.45 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
ERROR 1881 (HY000): Operation not allowed when innodb_forced_recovery > 0.
mysql> alter table sbtest4_bak engine=myisam MAX_ROWS=10000000;
ERROR 1025 (HY000): Error on rename of './sbtest/sbtest4_bak' to './sbtest/#sql2-1-1daa' (errno: 190 - Operation not allowed when innodb_forced_recovery > 0)
- 在参数innodb_force_recovery为1时,可以对表进行select、create、drop操作,但insert、update或者delete这类操作是不允许的。
问题修复
方案1:针对于myisam引擎表
mysql> repair table sbtest4;
方案2:针对于innodb引擎表,损坏非二级索引
#### 建立一张新表,并修改存储引擎为myisam
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 0 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> create table sbtest4_bak like sbtest4;
mysql> alter table sbtest4_bak engine = myisam MAX_ROWS=10000000;
备注:
1、为什么修改为myisam存储引擎?是因为在innodb_force_recovery>0时,innodb无法执行insert操作。或者在innodb_force_recovery>0直接手动create table xxx ... engine=myisam.
2、在行指针设置较小不够用的时候,为提高MyISAM表最大容量,可以调整表定义设定MAX_ROWS的值
#### 损坏页数据导入
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 1 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
#### 删除原表
mysql> drop table sbtest4;
#### 新表重命名
mysql> rename table sbtest4_bak to sbtest4;
注意:在执行insert into xxx select * from xxx;可能会引发问题:
mysql> insert into sbtest4_bak select * from sbtest4;
ERROR 1598 (HY000): Binary logging not possible. Message: Either disk is full or file system is read only while rotating the binlog. Aborting the server.
错误日志显示:
...
2019-04-26T14:04:12.047240+08:00 59 [ERROR] mysqld: Table './sbtest/sbtest4_bak' is marked as crashed and should be repaired
2019-04-26T14:04:12.068042+08:00 59 [Warning] Checking table: './sbtest/sbtest4_bak'
...
避免措施是关闭binlog:
mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
Query OK, 10997913 rows affected (2 min 0.96 sec)
Records: 10997913 Duplicates: 0 Warnings: 0
方案3:针对innodb引擎表,损坏二级索引page
mysql> OPTIMIZE TABLE sbtest4;
- 针对二级索引损坏page可以直接选择OPTIMIZE TABLE xxx;操作直接重新创建二级索引
方案4:物理备份表空间传输
mysql> use sbtest;
mysql> CREATE TABLE sbtest4_bak like sbtest4;
mysql> ALTER TABLE sbtest4_bak DISCARD TABLESPACE;
mysql> FLUSH TABLES t FOR EXPORT;
shell> scp /path/to/datadir/test/sbtest4.{ibd,cfg} destination-server:/path/to/datadir/test
shell> chown -R mysql:mysql /path/to/datadir/test
mysql> use test;
mysql> UNLOCK TABLES;
mysql> use test;
mysql> ALTER TABLE t IMPORT TABLESPACE;
- 通过物理备份恢复到最近的时间点数据,通过innodb表空间传输技术快速恢复
方案5:物理备份逻辑恢复
shell> mysqldump -uxxx1 -hxxx1 -pxxx1 -B test --tables sbtest4 --single-transaction --master-data=2 --set-gtid-purged=off > /path/sbtest4.sql
shell> mysql -uxxx2 -hxxx2 -pxxx2 < /path/sbtest4.sql
- 通过物理备份恢复到最近的时间点数据,通过mysqldump逻辑恢复
总结
- 数据页面的主键索引(clustered key index)被损坏。这种情况和数据的二级索引(secondary indexes)被损坏相比要糟很多,因为后者可以通过使用OPTIMIZE TABLE命令来修复,但这和更难以恢复的表格目录(table dictionary)被破坏的情况来说要好一些。需要尝试另一个方法:insert into tb select * from ta limit X;甚至是dump出去,再load回来。