运行着Innodb 引擎的MySQL 在遇到硬件故障,内核bug 或者是突然断电,MySQL本身的bug,可能会造成Innodb表空间的损坏,提示的信息大致如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
|
InnoDB: Database page corruption on disk
or
a failed
InnoDB:
file
read of page
7.
InnoDB: You may have to recover
from
a backup.
080703
23
:
46
:
16
InnoDB: Page dump
in
ascii
and
hex
(
16384
bytes):
… A LOT OF
HEX
AND BINARY DATA…
080703
23
:
46
:
16
InnoDB: Page checksum
587461377
, prior
-
to
-
4.0
.
14
-
form checksum
772331632
InnoDB: stored checksum
2287785129
, prior
-
to
-
4.0
.
14
-
form stored checksum
772331632
InnoDB: Page lsn
24
1487506025
, low
4
bytes of lsn at page end
1487506025
InnoDB: Page number (
if
stored to page already)
7
,
InnoDB: space
id
(
if
created with >
=
MySQL
-
4.1
.
1
and
stored already)
6353
InnoDB: Page may be an index page where index
id
is
0
25556
InnoDB: (index “PRIMARY” of table “test”.”test”)
InnoDB: Database page corruption on disk
or
a failed
|
提示出现断言错误(assertion failure);
对于损坏的表,大致可分为三种情况,Data corrupted In Secondary Index, Data corrupted In clustered key,Data corrupted In table dictionary,恢复难度依次增加;
下面我们来做一个简单恢复实验,我手动编辑test.ibd 文件,只修改一小部分,此时,我手动check 一下:
1
2
3
4
5
6
7
8
9
|
mysql> check table test;
ERROR
2013
(HY000): Lost connection to MySQL server during query
mysql> check table test;
+
-
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
| Table | Op | Msg_type | Msg_text |
+
-
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
| test.test | check | status | OK |
+
-
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
-
-
-
-
-
-
-
-
-
-
+
1
row
in
set
(
0.69
sec)
|
第一次运行的时候,Innodb发现checksum error并出现crash的情况(运行check 的时候同样会发生crash),第二次运行的时候我设置了 innodb_force_recovery=1,但是check 的结果却是ok,对于check功能还是不要太过相信啊,
在Innodb_force_recovery=1 的情况下,我们可以建一张引擎为MyISAM同样表结构的表,对新表导入数据,然后去掉force recovery选项,重启MyISAM,并将引擎转换为INNODB,:
1
2
3
4
5
6
7
8
9
|
mysql> CREATE TABLE `test2` (
-
> `c` char(
255
) DEFAULT NULL,
-
> `
id
`
int
(
10
) unsigned NOT NULL AUTO_INCREMENT,
-
> PRIMARY KEY (`
id
`)
-
> ) ENGINE
=
MYISAM;
Query OK,
0
rows affected (
0.03
sec)
mysql> insert into test2 select
*
from
test;
Query OK,
229376
rows affected (
0.91
sec)
Records:
229376
Duplicates:
0
Warnings:
0
|
以上的操作确实有点简单,下面我们尝试把test.ibd 的page header部分全部擦除,这次甚至在innodb_force_recovery=1 的情况下,运行check table 同样会出现断言的错误(Assertion failure):
1
2
3
4
5
|
080704
0
:
22
:
53
InnoDB: Assertion failure
in
thread
1158060352
in
file
btr
/
btr0btr.c line
3235
InnoDB: Failing assertion: page_get_n_recs(page) >
0
|| (level
=
=
0
&& page_get_page_no(page)
=
=
dict_index_get_page(index))
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http:
/
/
bugs.mysql.com.
InnoDB: If you get repeated assertion failures
or
crashes, even
|
这个时候,提高innodb_force_recovery 参数值,也还是可能出现下面的情况:
1
2
|
mysql> insert into test2 select
*
from
test;
ERROR
2013
(HY000): Lost connection to MySQL server during query
|
对于上面这个语句,我们可能认为 当扫描table时第一次遇到corrupted的row时,会报错,并把已扫描的row存储在MyISAM(非事务型存储引擎)中,但实践是test2最终是没有记录的,原因是:there is some buffering taking place and as MySQL crashes it does not store all data it could recover to MyISAM table;(对这个理解还不是很懂)
对于这种情况,只能加上limit,一部分一部分的导入:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
mysql> insert ignore into test2 select
*
from
test limit
10
;
Query OK,
10
rows affected (
0.00
sec)
Records:
10
Duplicates:
0
Warnings:
0
mysql> insert ignore into test2 select
*
from
test limit
20
;
Query OK,
10
rows affected (
0.00
sec)
Records:
20
Duplicates:
10
Warnings:
0
mysql> insert ignore into test2 select
*
from
test limit
100
;
Query OK,
80
rows affected (
0.00
sec)
Records:
100
Duplicates:
20
Warnings:
0
mysql> insert ignore into test2 select
*
from
test limit
200
;
Query OK,
100
rows affected (
1.47
sec)
Records:
200
Duplicates:
100
Warnings:
0
mysql> insert ignore into test2 select
*
from
test limit
300
;
ERROR
2013
(HY000): Lost connection to MySQL server during query
|
以上我们可以得出,损坏的row就在200行-300行之间,下面我们可以使用二分查找,在次缩小查找范围,
1
2
3
4
5
6
7
8
9
10
11
12
|
mysql> select
max
(
id
)
from
test2;
+
-
-
-
-
-
-
-
-
-
+
|
max
(
id
) |
+
-
-
-
-
-
-
-
-
-
+
|
220
|
+
-
-
-
-
-
-
-
-
-
+
1
row
in
set
(
0.00
sec)
mysql> insert ignore into test2 select
*
from
test where
id
>
250
;
ERROR
2013
(HY000): Lost connection to MySQL server during query
mysql> insert ignore into test2 select
*
from
test where
id
>
300
;
Query OK,
573140
rows affected (
7.79
sec)
Records:
573140
Duplicates:
0
Warnings:
0
|
当跳过80行的时候,才insert成功;如果是 directory page 损坏,我们可能需要跳过更多page,如果是损坏的 higher level btree,可能会跳过更多,,,
在一些个别场景中像Data dictionary 或者root page 损坏,上面的方式可能不会奏效,此时可以尝试一把 innodb recovery toolkit 。。。
本文转自 位鹏飞 51CTO博客,原文链接:http://blog.51cto.com/weipengfei/1565194,如需转载请自行联系原作者