PXC 5.7 WSREP_SST: [ERROR] xtrabackup_checkpoints missing


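Some time ago, after configuring PXC 5.7, I started one of the nodes and ran into: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. The log on the failing node does not explain the cause of this error; the detailed description has to be read from the log on another node (the donor). This post describes the problem and how it was resolved, for reference.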

1. Symptoms

A PXC (MySQL 5.7) node failed to restart. Its state after the failed restart is shown below.
Environment:
[root@node146 ~]# more /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@node146 ~]# mysql -V
mysql Ver 14.14 Distrib 5.7.19-17, for Linux (x86_64) using 6.2

[root@node146 ~]# systemctl status mysql.service
mysql.service - Percona XtraDB Cluster
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-12-29 09:12:13 CST; 44s ago
Process: 19192 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 19160 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
Process: 18153 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
Process: 18151 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=1/FAILURE)
Process: 18032 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 18151 (code=exited, status=1/FAILURE)
-- error output starts here
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: ERROR! mysqld_safe with PID 18151 has already exited: FAILURE
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=1
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: ERROR! mysql already dead
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=2
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql may be already dead
Dec 29 09:12:13 node146.example.com systemd[1]: Failed to start Percona XtraDB Cluster.
Dec 29 09:12:13 node146.example.com systemd[1]: Unit mysql.service entered failed state.
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service failed.

Dec 29 09:11:29 node146.example.com polkitd[18024]: Started polkitd version 0.112
Dec 29 09:11:29 node146.example.com polkitd[18024]: Loading rules from directory /etc/polkit-1/rules.d
Dec 29 09:11:29 node146.example.com polkitd[18024]: Loading rules from directory /usr/share/polkit-1/rules.d
Dec 29 09:11:29 node146.example.com polkitd[18024]: Finished loading, compiling and executing 6 rules
Dec 29 09:11:29 node146.example.com dbus[889]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Dec 29 09:11:29 node146.example.com dbus-daemon[889]: dbus[889]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Dec 29 09:11:29 node146.example.com systemd[1]: Started Authorization Manager.
-- Subject: Unit polkit.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit polkit.service has finished starting up.
--
-- The start-up result is done.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
Dec 29 09:11:29 node146.example.com gnome-session[2869]: PolicyKit daemon reconnected to bus.
Dec 29 09:11:29 node146.example.com gnome-session[2869]: Attempting to re-register as an authentication agent.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Registered Authentication Agent for unix-session:c1 (system bus name :1.30 [gnome-shell --mode=gdm], object path /org/freedesktop/PolicyKit1/Authentication
Dec 29 09:11:29 node146.example.com gnome-session[2869]: We are now a registered authentication agent.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Registered Authentication Agent for unix-process:18020:4562323 (system bus name :1.288 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/fre
Dec 29 09:11:29 node146.example.com systemd[1]: Starting Percona XtraDB Cluster...
-- Subject: Unit mysql.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has begun starting up.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.538125Z mysqld_safe Logging to '/var/log/mysqld.log'.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.540867Z mysqld_safe Logging to '/var/log/mysqld.log'.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.565416Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.569662Z mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.570611Z mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
Dec 29 09:11:43 node146.example.com mysql-systemd[18153]: State transfer in progress, setting sleep higher
Dec 29 09:12:04 node146.example.com systemd[1]: mysql.service: main process exited, code=exited, status=1/FAILURE
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: /usr/bin/mysql-systemd: line 140: kill: (18151) - No such process
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: ERROR! mysqld_safe with PID 18151 has already exited: FAILURE
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=1
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: ERROR! mysql already dead
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=2
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql may be already dead
Dec 29 09:12:13 node146.example.com systemd[1]: Failed to start Percona XtraDB Cluster.
-- Subject: Unit mysql.service has failed ## Author : Leshami
-- Defined-By: systemd                           ## Blog : http://blog.csdn.net/leshami
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has failed.
--
-- The result is failed.
Dec 29 09:12:13 node146.example.com systemd[1]: Unit mysql.service entered failed state.
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service failed.
Dec 29 09:12:13 node146.example.com polkitd[18024]: Unregistered Authentication Agent for unix-process:18020:4562323 (system bus name :1.288, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, loca
lines 2162-2213/2213 (END)

Next, check the mysqld error log:
[root@node146 ~]# tail -fn 100 /var/log/mysqld.log
2017-12-29T01:11:43.416689Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> aeb87793-ebb2-11e7-b33e-eeaf4988bbe4:0
2017-12-29T01:11:43.812807Z WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
2017-12-29T01:11:43.841926Z WSREP_SST: [INFO] Proceeding with SST.........
2017-12-29T01:11:43.867695Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2017-12-29T01:11:44.359210Z 0 [Note] WSREP: (3a4aef7c, 'tcp://0.0.0.0:4567') connection to peer 3a4aef7c with addr tcp://192.168.81.146:4567 timed out, no messages seen in PT3S
2017-12-29T01:11:44.359666Z 0 [Note] WSREP: (3a4aef7c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-12-29T01:11:53.926533Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************  -- a fatal error related to xtrabackup; the message says to check the DONOR log
2017-12-29T01:11:53.927524Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2017-12-29T01:11:53.928649Z WSREP_SST: [ERROR] ******************************************************
2017-12-29T01:11:53.932079Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2017-12-29T01:11:53.942644Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.81.146' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '18614' '' : 2 (No such file or directory)
2017-12-29T01:11:53.942695Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2017-12-29T01:11:53.942703Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2017-12-29T01:11:53.942729Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2017-12-29T01:11:53.942735Z 0 [ERROR] Aborting
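
The error log on the joining node mixes group-communication notes with the SST messages that matter here. The following is a minimal sketch for pulling out only the error lines. The function name sst_errors is hypothetical, and the patterns are modelled on the log lines shown above rather than on any documented log format.

```shell
# Hypothetical helper: print only the SST-related error lines from a mysqld error log.
# Reads the files given as arguments, or stdin when no file is given.
sst_errors() {
    cat -- "$@" | grep -E 'WSREP_SST: \[ERROR\]|\[ERROR\] WSREP:'
}

# Example (log path as used in this post):
#   sst_errors /var/log/mysqld.log
```

On the log above this should leave the FATAL ERROR banner, the "xtrabackup_checkpoints missing" line and the WSREP failure lines, which is enough to see that the next place to look is the donor.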

SSH to the other node, 1.142 (the donor), and check the innobackup log in the MySQL data directory:
[root@node142 ~]# more /var/lib/mysql/innobackup.backup.log
171229 09:11:47 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES).
Failed to connect to MySQL server: DBI connect(';mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at - line 1314. -- Access denied. Odd, since the account had definitely been created; it may have been dropped by mistake or modified.
171229 09:11:47 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Failed to connect to MySQL server: Access denied for user 'sstuser'@'localhost' (using password: YES).
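
The donor-side cause in this case is an authentication failure. The sketch below lists which accounts were refused in a donor log. The name denied_accounts is hypothetical, and the sed pattern is modelled on the "Access denied" lines above, so it may need adjusting for other xtrabackup versions.

```shell
# Hypothetical helper: list each distinct user@host that was refused in a log.
# Reads the files given as arguments, or stdin when no file is given.
denied_accounts() {
    cat -- "$@" \
        | sed -n "s/.*Access denied for user '\([^']*\)'@'\([^']*\)'.*/\1@\2/p" \
        | sort -u
}

# Example (donor log path as used in this post):
#   denied_accounts /var/lib/mysql/innobackup.backup.log
```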

2. Resolution

The errors above show that the user used for SST was denied access. Test whether the SST user can log in; as shown below, it indeed cannot:
[root@node142 mysql]# mysql -usstuser -ps3cretPass
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'sstuser'@'localhost' (using password: YES)
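
In PXC 5.7 the credentials used by the SST script are set with the wsrep_sst_auth option, in the form user:password. The post does not show this node's configuration, so the following is only a sketch under the assumption that the option is set in /etc/my.cnf (the defaults file named in the error log on the failing node). The function name sst_auth_from_cnf is hypothetical. It prints the configured value so that it can be compared with the password actually set on the account.

```shell
# Hypothetical helper: print the value of wsrep_sst_auth (user:password)
# from a my.cnf-style file. If the option appears more than once, the last wins.
# Surrounding single or double quotes around the value are removed.
sst_auth_from_cnf() {
    sed -n 's/^[[:space:]]*wsrep[_-]sst[_-]auth[[:space:]]*=[[:space:]]*//p' "$1" \
        | sed -e 's/[[:space:]]*$//' -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/" \
        | tail -n 1
}

# Example (assumed config path):
#   auth=$(sst_auth_from_cnf /etc/my.cnf)
#   user=${auth%%:*}    # text before the first colon
#   pass=${auth#*:}     # text after the first colon
#   mysql -u"$user" -p"$pass" -e 'SELECT 1'
```

If the login with the configured value fails, as it did here, either the account's password or the configuration has drifted, and one of the two has to be brought back in line.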

-- Reset the password for this account:
[root@node142 mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 26
Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> alter user 'sstuser'@'localhost' identified by 's3cretPass';  -- reset the account's password
Query OK, 0 rows affected (0.05 sec)

mysql> exit
Bye
[root@node142 mysql]# mysql -usstuser -ps3cretPass   # logging in now succeeds
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27
Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22

Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> exit
Bye
After that, restarting the node at 1.146 succeeded.
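
To confirm that the restarted node has finished its state transfer, the wsrep_local_state_comment status variable can be checked; a fully joined node reports Synced. The sketch below only parses the client output, so it can be tried without a running cluster. The function name wsrep_is_synced is hypothetical, and it assumes tab-separated batch output as produced by mysql -N -B.

```shell
# Hypothetical helper: succeed (exit 0) only if stdin, given as tab-separated
# "name<TAB>value" rows, reports wsrep_local_state_comment = Synced.
wsrep_is_synced() {
    awk -F '\t' '
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit (state == "Synced" ? 0 : 1) }
    '
}

# Example, run on the restarted node:
#   mysql -uroot -p -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" \
#       | wsrep_is_synced && echo "node is synced"
```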

3. Further reading

Installing Percona XtraDB Cluster (PXC) 5.7 on CentOS 7
Changing the timestamp display in MySQL 5.7 (log_timestamps UTC)
MySQL PXC 5.7 invalid user '@MYSQLD_USER@'
