前阵子在配置好了PXC5.7之后,在启动其中的一个节点,碰到了 [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR。关于这个错误,需要从其它节点来获取更详细的日志描述。下文是对这个问题的描述及解决,供大家参考。
一、故障现象
PXC mysql 5.7某节点重启失败,下面是重启失败后的状态
当前环境
[root@node146 ~]# more /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@node146 ~]# mysql -V
mysql Ver 14.14 Distrib 5.7.19-17, for Linux (x86_64) using 6.2
[root@node146 ~]# systemctl status mysql.service
mysql.service - Percona XtraDB Cluster
Loaded: loaded (/usr/lib/systemd/system/mysql.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2017-12-29 09:12:13 CST; 44s ago
Process: 19192 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 19160 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
Process: 18153 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
Process: 18151 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=1/FAILURE)
Process: 18032 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 18151 (code=exited, status=1/FAILURE)
--下面开始输出错误信息
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: ERROR! mysqld_safe with PID 18151 has already exited: FAILURE
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=1
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: ERROR! mysql already dead
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=2
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql may be already dead
Dec 29 09:12:13 node146.example.com systemd[1]: Failed to start Percona XtraDB Cluster.
Dec 29 09:12:13 node146.example.com systemd[1]: Unit mysql.service entered failed state.
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service failed.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Started polkitd version 0.112
Dec 29 09:11:29 node146.example.com polkitd[18024]: Loading rules from directory /etc/polkit-1/rules.d
Dec 29 09:11:29 node146.example.com polkitd[18024]: Loading rules from directory /usr/share/polkit-1/rules.d
Dec 29 09:11:29 node146.example.com polkitd[18024]: Finished loading, compiling and executing 6 rules
Dec 29 09:11:29 node146.example.com dbus[889]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Dec 29 09:11:29 node146.example.com dbus-daemon[889]: dbus[889]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Dec 29 09:11:29 node146.example.com systemd[1]: Started Authorization Manager.
-- Subject: Unit polkit.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit polkit.service has finished starting up.
--
-- The start-up result is done.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
Dec 29 09:11:29 node146.example.com gnome-session[2869]: PolicyKit daemon reconnected to bus.
Dec 29 09:11:29 node146.example.com gnome-session[2869]: Attempting to re-register as an authentication agent.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Registered Authentication Agent for unix-session:c1 (system bus name :1.30 [gnome-shell --mode=gdm], object path /org/freedesktop/PolicyKit1/Authentication
Dec 29 09:11:29 node146.example.com gnome-session[2869]: We are now a registered authentication agent.
Dec 29 09:11:29 node146.example.com polkitd[18024]: Registered Authentication Agent for unix-process:18020:4562323 (system bus name :1.288 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/fre
Dec 29 09:11:29 node146.example.com systemd[1]: Starting Percona XtraDB Cluster...
-- Subject: Unit mysql.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has begun starting up.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.538125Z mysqld_safe Logging to '/var/log/mysqld.log'.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.540867Z mysqld_safe Logging to '/var/log/mysqld.log'.
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.565416Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.569662Z mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
Dec 29 09:11:40 node146.example.com mysqld_safe[18151]: 2017-12-29T01:11:40.570611Z mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
Dec 29 09:11:43 node146.example.com mysql-systemd[18153]: State transfer in progress, setting sleep higher
Dec 29 09:12:04 node146.example.com systemd[1]: mysql.service: main process exited, code=exited, status=1/FAILURE
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: /usr/bin/mysql-systemd: line 140: kill: (18151) - No such process
Dec 29 09:12:13 node146.example.com mysql-systemd[18153]: ERROR! mysqld_safe with PID 18151 has already exited: FAILURE
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=1
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19160]: ERROR! mysql already dead
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service: control process exited, code=exited status=2
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Dec 29 09:12:13 node146.example.com mysql-systemd[19192]: WARNING: mysql may be already dead
Dec 29 09:12:13 node146.example.com systemd[1]: Failed to start Percona XtraDB Cluster.
-- Subject: Unit mysql.service has failed ## Author : Leshami
-- Defined-By: systemd ## Blog : http://blog.csdn.net/leshami
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql.service has failed.
--
-- The result is failed.
Dec 29 09:12:13 node146.example.com systemd[1]: Unit mysql.service entered failed state.
Dec 29 09:12:13 node146.example.com systemd[1]: mysql.service failed.
Dec 29 09:12:13 node146.example.com polkitd[18024]: Unregistered Authentication Agent for unix-process:18020:4562323 (system bus name :1.288, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, loca
lines 2162-2213/2213 (END)
下面查看mysqld的错误日志
[root@node146 ~]# tail -fn 100 /var/log/mysqld.log
2017-12-29T01:11:43.416689Z 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> aeb87793-ebb2-11e7-b33e-eeaf4988bbe4:0
2017-12-29T01:11:43.812807Z WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing
2017-12-29T01:11:43.841926Z WSREP_SST: [INFO] Proceeding with SST.........
2017-12-29T01:11:43.867695Z WSREP_SST: [INFO] ............Waiting for SST streaming to complete!
2017-12-29T01:11:44.359210Z 0 [Note] WSREP: (3a4aef7c, 'tcp://0.0.0.0:4567') connection to peer 3a4aef7c with addr tcp://192.168.81.146:4567 timed out, no messages seen in PT3S
2017-12-29T01:11:44.359666Z 0 [Note] WSREP: (3a4aef7c, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-12-29T01:11:53.926533Z WSREP_SST: [ERROR] ******************* FATAL ERROR ********************** 这里提示致命的错误,跟xtrabackup有关,提示检查 DONOR log
2017-12-29T01:11:53.927524Z WSREP_SST: [ERROR] xtrabackup_checkpoints missing. xtrabackup/SST failed on DONOR. Check DONOR log
2017-12-29T01:11:53.928649Z WSREP_SST: [ERROR] ******************************************************
2017-12-29T01:11:53.932079Z WSREP_SST: [ERROR] Cleanup after exit with status:2
2017-12-29T01:11:53.942644Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.81.146' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '18614' '' : 2 (No such file or directory)
2017-12-29T01:11:53.942695Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2017-12-29T01:11:53.942703Z 0 [ERROR] WSREP: SST script aborted with error 2 (No such file or directory)
2017-12-29T01:11:53.942729Z 0 [ERROR] WSREP: SST failed: 2 (No such file or directory)
2017-12-29T01:11:53.942735Z 0 [ERROR] Aborting
ssh到另外一个节点1.142上,在mysql数据目录下,查看innobackup日志
[root@node142 ~]# more /var/lib/mysql/innobackup.backup.log
171229 09:11:47 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES).
Failed to connect to MySQL server: DBI connect(';mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at - line 1314. --提示访问被拒绝,晕,账户明明已经被创建过了,不知道是不是误删了或者账户被修改。
171229 09:11:47 Connecting to MySQL server host: localhost, user: sstuser, password: set, port: not set, socket: /var/lib/mysql/mysql.sock
Failed to connect to MySQL server: Access denied for user 'sstuser'@'localhost' (using password: YES).
二、故障解决
从上面的错误可知,用于SST同步的用户访问被拒绝,测试SST用户能否登陆,如下,确实无法登陆
[root@node142 mysql]# mysql -usstuser -ps3cretPass
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'sstuser'@'localhost' (using password: YES)
--下面创建这个
[root@node142 mysql]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 26
Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> alter user 'sstuser'@'localhost' identified by 's3cretPass'; --为账户修改密码
Query OK, 0 rows affected (0.05 sec)
mysql> exit
Bye
[root@node142 mysql]# mysql -usstuser -ps3cretPass --再次登陆成功
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 27
Server version: 5.7.19-17-57 Percona XtraDB Cluster (GPL), Release rel17, Revision 35cdc81, WSREP version 29.22, wsrep_29.22
Copyright (c) 2009-2017 Percona LLC and/or its affiliates
Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> exit
Bye
接下来重启1.146 IP节点成功。
三、更多参考
基于CentOS 7 安装Percona XtraDB Cluster(PXC) 5.7
MySQL 5.7 时间显示修改(log_timestamps UTC)
MySQL PXC 5.7 invalid user‘@MYSQLD_USER@’