A PXC 8.0.23 cluster stopped serving requests after an operation carried out for a project. Clients received the following errors:
ERROR 1047 (08S01): WSREP has not yet prepared node for application use
2013 - Lost connection to MySQL server during query
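Logging in to each node showed that wsrep_cluster_size was 0 everywhere and that wsrep_cluster_status was not Primary on any node (as far as I recall it was "not connected"). In the grastate.dat files, node 3 had safe_to_bootstrap set to 1. The error log on a joining node showed: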
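The checks described above can be repeated from a shell on each node. This is only a minimal sketch: the grastate.dat path is assumed from the --datadir value that appears in the logs (/mysql/pxc/data/), the helper names are made up for illustration, and the mysql client is assumed to read its credentials from an option file.

```shell
#!/usr/bin/env bash
# Assumed location: grastate.dat lives in the MySQL datadir.
GRASTATE="${GRASTATE:-/mysql/pxc/data/grastate.dat}"

# Print the safe_to_bootstrap flag (0 or 1) from a grastate.dat file.
safe_to_bootstrap() {
  awk -F': *' '$1 == "safe_to_bootstrap" { print $2 }' "$1"
}

# Show the cluster size and status as seen by the local node.
# Hypothetical helper; it needs a mysqld that accepts connections.
show_wsrep_status() {
  mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
            ('wsrep_cluster_size', 'wsrep_cluster_status',
             'wsrep_local_state_comment');"
}

# Usage on a node:
#   safe_to_bootstrap "$GRASTATE"
#   show_wsrep_status
```

A value of 1 marks the node that Galera considers safe to bootstrap the cluster from; every other node should show 0.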
2022-01-12T11:12:43.552286Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2022-01-12T11:20:32.979860Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (16448) with SIGKILL after stalling for 120 seconds
2022-01-12T11:20:33.010860Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: 行 183: 16450 已杀死 socat -u openssl-listen:4444,reuseaddr,cert=/mysql/pxc/data//server-cert.pem,key=/mysql/pxc/data//server-key.pem,cafile=/mysql/pxc/data//ca.pem,verify=1,retry=30 stdio
2022-01-12T11:20:33.010931Z 0 [Note] [MY-000000] [WSREP-SST] 16451 | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2022-01-12T11:20:33.011525Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2022-01-12T11:20:33.011676Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node: exit codes: 137 137
2022-01-12T11:20:33.011756Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1268
2022-01-12T11:20:33.011874Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2022-01-12T11:20:33.012861Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2022-01-12T11:20:33.210760Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/mysql/pxc/data/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '15908' --mysqld-version '8.0.23-14.1' '' : 32 (Broken pipe)
2022-01-12T11:20:33.210898Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-01-12T11:20:33.210973Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2022-01-12T11:20:33.211182Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-01-12T11:20:33.211268Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2022-01-12T11:20:33.211352Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
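The "exit codes: 137 137" line is easier to read once the codes are decoded. By shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9, SIGKILL. That fits the earlier "Killing SST ... with SIGKILL after stalling for 120 seconds" line: the socat and xbstream processes appear to have been killed by the SST script's own stall timeout while the joiner waited for data that never arrived. A small sketch of the decoding (describe_exit is a hypothetical helper, not part of PXC):

```shell
#!/usr/bin/env bash
# Decode a shell exit status: values above 128 mean "killed by signal N",
# where N = status - 128.
describe_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    echo "exit $code: terminated by signal $sig ($(kill -l "$sig"))"
  else
    echo "exit $code: exited on its own with status $code"
  fi
}

describe_exit 137
```

So the 137 by itself only says that the transfer was killed after stalling; the reason for the stall has to be found elsewhere, for example in the network path between donor and joiner.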
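The articles I found online were all over the place. I tried the advice from several of them and none of it helped. Because the error log showed --address '', I suspected for a while that wsrep_node_address had to be set explicitly, since it is commented out by default. After setting it explicitly, the same error still appeared: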
2022-01-13T08:03:32.978322Z 0 [Note] [MY-000000] [WSREP-SST] Proceeding with SST.........
2022-01-13T08:03:33.036563Z 0 [Note] [MY-000000] [WSREP-SST] ............Waiting for SST streaming to complete!
2022-01-13T08:12:38.715388Z 0 [Note] [MY-000000] [Galera] Created page /mysql/pxc/data/gcache.page.000000 of size 592621440 bytes
2022-01-13T08:12:51.193262Z 0 [ERROR] [MY-000000] [WSREP-SST] Killing SST (27632) with SIGKILL after stalling for 120 seconds
2022-01-13T08:12:51.217686Z 0 [Note] [MY-000000] [WSREP-SST] /usr/bin/wsrep_sst_xtrabackup-v2: line 183: 27634 killed socat -u openssl-listen:4444,reuseaddr,cert=/mysql/pxc/data//server-cert.pem,key=/mysql/pxc/data//server-key.pem,cafile=/mysql/pxc/data//ca.pem,verify=1,retry=30 stdio
2022-01-13T08:12:51.217754Z 0 [Note] [MY-000000] [WSREP-SST] 27635 | /usr/bin/pxc_extra/pxb-8.0/bin/xbstream -x
2022-01-13T08:12:51.218372Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************* FATAL ERROR **********************
2022-01-13T08:12:51.218550Z 0 [ERROR] [MY-000000] [WSREP-SST] Error while getting data from donor node: exit codes: 137 137
2022-01-13T08:12:51.218628Z 0 [ERROR] [MY-000000] [WSREP-SST] Line 1268
2022-01-13T08:12:51.218722Z 0 [ERROR] [MY-000000] [WSREP-SST] ******************************************************
2022-01-13T08:12:51.219631Z 0 [ERROR] [MY-000000] [WSREP-SST] Cleanup after exit with status:32
2022-01-13T08:12:51.431617Z 0 [ERROR] [MY-000000] [WSREP] Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '' --datadir '/mysql/pxc/data/' --basedir '/usr/' --plugindir '/usr/lib64/mysql/plugin/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '27097' --mysqld-version '8.0.23-14.1' '' : 32 (Broken pipe)
2022-01-13T08:12:51.431820Z 0 [ERROR] [MY-000000] [WSREP] Failed to read uuid:seqno from joiner script.
2022-01-13T08:12:51.431892Z 0 [ERROR] [MY-000000] [WSREP] SST script aborted with error 32 (Broken pipe)
2022-01-13T08:12:51.432257Z 3 [Note] [MY-000000] [Galera] Processing SST received
2022-01-13T08:12:51.432372Z 3 [Note] [MY-000000] [Galera] SST request was cancelled
2022-01-13T08:12:51.432458Z 3 [ERROR] [MY-000000] [Galera] State transfer request failed unrecoverably: 32 (Broken pipe). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
【matthewb Percona】
Your log indicates that port 4444 is not open TCP/UDP to all hosts. Make sure all necessary ports (3306, 4444, 4567, 4568) are open between all nodes.
Thanks for your reply, but I am sure I have closed firewall between all nodes. Maybe there is some other issue?
【Evgeniy_Patlan Percona】
"while getting data from donor node: exit codes: 137 137"
This error appears when it is not possible to connect to the required port, so please recheck your firewall settings.
【matthewb Percona】
"I am sure I have closed firewall between all nodes"
That’s your problem. You need to OPEN the firewall between nodes, not close it. Use socat or nc to test connectivity between nodes on the ports I mentioned.
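The connectivity test suggested above could look roughly like this. It is a sketch under a few assumptions: the function names are hypothetical, the host names in the usage comment are placeholders, and bash's /dev/tcp redirection is used in place of socat or nc so that nothing extra has to be installed. Note that 4444 (SST) and 4568 (IST) normally only have a listener while a state transfer is in progress, so "not listening" on those two ports is expected on an idle node; on the joiner they should be tested while it is waiting for SST.

```shell
#!/usr/bin/env bash
# Ports named in the thread: MySQL, SST, group communication, IST.
PXC_PORTS="${PXC_PORTS:-3306 4444 4567 4568}"

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Check every port on every host given as an argument.
check_nodes() {
  rc=0
  for host in "$@"; do
    for port in $PXC_PORTS; do
      if port_open "$host" "$port"; then
        echo "$host:$port open"
      else
        echo "$host:$port BLOCKED or not listening"
        rc=1
      fi
    done
  done
  return $rc
}

# Usage (placeholder host names): ./check_ports.sh db-1 db-2 db-3
if [ "$#" -gt 0 ]; then
  check_nodes "$@"
fi
```

Running it from every node against every other node matters, because a firewall rule can block one direction only.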
Many thanks to you all, I will do this according to your suggestion.
It is ok now.
Following your suggestion, I modified the netfilter rules on all nodes like this:
- Accept all input
- Clear all netfilter rules
Now the cluster works fine.
[root@db-1 ~]# iptables -P INPUT ACCEPT
[root@db-1 ~]# iptables -F
[root@db-1 ~]# iptables -X
[root@db-1 ~]# iptables -Z
[root@db-1 ~]# iptables -A INPUT -i lo -j ACCEPT
[root@db-1 ~]# iptables-save
#Generated by iptables-save v1.4.21 on Mon Jan 24 11:33:23 2022
:INPUT ACCEPT [884:105489]
:OUTPUT ACCEPT [685:162312]
-A INPUT -i lo -j ACCEPT
#Completed on Mon Jan 24 11:33:23 2022
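Flushing every rule solved the problem here, but it also leaves the host with no packet filtering at all. If the INPUT policy is tightened again later, a narrower alternative is to accept only the cluster ports from the other nodes. The sketch below only prints the iptables commands instead of running them; print_pxc_rules is a hypothetical helper, and the peer addresses are placeholders rather than the addresses of this cluster.

```shell
#!/usr/bin/env bash
# Ports named in the thread: MySQL, SST, group communication, IST.
PXC_PORTS="3306,4444,4567,4568"

# Print one ACCEPT rule per peer node; review the output before applying it.
print_pxc_rules() {
  for peer in "$@"; do
    echo "iptables -A INPUT -p tcp -s $peer -m multiport --dports $PXC_PORTS -j ACCEPT"
  done
}

# Example with placeholder peer addresses:
print_pxc_rules 192.0.2.11 192.0.2.12
```

The rules only cover TCP; a cluster that uses UDP for group communication on 4567 would need a matching UDP rule as well.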