WSREP has not yet prepared node for application use

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
RDS MySQL Serverless 高可用系列,价值2615元额度,1个月
日志服务 SLS,月写入数据量 50GB 1个月
简介:

最近PXC 5.7出现脑裂,前端Navicate连接到MySQL时,提示WSREP has not yet prepared node for application use错误。脑裂可是个不容易忽视的问题啊,严重的导致数据错乱。你懂的,就好比酒驾,迷糊啊,都乱了。下文是本次故障的具体描述。

一、环境说明

当前的集群采用了2个数据节点+1个仲裁节点,两节点均有写入数据
重启了仲裁节点后,数据节点发生脑裂
操作系统CentOS 7
数据库版本Percona XtraDB Cluster 5.7.20-18-57-log

故障现象
Navicate客户端可以连接,但是查询数据报错:WSREP has not yet prepared node for application use
2节点状态查看都显示为non-Primary

mysql> show status like '%wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 6                                    |
| wsrep_cluster_size      | 3                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | non-Primary                              |
+--------------------------+--------------------------------------+

二、故障分析

节点1日志

2018-05-29T11:36:56.595913+08:00 0 [Note] WSREP: (45125952, 'tcp://0.0.0.0:4567') connection to peer 17b576f8 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:56.602911+08:00 0 [Note] WSREP: (45125952, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.253:4567 
2018-05-29T11:37:05.600073+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,029e6b40,17) ##此处已提示为非主节点
memb {
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    029e6b40,0
    17b576f8,0
    }
)
2018-05-29T11:37:05.631952+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,45125952,18)
memb {
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    029e6b40,0
    17b576f8,0
    }
)
2018-05-29T11:37:05.634073+08:00 0 [Note] WSREP: declaring 029e6b40 at tcp://192.168.1.249:4567 stable
2018-05-29T11:37:05.634674+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,029e6b40,19)
memb {
    029e6b40,0
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    17b576f8,0
    }
)
2018-05-29T11:37:05.663943+08:00 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-05-29T11:37:05.670701+08:00 151042 [Warning] WSREP: Send action {(nil), 563, TORDERED} returned -107 (Transport endpoint is not connected)
2018-05-29T11:37:05.670763+08:00 151042 [Note] WSREP: --------- CONFLICT DETECTED --------  ###检查到冲突
2018-05-29T11:37:05.670778+08:00 151042 [Note] WSREP: cluster conflict due to certification failure for threads:

节点2日志

2018-05-28T23:59:07.757523+08:00 0 [Note] WSREP: Member 2.0 (pxc-cluster-node-1) synced with group.
2018-05-29T09:47:36.679991+08:00 6095593 [Note] Aborted connection 6095593 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
2018-05-29T09:47:36.680062+08:00 6095607 [Note] Aborted connection 6095607 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
            .............................................#Author : Leshami Blog : http://blog.csdn.net/leshami
2018-05-29T11:34:11.981171+08:00 6104858 [Note] Aborted connection 6104858 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
2018-05-29T11:36:02.203516+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 17b576f8 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:02.203708+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.253:4567 
2018-05-29T11:36:03.203906+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') reconnecting to 17b576f8 (tcp://192.168.1.253:4567), attempt 0
2018-05-29T11:36:03.704413+08:00 0 [Note] WSREP: declaring node with index 1 suspected, timeout PT5S (evs.suspect_timeout)
2018-05-29T11:36:03.704482+08:00 0 [Note] WSREP: evs::proto(029e6b40, OPERATIONAL, view_id(REG,029e6b40,17)) suspecting node: 17b576f8
2018-05-29T11:36:03.704507+08:00 0 [Note] WSREP: evs::proto(029e6b40, OPERATIONAL, view_id(REG,029e6b40,17)) suspected node without join message, declaring inactive
2018-05-29T11:36:04.204630+08:00 0 [Note] WSREP: declaring node with index 1 inactive (evs.inactive_timeout) 
2018-05-29T11:36:06.204943+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:09.206069+08:00 0 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2018-05-29T11:36:09.206168+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:09.706352+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:10.206318+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:10.206514+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:10.706763+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:11.204659+08:00 0 [Warning] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) install timer expired
evs::proto(evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)), GATHER) {
current_view=Current view of cluster as seen by this node
view (view_id(REG,029e6b40,17)
memb {
    029e6b40,0
    17b576f8,0
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    }
),
input_map=evs::input_map: {aru_seq=5278704,safe_seq=5278704,node_index=node: {idx=0,range=[5278710,5278709],safe_seq=5278704} node: {idx=1,range=[5278705,5278704],safe_seq=5278704} node: {idx=2,range=[5278710,5278709],safe_seq=5278704} },
fifo_seq=10557624,
last_sent=5278709,
known:

三、故障解决

对于当前的脑裂,由于在测试环境,因此直接停止其中一个节点(1.248),然后将剩余的节点强制提升为主节点
首先停止仲裁节点,然后停止1.249节点
如下在1.249节点上做如下操作

mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
Query OK, 0 rows affected (0.00 sec)

--再次查看此时wsrep_cluster_status为Primary
mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 4                                    |
| wsrep_cluster_size      | 1                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.01 sec)

在第一个节点248上启动
[root@node248 ~]# systemctl start mysql

--集群状态查看,此时也已正常
mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 5                                    |
| wsrep_cluster_size      | 2                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+

在第三个节点启动仲裁节点
[root@ydq4 ~]# /etc/init.d/garb start
正在启动 /usr/bin/garbd:                                  [确定]

mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 6                                    |
| wsrep_cluster_size      | 3                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+

四、后记

1、此次脑裂时应用程序端已出现节点无法连接,并且有日志记录到mysql error log
2、在数据节点已出现异常时,冒然重启仲裁节点不可取
3、建议使用3个数据节点最佳
4、多节点数据写入,最好按模块将写划分到不同节点
5、节点修复参考:http://galeracluster.com/documentation-webpages/twonode.html

上传日志:

相关实践学习
每个IT人都想学的“Web应用上云经典架构”实战
本实验从Web应用上云这个最基本的、最普遍的需求出发,帮助IT从业者们通过“阿里云Web应用上云解决方案”,了解一个企业级Web应用上云的常见架构,了解如何构建一个高可用、可扩展的企业级应用架构。
MySQL数据库入门学习
本课程通过最流行的开源数据库MySQL带你了解数据库的世界。   相关的阿里云产品:云数据库RDS MySQL 版 阿里云关系型数据库RDS(Relational Database Service)是一种稳定可靠、可弹性伸缩的在线数据库服务,提供容灾、备份、恢复、迁移等方面的全套解决方案,彻底解决数据库运维的烦恼。 了解产品详情: https://www.aliyun.com/product/rds/mysql 
目录
相关文章
Electron——node_modules\ffi-napi\build\Release\ffi_bindings.node is not a valid Win32 application.
Electron——node_modules\ffi-napi\build\Release\ffi_bindings.node is not a valid Win32 application.
174 0
|
数据库连接
(node:2612) DeprecationWarning: collection.ensureIndex is deprecated. Use createIndexes instead.
(node:2612) DeprecationWarning: collection.ensureIndex is deprecated. Use createIndexes instead.
When to use next() and return next() in Node.js
Some people always write return next() is to ensure that the execution stops after triggering the callback.
1066 1
|
JavaScript
use apt update nodejs on Ubuntu
Use n module from npm in order to upgrade node sudo npm cache clean -f sudo npm install -g n sudo n stable sudo ln -sf /usr/local/n/version...
1018 0
|
2月前
|
JavaScript Unix Linux
nvm与node.js的安装指南
通过以上步骤,你可以在各种操作系统上成功安装NVM和Node.js,从而在不同的项目中灵活切换Node.js版本。这种灵活性对于管理不同项目的环境依赖而言是非常重要的。
782 11
|
7月前
|
弹性计算 JavaScript 前端开发
一键安装!阿里云新功能部署Nodejs环境到ECS竟然如此简单!
Node.js 是一种高效的 JavaScript 运行环境,基于 Chrome V8 引擎,支持在服务器端运行 JavaScript 代码。本文介绍如何在阿里云上一键部署 Node.js 环境,无需繁琐配置,轻松上手。前提条件包括 ECS 实例运行中且操作系统为 CentOS、Ubuntu 等。功能特点为一键安装和稳定性好,支持常用 LTS 版本。安装步骤简单:登录阿里云控制台,选择扩展程序管理页面,安装 Node.js 扩展,选择实例和版本,等待创建完成并验证安装成功。通过阿里云的公共扩展,初学者和经验丰富的开发者都能快速进入开发状态,开启高效开发之旅。
|
6月前
|
资源调度 JavaScript 前端开发
前端开发必备!Node.js 18.x LTS保姆级安装教程(附国内镜像源配置)
本文详细介绍了Node.js的安装与配置流程,涵盖环境准备、版本选择(推荐LTS版v18.x)、安装步骤(路径设置、组件选择)、环境验证(命令测试、镜像加速)及常见问题解决方法。同时推荐开发工具链,如VS Code、Yarn等,并提供常用全局包安装指南,帮助开发者快速搭建高效稳定的JavaScript开发环境。内容基于官方正版软件,确保合规性与安全性。
5784 24