在第一小节中我们讲到了,MySQL Cluster 集群环境下的 SQL 节点负责接收和解析应用端发来的 SQL 命令,然后再将解析后的命令传送至数据节点进行数据的过滤和查询,最后再由 SQL 节点将需要的数据反馈至应用端。
那么,如果当某一个SQL节点出现故障,例如 mysqld 进程意外中止,或者服务器出现了宕机,这时MySQL Cluster 集群是否仍可以提供应用端的正常读写呢?这一小节,我们就来对“某个 SQL 节点出现单点故障是否会对 MySQL Cluster 整体的读写造成影响”这个问题进行简单的测试。
我们先检测一下两个SQL节点(mysql04、mysql05)的运行状态。在管理节点上查看每个节点的运行状态,看到每个节点的状态均是正常的。
ndb_mgm> show; Cluster Configuration [ndbd(NDB)] 2 node(s) id=2 @192.168.1.6 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0, *) id=3 @192.168.1.7 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.1.3 (mysql-5.7.36 ndb-7.6.20) [mysqld(API)] 2 node(s) id=4 @192.168.1.4 (mysql-5.7.36 ndb-7.6.20) id=5 @192.168.1.5 (mysql-5.7.36 ndb-7.6.20)
然后进行一个简单的测试。
mysql04 节点:
[mysql@mysql04 ~]$ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 5 Server version: 5.7.36-ndb-7.6.20-cluster-gpl MySQL Cluster Community Server (GPL) Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use testdb; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select * from t2; +------+------+ | id | name | +------+------+ | 1 | a | | 2 | bbb | +------+------+ 2 rows in set (0.00 sec)
mysql05 节点:
[mysql@mysql05 ~]$ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 6 Server version: 5.7.36-ndb-7.6.20-cluster-gpl MySQL Cluster Community Server (GPL) Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use testdb; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select * from t2; +------+------+ | id | name | +------+------+ | 1 | a | | 2 | bbb | +------+------+ 2 rows in set (0.00 sec)
可以看到,两个 SQL 节点均运行正常,这时我们对 mysql04 节点上的 mysqld 进行 kill 操作,来进行故障的模拟。
[mysql@mysql04 ~]$ ps -ef | grep mysqld mysql 5759 5738 0 09:05 pts/2 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf mysql 5955 5759 3 09:05 pts/2 00:02:57 /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/mysql/mydata --plugin-dir=/usr/local/mysql/lib/plugin --log-error=mysql04.err --pid-file=mysql04.pid --socket=/mysql/mydata/mysql.sock --port=3306 mysql 6135 5738 0 10:27 pts/2 00:00:00 grep --color=auto mysqld [mysql@mysql04 ~]$ kill -9 5759 5955 [mysql@mysql04 ~]$ [1]+ Killed /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf [mysql@mysql04 ~]$ [mysql@mysql04 ~]$ ps -ef | grep mysqld mysql 6137 5738 0 10:27 pts/2 00:00:00 grep --color=auto mysqld
可以看到,当前在 mysql04 节点上已经不存在 mysqld 进程了,同时通过 mysql04 节点已经无法连接到 MySQL Cluster 集群环境中。
[mysql@mysql04 ~]$ mysql ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/mysql/mydata/mysql.sock' (111)
接下来,我们在管理节点上确认一下 mysql04 节点的运行状态。
ndb_mgm> show; Cluster Configuration [ndbd(NDB)] 2 node(s) id=2 @192.168.1.6 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0, *) id=3 @192.168.1.7 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.1.3 (mysql-5.7.36 ndb-7.6.20) [mysqld(API)] 2 node(s) id=4 (not connected, accepting connect from 192.168.1.4) id=5 @192.168.1.5 (mysql-5.7.36 ndb-7.6.20)
可以看到,当前 mysql04 节点处于 not connected 状态,即 mysql04 是与 MySQL Cluster 集群环境处于连接中断的状态。
然后,我们在 mysql05 节点上测试是否可以连接到 MySQL Cluster 集群环境,同时是否可以在 mysql05 节点上进行数据的读写操作。如下所示:
[mysql@mysql05 ~]$ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7 Server version: 5.7.36-ndb-7.6.20-cluster-gpl MySQL Cluster Community Server (GPL) Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use testdb; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select * from t2; +------+------+ | id | name | +------+------+ | 1 | a | | 2 | bbb | +------+------+ 2 rows in set (0.00 sec) mysql> update t2 set name='aaaaa' where id=1; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from t2; +------+-------+ | id | name | +------+-------+ | 1 | aaaaa | | 2 | bbb | +------+-------+ 2 rows in set (0.00 sec)
我们发现,通过 mysql05 节点可以正常连接到 MySQL Cluster 集群环境中,同时也可以对数据进行正常读写,即mysql04节点的故障并不影响mysql05节点的运行。
那么此时,我们修复 mysql04 节点上的故障,启动 mysql04 节点上的 mysqld 进程。
[mysql@mysql04 ~]$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf & [1] 6140 [mysql@mysql04 ~]$ 2022-03-10T02:35:08.658547Z mysqld_safe Logging to '/mysql/mydata/mysql04.err'. 2022-03-10T02:35:08.683992Z mysqld_safe Starting mysqld daemon with databases from /mysql/mydata
之后,在管理节点上查看其状态。
ndb_mgm> show; Cluster Configuration [ndbd(NDB)] 2 node(s) id=2 @192.168.1.6 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0, *) id=3 @192.168.1.7 (mysql-5.7.36 ndb-7.6.20, Nodegroup: 0) [ndb_mgmd(MGM)] 1 node(s) id=1 @192.168.1.3 (mysql-5.7.36 ndb-7.6.20) [mysqld(API)] 2 node(s) id=4 @192.168.1.4 (mysql-5.7.36 ndb-7.6.20) id=5 @192.168.1.5 (mysql-5.7.36 ndb-7.6.20)
这个时候,mysql04 节点上的服务也恢复正常了,然后通过 mysql04 连接到 MySQL Cluster 集群环境,测试是否可以读取到最新的数据信息,即读取到 t2 表中 id=1 的 name 字段是否为 aaaaa。如下所示:
[mysql@mysql04 ~]$ mysql Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 5.7.36-ndb-7.6.20-cluster-gpl MySQL Cluster Community Server (GPL) Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> use testdb; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> select * from t2; +------+-------+ | id | name | +------+-------+ | 1 | aaaaa | | 2 | bbb | +------+-------+ 2 rows in set (0.00 sec)
可以看到,在对 mysql04 节点故障恢复后,仍然是可以读取到最新数据的。即单个SQL节点的故障其实并不会影响到整个MySQL Cluster集群环境的可用性。
当某个 SQL 节点出现故障后,只需要把应用端的连接指向可用的 SQL 节点即可。这也就是为什么在工作中的 MySQL Cluster 环境往往需要在数据库的前端配备一个负载均衡器,当某一个 SQL 节点出现故障时,应用端的请求在通过负载均衡器时,自动会分发到可用的 SQL 节点上,从而保证 SQL 节点的故障对于应用程序来说是零影响的。
好了,关于 SQL 节点故障的测试我们就讲到这里。在下一小节中,我们来讲解在 MySQL Cluster 集群环境中如何关闭各个节点服务。