greenplum 节点故障修复文档

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核8GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介:

greenplum集群说明

greenplum整个集群是由多台服务器组合而成,任何一台服务都有可能发生软件或硬件故障,我们一起来模拟一下任何一个节点或服务器故障后,greenplumn的容错及恢复方法

master  服务器:l-test5
standby 服务器:l-test6
segment 服务器:(primary + mirror):l-test[7-12]
 
greenplum集群说明:1 master  +  1 standby + 24 primary segments + 24 mirror segments

正常集群状态

在master查看数据库当前的状态
[gpadmin@l-test5 ~]$ gpstate
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: 
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:23:43:004860 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
.. 
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-Greenplum instance status summary
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Master instance                                           = Active
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Master standby                                            = l-test6
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Standby master state                                      = Standby host passive
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total segment instance count from metadata                = 48
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Primary Segment Status
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total primary segments                                    = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total primary segment valid (at master)                   = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total primary segment failures (at master)                = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number postmaster processes found                   = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Mirror Segment Status
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total mirror segments                                     = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total mirror segment failures (at master)                 = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid files missing              = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs missing               = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of /tmp lock files missing                   = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number postmaster processes missing                 = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number postmaster processes found                   = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 24
20180209:16:23:45:004860 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------

master 服务器故障

当master节点故障后,我们需要激活standby节点作为新的master节点(如果服务器允许,把vip也切换到standby服务器)

在激活standby节点的可以直接指定新的standby节点,也可以等原master服务器恢复后,指定原master节点为standby节点

这里我就不关服务器了,直接模拟master节点故障的状态
关闭master节点

[gpadmin@l-test5 ~]$ gpstop -a -m 
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Starting gpstop with args: -a -m
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Gathering information and validating the environment...
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-There are 0 connections to the database
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Master host=l-test5
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart
20180211:15:16:24:022773 gpstop:l-test5:gpadmin-[INFO]:-Master segment instance directory=/export/gp_data/master/gpseg-1
20180211:15:16:25:022773 gpstop:l-test5:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20180211:15:16:25:022773 gpstop:l-test5:gpadmin-[INFO]:-Terminating processes for segment /export/gp_data/master/gpseg-1

[gpadmin@l-test5 ~]$ psql
psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.65432"?

激活standby节点

[gpadmin@l-test6 ~]$ gpactivatestandby -a -d /export/gp_data/master/gpseg-1/
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby data directory    = /export/gp_data/master/gpseg-1
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby port              = 65432
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Standby running           = yes
20180211:15:19:48:016771 gpactivatestandby:l-test6:gpadmin-[INFO]:-Force standby activation  = no
.
.
.

切换服务器vip

原master节点服务器卸载vip:
[root@l-test5 ~]# 
[root@l-test5 ~]# ip a d 10.0.0.1/32 brd + dev bond0

-----------
原standby节点服务器挂载vip:

[root@l-test6 ~]# 
[root@l-test6 ~]# ip a a  10.0.0.1/32 brd + dev bond0 && arping -q -c 3 -U -I bond0 10.0.0.1

指定新的standby节点

我们指定原master节点为新的standby节点服务器

需要先删除原master的数据文件,然后重新执行初始化standby节点即可

[gpadmin@l-test6 ~]$ gpinitstandby -a -s l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Filespace directory already exists on host l-test5
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Failed to create standby
20180211:15:28:56:019106 gpinitstandby:l-test6:gpadmin-[ERROR]:-Error initializing standby master: master data directory exists
----------注意,这步是在原master节点操作-------
[gpadmin@l-test5 ~]$ cd /export/gp_data/master/
[gpadmin@l-test5 /export/gp_data/master]$ ll
total 4
drwx------ 17 gpadmin gpadmin 4096 Feb 11 15:17 gpseg-1
[gpadmin@l-test5 /export/gp_data/master]$ rm -rf gpseg-1/
[gpadmin@l-test5 /export/gp_data/master]$ ll
total 0
[gpadmin@l-test5 /export/gp_data/master]$ 
----------

[gpadmin@l-test6 ~]$ gpinitstandby -a -s l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:------------------------------------------------------
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master hostname               = l-test6
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master data directory         = /export/gp_data/master/gpseg-1
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum master port                   = 65432
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master hostname       = l-test5
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master port           = 65432
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180211:15:30:55:019592 gpinitstandby:l-test6:gpadmin-[INFO]:-Greenplum update system catalog         = On
.
.

standby 服务器故障

当standby节点服务器恢复后,需要将standby节点删除,然后重新初始化一下standby服务器即可

在master服务器上查看整个greenplum集群状态,我们发现[WARNING]:-Standby master state

[gpadmin@l-test5 ~]$ gpstate
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: 
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:32:19:007208 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-Greenplum instance status summary
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-   Master instance                                           = Active
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[INFO]:-   Master standby                                            = l-test6
20180209:16:32:21:007208 gpstate:l-test5:gpadmin-[WARNING]:-Standby master state                                      = Standby host DOWN   
.
.
[gpadmin@l-test5 ~]$ gpstate -f
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -f
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-Standby master details
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-----------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-   Standby address          = l-test6
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-   Standby data directory   = /export/gp_data/master/gpseg-1
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-   Standby port             = 65432
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[WARNING]:-Standby PID              = 10480                            Have lock file /tmp/.s.PGSQL.65432.lock but no process running on port 65432   <<<<<<<<
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[WARNING]:-Standby status           = Status could not be determined   <<<<<<<<
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--pg_stat_replication
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:-No entries found.
20180209:16:38:41:008728 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------

1、删除故障的standby节点

[gpadmin@l-test5 ~]$ gpinitstandby -r -a
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Warm master standby removal parameters
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master hostname               = l-test5
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master data directory         = /export/gp_data/master/gpseg-1
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master port                   = 65432
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master hostname       = l-test6
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master port           = 65432
20180209:16:52:04:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Removing standby master from catalog...
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Database catalog updated successfully.
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Removing filespace directories on standby master...
20180209:16:52:18:011917 gpinitstandby:l-test5:gpadmin-[INFO]:-Successfully removed standby master

2、重新初始化standby节点
[gpadmin@l-test5 ~]$ gpinitstandby -s l-test6 -a
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Validating environment and parameters for standby initialization...
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Checking for filespace directory /export/gp_data/master/gpseg-1 on l-test6
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master initialization parameters
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:------------------------------------------------------
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master hostname               = l-test5
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master data directory         = /export/gp_data/master/gpseg-1
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum master port                   = 65432
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master hostname       = l-test6
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master port           = 65432
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum standby master data directory = /export/gp_data/master/gpseg-1
20180209:16:59:08:013723 gpinitstandby:l-test5:gpadmin-[INFO]:-Greenplum update system catalog         = On
.
.

3、检查standby 的配置信息

[gpadmin@l-test5 ~]$ gpstate -f
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -f
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-Standby master details
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-----------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-   Standby address          = l-test6
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-   Standby data directory   = /export/gp_data/master/gpseg-1
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-   Standby port             = 65432
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-   Standby PID              = 19057
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:-   Standby status           = Standby host passive
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--pg_stat_replication
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--WAL Sender State: streaming
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Sync state: sync
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Sent Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Flush Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--Replay Location: 0/14000000
20180209:16:59:18:013939 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------

segment 节点故障

当一个primary segment节点故障,那么它所对应的mirror segment节点会接替primary的状态,继续保证整个集群的数据完整性

当一个mirror segment节点出现故障,它不会影响整个集群的可用性,但是需要尽快修复,保证所有的primary segment都有备份

如果primary segment 和 它所对应的mirror segment 节点都出现故障,那么greenplum认为集群数据不完整,整个集群将不再提供服务,直到primary segment 或 mirror segment恢复

修复segment节点,需要重启greenplum集群

primary segment节点和mirror segment节点的故障修复方式是一样的,这里以mirror节点故障为例
关闭一个节点后集群状态

[gpadmin@l-test7 ~]$ pg_ctl -D /export/gp_data/mirror/data4/gpseg23 stop -m fast
waiting for server to shut down.... done
server stopped

--------master节点执行-------
[gpadmin@l-test5 ~]$ gpstate 
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: 
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:12:005738 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
.
.
.
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Mirror Segment Status
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total mirror segments                                     = 24
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total mirror segment valid (at master)                    = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total mirror segment failures (at master)                 = 1                      <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of postmaster.pid files missing              = 1                      <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid files found                = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of postmaster.pid PIDs missing               = 1                      <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number of postmaster.pid PIDs found                 = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number of /tmp lock files missing                   = 1                      <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number of /tmp lock files found                     = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[WARNING]:-Total number postmaster processes missing                 = 1                      <<<<<<<<
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number postmaster processes found                   = 23
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number mirror segments acting as primary segments   = 0
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-   Total number mirror segments acting as mirror segments    = 24
20180211:16:17:14:005738 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------


[gpadmin@l-test5 ~]$ gpstate -m
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -m
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--Current GPDB mirror list and status
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--Type = Spread
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-   Mirror              Datadir                                Port    Status    Data Status    
.
.
.
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:-   l-test12   /export/gp_data/mirror/data3/gpseg22   50002   Passive   Synchronized
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[WARNING]:-l-test7    /export/gp_data/mirror/data4/gpseg23   50003   Failed                   <<<<<<<<
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:17:30:005969 gpstate:l-test5:gpadmin-[WARNING]:-1 segment(s) configured as mirror(s) have failed

 
[gpadmin@l-test5 ~]$ gpstate -e
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Starting gpstate with args: -e
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3'
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.3.23 (Greenplum Database 5.4.1 build commit:4eb4d57ae59310522d53b5cce47aa505ed0d17d3) on x86_64-pc-linux-gnu, compi
led by GCC gcc (GCC) 6.2.0, 64-bit compiled on Jan 22 2018 18:15:33'
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Obtaining Segment details from master...
20180211:16:17:35:006045 gpstate:l-test5:gpadmin-[INFO]:-Gathering data from segments...
.. 
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-Segment Mirroring Status Report
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-Primaries in Change Tracking
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-   Current Primary    Port    Change tracking size   Mirror             Port
20180211:16:17:37:006045 gpstate:l-test5:gpadmin-[INFO]:-   l-test9   40003   128 bytes              l-test7   50003

关闭整个greenplum集群

[gpadmin@l-test5 ~]$ gpstop -a -M fast
.
.
.
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-   Segments stopped successfully                              = 48
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-   Segments with errors during stop                           = 0
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-   
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[WARNING]:-Segments that are currently marked down in configuration   = 1    <<<<<<<<
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-            (stop was still attempted on these segments)
20180211:16:25:00:008223 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
.
.
.

以restricted方式启动数据库
[gpadmin@l-test5 ~]$ gpstart -a -R 
.
. 
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Process results...
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-   Successful segment starts                                            = 47
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Successfully started 47 of 47 segment instances 
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:25:22:008571 gpstart:l-test5:gpadmin-[INFO]:-Starting Master instance l-test5 directory /export/gp_data/master/gpseg-1 in RESTRICTED mode
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Command pg_ctl reports Master l-test5 instance active
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Starting standby master
20180211:16:25:23:008571 gpstart:l-test5:gpadmin-[INFO]:-Checking if standby master is running on host: l-test6  in directory: /export/gp_data/master/gpseg-1
20180211:16:25:24:008571 gpstart:l-test5:gpadmin-[INFO]:-Database successfully started

开始修复故障节点

[gpadmin@l-test5 ~]$ gprecoverseg -a
.
.
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Greenplum instance recovery parameters
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Recovery type              = Standard
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Recovery 1 of 1
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Synchronization mode                        = Incremental
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Failed instance host                        = l-test7
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Failed instance address                     = l-test7
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Failed instance directory                   = /export/gp_data/mirror/data4/gpseg23
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Failed instance port                        = 50003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Failed instance replication port            = 51003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Source instance host               = l-test9
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Source instance address            = l-test9
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Source instance directory          = /export/gp_data/primary/data4/gpseg23
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Source instance port               = 40003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Source instance replication port   = 41003
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-   Recovery Target                             = in-place
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:----------------------------------------------------------
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-1 segment(s) to recover
20180211:16:25:38:009108 gprecoverseg:l-test5:gpadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
. 
.

检查集群修复状态

一直等到Data Status 这个属性全部都是Synchronized即可进行下一步操作

[gpadmin@l-test5 ~]$ gpstate -m
.
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:-   Mirror              Datadir                                Port    Status    Data Status       
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:-   l-test11   /export/gp_data/mirror/data1/gpseg0    50000   Passive   Synchronized
.
.
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:-   l-test7    /export/gp_data/mirror/data4/gpseg23   50003   Passive   Resynchronizing
20180211:16:25:51:009237 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------

[gpadmin@l-test5 ~]$ 
[gpadmin@l-test5 ~]$ gpstate -m
.
.
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:-   Mirror              Datadir                                Port    Status    Data Status    
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:-   l-test11   /export/gp_data/mirror/data1/gpseg0    50000   Passive   Synchronized
.
.
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:-   l-test7    /export/gp_data/mirror/data4/gpseg23   50003   Passive   Synchronized
20180211:16:31:54:010763 gpstate:l-test5:gpadmin-[INFO]:--------------------------------------------------------------

重启greenplum集群

[gpadmin@l-test5 ~]$ gpstop -a -r
.
.
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-   Segments stopped successfully      = 48
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-   Segments with errors during stop   = 0
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-----------------------------------------------------
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Successfully shutdown 48 of 48 segment instances 
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
20180211:16:36:43:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover gpmmon process
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-No leftover gpmmon process found
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Cleaning up leftover shared memory
20180211:16:36:44:012012 gpstop:l-test5:gpadmin-[INFO]:-Restarting System...
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
调度 Apache
Apache Doris tablet 副本修复的原理、流程及问题定位
Apache Doris tablet 副本修复的原理、流程及问题定位
507 0
|
存储 弹性计算 关系型数据库
实践教程之如何对PolarDB-X的存储节点发起备库重搭
PolarDB-X 为了方便用户体验,提供了免费的实验环境,您可以在实验环境里体验 PolarDB-X 的安装部署和各种内核特性。除了免费的实验,PolarDB-X 也提供免费的视频课程,手把手教你玩转 PolarDB-X 分布式数据库。本期实验将指导您如何对PolarDB-X的存储节点发起备库重搭。
|
存储 网络协议 索引
GlusterFS数据存储脑裂修复方案
本文档介绍了glusterfs中可用于监视复制卷状态的`heal info`命令以及解决脑裂的方法
1472 0
|
关系型数据库 数据库 C语言
Deepgreen & Greenplum 高可用(一) - Segment节点故障转移
尚书中云:惟事事,乃其有备,有备无患。这教导我们做事一定要有准备,做事尚且如此,在企事业单位发展中处于基础地位的数据仓库软件在运行过程中,何尝不需要有备无患呢? 今天别的不表,主要来谈谈企业级数据仓库软件Deepgreen和Greenplum的高可用特性之一:计算节点镜像。
8302 0
|
SQL 关系型数据库 MySQL
MySQL MGR集群单主模式的自动搭建和自动化故障修复
MySQL MGR集群单主模式的自动搭建和自动化故障修复/*the waiting game:尽管人生如此艰难,不要放弃;不要妥协;不要失去希望*/ 随着MySQL MGR的版本的升级以及技术成熟,在把MHA拉下神坛之后, MGR越来越成为MySQL高可用的首选方案。
1265 0
|
关系型数据库 数据库 C语言
Greenplum扩容节点步骤
为Greenplum添加计算节点
4218 0
|
存储 SQL 缓存
初探OceanBase的定期合并&数据分发
定期合并和数据分发都是将UpdateServer中的增量更新分发到ChunkServer中的手段,二者的整体流程比较类似:UpdateServer冻结当前的活跃内存表(Active MemTable),生成冻结内存表,并开启新的活跃内存表,后续的更新操作都写入新的活跃内存表。
13199 0