尚书中云:惟事事,乃其有备,有备无患。这教导我们做事一定要有准备,做事尚且如此,在企事业单位发展中处于基础地位的数据仓库软件在运行过程中,何尝不需要有备无患呢?
今天别的不表,主要来谈谈企业级数据仓库软件Deepgreen和Greenplum的高可用特性之一:计算节点镜像。
一、首先从理论上来讲,正常Segment节点和他的Mirror是分布在不同主机上的,以防止单点故障导致的数据库访问异常。当正常Segment节点出现故障时,Mirror节点可以自动接管Segment节点的服务,数据库仍然可以正常使用。这个过程对前台应用来说是透明的。
下面我们来看实操例子:
1.测试环境
本人笔记本电脑;
4核心,8G内存,1T硬盘
已安装1个主节点和2个计算节点,无镜像
2.首先查看集群状态:1 Master,2 Segments
## 执行命令:gpstate
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master instance parameters
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Database = template1
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master Port = 15432
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master directory = /dgdata/master/dg-1
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Timeout = 600 seconds
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Master standby = Off
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:-Segment instances that will be started
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:---------------------------------------
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- Host Datadir Port
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg0 25432
20170628:06:26:51:003776 gpstart:flash:dgadmin-[INFO]:- flash /dgdata/primary/dg1 25433
3.创建镜像节点目录并添加节点
## 创建mirror目录
dgadmin@flash:~$ mkdir /dgdata/mirror
## 执行命令:gpaddmirrors添加节点
dgadmin@flash:~$ gpaddmirrors
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting gpaddmirrors with args:
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:06:41:50:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
Enter mirror segment data directory location 1 of 2 >
/dgdata/mirror
Enter mirror segment data directory location 2 of 2 >
/dgdata/mirror
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum Add Mirrors Parameters
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master data directory = /dgdata/master/dg-1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Greenplum master port = 15432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Parallel batch limit = 16
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 1 of 2
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg0
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg0
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27432
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror 2 of 2
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance directory = /dgdata/primary/dg1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance port = 25433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Primary instance replication port = 28433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance host = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance address = flash
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance directory = /dgdata/mirror/dg1
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance port = 26433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:- Mirror instance replication port = 27433
20170628:06:42:03:004294 gpaddmirrors:flash:dgadmin-[INFO]:----------------------------------------------------------
Continue with add mirrors procedure Yy|Nn (default=N):
> y
20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-2 segment(s) to add
20170628:06:42:06:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Building template directory
20170628:06:42:07:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Validating remote directories
......
20170628:06:42:13:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Copying template directory file
.
20170628:06:42:14:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Configuring new segments
......
20170628:06:42:20:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Cleaning files
.
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg0:content=0:dbid=4:mode=r:status=u
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting file move procedure for flash:/dgdata/mirror/dg1:content=1:dbid=5:mode=r:status=u
20170628:06:42:21:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration with new mirrors
20170628:06:42:22:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating mirrors
......
20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Starting mirrors
20170628:06:42:28:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.......
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results...
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Updating primaries
20170628:06:42:35:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
......
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Process results...
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Done updating primaries
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-******************************************************************
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Mirror segments have been added; data synchronization is in progress.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Data synchronization will continue in the background.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20170628:06:42:41:004294 gpaddmirrors:flash:dgadmin-[INFO]:-*****************************************************************
4.重新查看集群状态:确定2个Segments都已经添加好Mirror
dgadmin@flash:~$ gpstate
20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:06:59:19:005589 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:06:59:20:005589 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
....
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as primary segments = 0
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 2
20170628:06:59:24:005589 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
dgadmin@flash:~$
5.登录数据库进行查询测试
dgadmin@flash:~$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# \c
You are now connected to database "postgres" as user "dgadmin".
postgres=# \l
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+---------------------
postgres | dgadmin | UTF8 |
template0 | dgadmin | UTF8 | =c/dgadmin
: dgadmin=CTc/dgadmin
template1 | dgadmin | UTF8 | =c/dgadmin
: dgadmin=CTc/dgadmin
tpch | dgadmin | UTF8 |
tpch_1g | dgadmin | UTF8 |
(5 rows)
postgres=# \c tpch
You are now connected to database "tpch" as user "dgadmin".
tpch=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+----------+-------+---------+----------------------
public | customer | table | dgadmin | append only columnar
public | lineitem | table | dgadmin | append only columnar
public | nation | table | dgadmin | append only columnar
public | orders | table | dgadmin | append only columnar
public | part | table | dgadmin | append only columnar
public | partsupp | table | dgadmin | append only columnar
public | region | table | dgadmin | append only columnar
public | supplier | table | dgadmin | append only columnar
(8 rows)
tpch=# select * from region;
r_regionkey | r_name | r_comment
-------------+---------------------------+---------------------------------------------------------------------------------------------------------------------
1 | AMERICA | hs use ironic, even requests. s
3 | EUROPE | ly final courts cajole furiously final excuse
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
2 | ASIA | ges. thinly even pinto beans ca
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
6.模拟实验:Segment1实例异常
## 查询dg1的后台进程号:3898
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process
postgres 1100 1089 0 Jun27 ? 00:00:00 postgres: writer process
postgres 1101 1089 0 Jun27 ? 00:00:00 postgres: wal writer process
postgres 1102 1089 0 Jun27 ? 00:00:00 postgres: autovacuum launcher process
postgres 1103 1089 0 Jun27 ? 00:00:02 postgres: stats collector process
dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3915 3898 0 06:26 ? 00:00:00 postgres: port 25433, sweeper process
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 3928 3927 0 06:27 ? 00:00:00 postgres: port 15432, master logger process
dgadmin 3931 3927 0 06:27 ? 00:00:00 postgres: port 15432, stats collector process
dgadmin 3932 3927 0 06:27 ? 00:00:00 postgres: port 15432, writer process
dgadmin 3933 3927 0 06:27 ? 00:00:00 postgres: port 15432, checkpoint process
dgadmin 3934 3927 0 06:27 ? 00:00:00 postgres: port 15432, seqserver process
dgadmin 3935 3927 0 06:27 ? 00:00:00 postgres: port 15432, ftsprobe process
dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process
dgadmin 5015 5012 0 06:42 ? 00:00:00 postgres: port 26433, logger process
dgadmin 5022 5013 0 06:42 ? 00:00:00 postgres: port 26432, mirror process
dgadmin 5023 5012 0 06:42 ? 00:00:00 postgres: port 26433, mirror process
dgadmin 5106 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary consumer ack process
dgadmin 5107 5103 0 06:42 ? 00:00:01 postgres: port 25433, primary recovery process
dgadmin 5108 5103 0 06:42 ? 00:00:00 postgres: port 25433, primary verification process
dgadmin 5114 3899 0 06:42 ? 00:00:01 postgres: port 25432, primary process
dgadmin 5115 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary receiver ack process
dgadmin 5116 5114 1 06:42 ? 00:00:20 postgres: port 25432, primary sender process
dgadmin 5117 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary consumer ack process
dgadmin 5118 5114 0 06:42 ? 00:00:01 postgres: port 25432, primary recovery process
dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process
dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres
## 杀死进程,模拟进程异常
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
postgres 1099 1089 0 Jun27 ? 00:00:00 postgres: checkpointer process
dgadmin 3898 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg1 -p 25433 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 3936 3927 0 06:27 ? 00:00:00 postgres: port 15432, sweeper process
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5919 3733 0 07:08 pts/0 00:00:00 grep --color=auto postgres
## 杀死进程,模拟异常
dgadmin@flash:~$ kill -9 3898
## 重新查看,dg1的进程3989已不存在
dgadmin@flash:~$ ps -ef | grep postgres
postgres 1089 1 0 Jun27 ? 00:00:01 /usr/lib/postgresql/9.3/bin/postgres -D /var/lib/postgresql/9.3/main -c config_file=/etc/postgresql/9.3/main/postgresql.conf
dgadmin 3899 1 0 06:26 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/primary/dg0 -p 25432 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
dgadmin 3927 1 0 06:27 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/master/dg-1 -p 15432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
dgadmin 5012 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg1 -p 26433 -b 5 -z 2 --silent-mode=true -i -M quiescent -C 1
dgadmin 5013 1 0 06:42 ? 00:00:00 /home/dgadmin/vitesse.deepgreendb.16.17.170518/bin/postgres -D /dgdata/mirror/dg0 -p 26432 -b 4 -z 2 --silent-mode=true -i -M quiescent -C 0
dgadmin 5014 5013 0 06:42 ? 00:00:00 postgres: port 26432, logger process
dgadmin 5119 5114 0 06:42 ? 00:00:00 postgres: port 25432, primary verification process
dgadmin 5939 3733 0 07:09 pts/0 00:00:00 grep --color=auto postgres
7.访问数据库测试连通性:仍然可以正常查询
dgadmin@flash:~$ psql -d tpch
psql (8.2.15)
Type "help" for help.
tpch=# select * from region;
r_regionkey | r_name | r_comment
-------------+---------------------------+---------------------------------------------------------------------------------------------------------------------
1 | AMERICA | hs use ironic, even requests. s
3 | EUROPE | ly final courts cajole furiously final excuse
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
2 | ASIA | ges. thinly even pinto beans ca
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
8.查看集群状态:显示有1个primary节点异常
dgadmin@flash:~$ gpstate
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:07:15:28:006079 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
.
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total primary segment failures (at master) = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number postmaster processes missing = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:07:15:29:006079 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<<
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1
20170628:07:15:29:006079 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
dgadmin@flash:~$
至此本实验完成,可以看到,在Segment主实例异常情况下,Mirror会立即接管服务,不会对前台应用产生停机影响。
实验完成了,但是事情还没有结束哦,因为这是故障,所以故障产生后,需要进行修复,那么基于整个集群,我们都需要做什么呢?
要点一:及时恢复故障节点
要点二:虽然Mirror可以临时接管服务,保持服务的连续性,但是在实际生产过程中,由于节点及其Mirror的分散性,长期使用Mirror会导致数据分布不均匀,所以故障修复后,建议及时切换回原来的架构。
二、故障节点恢复
节点在故障以后,可以通过gprecoverseg命令恢复故障节点,如下:
dgadmin@flash:~$ gprecoverseg
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting gprecoverseg with args:
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Checking if segments are ready
20170628:22:01:34:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Greenplum instance recovery parameters
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery type = Standard
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:-Recovery 1 of 1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Synchronization mode = Incremental
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance host = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance address = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance directory = /dgdata/primary/dg1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance port = 25433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Failed instance replication port = 28433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance host = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance address = flash
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance directory = /dgdata/mirror/dg1
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance port = 26433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Source instance replication port = 27433
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:- Recovery Target = in-place
20170628:22:01:35:001650 gprecoverseg:flash:dgadmin-[INFO]:----------------------------------------------------------
Continue with segment recovery procedure Yy|Nn (default=N):
> y
20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-1 segment(s) to recover
20170628:22:01:38:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring 1 failed segment(s) are stopped
20170628:22:01:39:001650 gprecoverseg:flash:dgadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration with new mirrors
20170628:22:01:45:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating mirrors
......
20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Starting mirrors
20170628:22:01:51:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.......
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results...
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating configuration to mark mirrors up
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating primaries
20170628:22:01:58:001650 gprecoverseg:flash:dgadmin-[INFO]:-Commencing parallel primary conversion of 1 segments, please wait...
.......
20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Process results...
20170628:22:02:05:001650 gprecoverseg:flash:dgadmin-[INFO]:-Done updating primaries
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-******************************************************************
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Updating segments for resynchronization is completed.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20170628:22:02:06:001650 gprecoverseg:flash:dgadmin-[INFO]:-******************************************************************
查看集群状态
dgadmin@flash:~$ gpstate
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Starting gpstate with args:
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build Deepgreen DB'
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.99.00 build Deepgreen DB) on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4 compiled on May 18 2017 05:19:19'
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Obtaining Segment details from master...
20170628:22:19:45:003527 gpstate:flash:dgadmin-[INFO]:-Gathering data from segments...
.
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-Greenplum instance status summary
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master instance = Active
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Master standby = No master standby configured
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total segment instance count from metadata = 4
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Primary Segment Status
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segments = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment valid (at master) = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total primary segment failures (at master) = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Mirror Segment Status
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segments = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment valid (at master) = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total mirror segment failures (at master) = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number of /tmp lock files found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes missing = 0
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number postmaster processes found = 2
20170628:22:19:46:003527 gpstate:flash:dgadmin-[WARNING]:-Total number mirror segments acting as primary segments = 1 <<<<<<<<
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:- Total number mirror segments acting as mirror segments = 1
20170628:22:19:46:003527 gpstate:flash:dgadmin-[INFO]:-----------------------------------------------------
三、切换回原集群状态
从第二部分的最后状态代码也可以看出,目前是一个Mirror节点接管了Primary节点的服务,我们本节要讲Mirror的服务交还给Primary。
下面执行:gprecoverseg -r 命令进行节点切换,切换完成后执行gpstate查看状态,代码略。
另外,我们也可以在数据库里面通过字典表查看切换信息:
postgres=# select * from gp_configuration_history;
time | dbid | desc
-------------------------------+------+-----------------------------------------------------------------------------------------------
2017-06-28 06:42:21.789641+08 | 4 | gpaddmirrors: segment config for resync: inserted mirror segment configuration
2017-06-28 06:42:21.789641+08 | 5 | gpaddmirrors: segment config for resync: inserted mirror segment configuration
2017-06-28 06:42:21.789641+08 | 2 | gpaddmirrors: segment config for resync: segment mode, status, and replication port
2017-06-28 06:42:21.789641+08 | 3 | gpaddmirrors: segment config for resync: segment mode, status, and replication port
2017-06-28 06:42:35.327377+08 | 2 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 3 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 4 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:42:35.327377+08 | 5 | gpaddmirrors: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 06:56:03.98806+08 | 2 | FTS: changed segment to insync from resync.
2017-06-28 06:56:03.98806+08 | 4 | FTS: changed segment to insync from resync.
2017-06-28 06:56:04.045572+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 06:56:04.045572+08 | 5 | FTS: changed segment to insync from resync.
2017-06-28 07:10:07.745568+08 | 3 | FTS: content 1 fault marking status DOWN role m
2017-06-28 07:10:07.745849+08 | 5 | FTS: content 1 fault marking status UP mode: change-tracking role p
2017-06-28 22:01:45.341958+08 | 3 | gprecoverseg: segment config for resync: segment mode and status
2017-06-28 22:01:58.732622+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:01:58.732622+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:02:57.53962+08 | 5 | FTS: changed segment to insync from resync.
2017-06-28 22:02:57.53962+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 22:31:02.680013+08 | 5 | FTS: content 1 fault marking status DOWN role m
2017-06-28 22:31:02.680309+08 | 3 | FTS: content 1 fault marking status UP mode: change-tracking role p
2017-06-28 22:31:12.655695+08 | 5 | gprecoverseg: segment config for resync: segment mode and status
2017-06-28 22:31:25.848857+08 | 3 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:31:25.848857+08 | 5 | gprecoverseg: segment resync marking mirrors up and primaries resync: segment mode and status
2017-06-28 22:31:58.752335+08 | 3 | FTS: changed segment to insync from resync.
2017-06-28 22:31:58.752335+08 | 5 | FTS: changed segment to insync from resync.
(26 rows)
这样,整个今天的分享就结束了,最后再啰嗦一句。从本文例子看出,数据库主/备切换相当的简单,gprecoverseg命令相当的智能,在Primary的主机出现故障之后,Mirror会自动切换为Primary,不影响数据库的正常工作,但是对监控不是很到位的系统来说,不建议使用这个功能,首先这个功能存在一定的BUG,其次,监控不到位,一旦发现切换,并不能及时发现,如果再有节点出现故障,可能对数据恢复造成影响,而且如果单个节点的数据量非常大的时候,gprecoverseg同步数据的过程将会很漫长。
最后,祝大家再Deepgreen & Greenplum的路上一去不复返^_^
同系列文章参考: