Greenplum: Recovering After a Failed Standby Master Activation

Summary:
After a failed attempt to activate the standby master, neither the master nor the standby would start.

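As shown below, with the MASTER_DATA_DIRECTORY and PGPORT environment variables changed to point at the new master (the former standby), try to start it.
$ gpstart -a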
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -a
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:49:41:073138 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:49:43:073138 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:49:43:073138 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-2/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
The start fails.
Running the same command by hand fails too, as expected:
env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start
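
pg_ctl only says "Examine the log output", so the next thing to look at is the pg_log directory of the instance that failed. The following is a minimal sketch of that check, not part of the original session. The function name show_startup_errors is hypothetical, and it assumes the server writes CSV logs (*.csv) into pg_log next to the startup.log file named in the pg_ctl command above.

```shell
# Hypothetical helper: print the tail of startup.log and the most recent
# FATAL/PANIC/ERROR entries from the newest CSV server log.
# $1: master data directory
show_startup_errors() {
    logdir="$1/pg_log"
    # startup.log is the file passed to pg_ctl with -l
    if [ -f "$logdir/startup.log" ]; then
        tail -n 20 "$logdir/startup.log"
    fi
    # Most recently modified CSV log, if there is one
    latest=$(ls -t "$logdir"/*.csv 2>/dev/null | head -n 1)
    if [ -n "$latest" ]; then
        grep -E '"(FATAL|PANIC|ERROR)"' "$latest" | tail -n 20
    fi
    return 0
}

# Data directory taken from the failing pg_ctl command above.
show_startup_errors /disk1/digoal/gpdata/gpseg-2
```

The function prints nothing when the directory or the log files are missing, so it is safe to run against either master data directory.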

Starting in master-only mode does not work either.
$ gpstart -m
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:58:05:077478 gpstart:digoal_host:digoal-[INFO]:-Master-only start requested in configuration without a standby master.

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:16:58:06:077478 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:58:08:077478 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:58:08:077478 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-2 -l /disk1/digoal/gpdata/gpseg-2/pg_log/startup.log -w -t 600 -o " -p 1922 -b 48 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-2/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'

Starting in restricted mode fails as well.
$ gpstart -R
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -R
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:57:21:076997 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:57:24:076997 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:57:24:076997 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
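
The attempts in this article alternate between two master data directories, and the only thing that changes between them is the pair of environment variables MASTER_DATA_DIRECTORY and PGPORT. A small sketch of that switch follows. The function name use_master is hypothetical; the directories and ports are the ones that appear in the gpstart and gpactivatestandby output.

```shell
# Hypothetical helper: point the Greenplum utilities at one master instance.
# Usage: use_master <data_directory> <port>
use_master() {
    export MASTER_DATA_DIRECTORY="$1"
    export PGPORT="$2"
    echo "MASTER_DATA_DIRECTORY=$MASTER_DATA_DIRECTORY PGPORT=$PGPORT"
}

# Old master (dbid 1 in the logs):
use_master /disk1/digoal/gpdata/gpseg-1 1921

# Former standby, i.e. the intended new master (dbid 48 in the logs):
# use_master /disk1/digoal/gpdata/gpseg-2 1922
```

The exports only affect the current shell, so gpstart, gpactivatestandby and gpinitstandby have to be run from the same session.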

Next, MASTER_DATA_DIRECTORY and PGPORT were changed back to the old master.
An attempt to activate the old master then failed as well.
$ gpactivatestandby -f
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby data directory    = /disk1/digoal/gpdata/gpseg-1
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby port              = 1921
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Standby running           = no
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Force standby activation  = yes
20151222:16:51:28:074293 gpactivatestandby:digoal_host:digoal-[INFO]:------------------------------------------------------
Do you want to continue with standby master activation? Yy|Nn (default=N):
> y
20151222:16:51:29:074293 gpactivatestandby:digoal_host:digoal-[INFO]:-Starting standby master database in utility mode...
20151222:16:51:31:074293 gpactivatestandby:digoal_host:digoal-[CRITICAL]:-Error activating standby master: ExecutionError: 'non-zero rc: 2' occured.  Details: 'GPSTART_INTERNAL_MASTER_ONLY=1 $GPHOME/bin/gpstart -a -m -v'  cmd had rc=2 completed=True halted=False
  stdout='20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -a -m -v
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Setting level of parallelism to: 64
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if GPHOME env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Checking if LOGNAME or USER env variable is set.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:---Checking that current user can use GP binaries
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Obtaining master's port from master data directory
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Read from postgresql.conf port=1921
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Read from postgresql.conf max_connections=48
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-gp_external_grant_privileges is None
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Reading the gp_dbid file - /disk1/digoal/gpdata/gpseg-1/gp_dbid...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : # Greenplum Database identifier for this master/segment. ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : # Do not change the contents of this file. ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : dbid = 1 ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Found match for dbid: 1.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Parsing : standby_dbid = 48 ...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Found match for standby_dbid: 48.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[DEBUG]:-Check if Master is already running...
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Master-only start requested for management utilities.
20151222:16:51:29:074365 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:51:31:074365 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:51:31:074365 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'
'
  stderr=''


Starting the old master in master-only mode also fails.
$ gpstart -m
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-Master-only start requested in a configuration with a standby master.
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-This is advisable only under the direct supervision of Greenplum support. 
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-This mode of operation is not supported in a production environment and 
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss.
20151222:16:57:43:077229 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:16:57:44:077229 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:16:57:46:077229 gpstart:digoal_host:digoal-[CRITICAL]:-Failed to start Master instance in admin mode
20151222:16:57:46:077229 gpstart:digoal_host:digoal-[CRITICAL]:-Error occurred: non-zero rc: 1
 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /disk1/digoal/gpdata/gpseg-1 -l /disk1/digoal/gpdata/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 1921 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 48 -c gp_role=utility " start'
rc=1, stdout='waiting for server to start...... stopped waiting
', stderr='pg_ctl: PID file "/disk1/digoal/gpdata/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.
'

When the old master tries to start, its server log reports the following error:
2015-12-22 16:57:45.959837 CST,,,p77246,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","Found recovery.conf file, checking appropriate parameters  for recovery in standby mode",,,,,,,0,,"xlog.c",5663,
2015-12-22 16:57:46.010953 CST,,,p77246,th273340192,,,,0,,,seg-1,,,,,"FATAL","XX000","recovery command file ""recovery.conf"" request for standby mode not specified (xlog.c:5756)",,,,,,,0,,"xlog.c",5756,"Stack trace:
1    0xb04cde postgres errstart (elog.c:502)
2    0x54afe7 postgres XLogReadRecoveryCommandFile (xlog.c:5754)
3    0x560a84 postgres StartupXLOG (xlog.c:6441)
4    0x564966 postgres StartupProcessMain (xlog.c:10970)
5    0x5f4675 postgres AuxiliaryProcessMain (bootstrap.c:463)
6    0x8eacd4 postgres <symbol not found> (postmaster.c:7589)
7    0x8eaefd postgres StartMasterOrPrimaryPostmasterProcesses (postmaster.c:1576)
8    0x8fce76 postgres doRequestedPrimaryMirrorModeTransitions (primary_mirror_mode.c:1735)
9    0x8f4122 postgres <symbol not found> (postmaster.c:2272)
10   0x8f76f0 postgres PostmasterMain (postmaster.c:7589)
11   0x7fa58f postgres main (main.c:206)
12   0x7f1b11cb4cdd libc.so.6 __libc_start_main (??:0)
13   0x4c2cf9 postgres <symbol not found> (??:0)
"
2015-12-22 16:57:46.012709 CST,,,p77240,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","startup process (PID 77246) exited with exit code 1",,,,,,,0,,"postmaster.c",5854,
2015-12-22 16:57:46.012735 CST,,,p77240,th273340192,,,,0,,,seg-1,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",4706,

In the old master's data directory, two extra files have appeared:
cd /disk1/digoal/gpdata/gpseg-1
-rw-r--r-- 1 digoal users     0 Dec 22 16:51 promote
-rw-r--r-- 1 digoal users     0 Dec 22 16:51 recovery.conf
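promote is the trigger file that asks the instance to be activated; recovery.conf serves no purpose here. It is empty, so it does not request standby mode, which is exactly what the FATAL message above complains about.
Delete both files.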
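
As a more cautious variant of that deletion, the two files can be moved aside so they remain available for inspection. This is only a sketch: the function name clear_activation_files and the sibling ".leftovers" backup directory are assumptions, not something the original session used, and the postmaster.pid check is an extra guard added here.

```shell
# Hypothetical helper: move leftover activation files out of a master data
# directory instead of deleting them outright.
# $1: master data directory
clear_activation_files() {
    datadir="${1%/}"
    # Refuse to touch a directory whose postmaster may still be running.
    if [ -f "$datadir/postmaster.pid" ]; then
        echo "postmaster.pid exists in $datadir; leaving it alone" >&2
        return 1
    fi
    # Backup directory sits next to the data directory, not inside it.
    backup="$datadir.leftovers.$(date +%Y%m%d%H%M%S)"
    for f in promote recovery.conf; do
        if [ -e "$datadir/$f" ]; then
            mkdir -p "$backup"
            mv "$datadir/$f" "$backup/" && echo "moved $f to $backup"
        fi
    done
    return 0
}

# For the old master in this article:
# clear_activation_files /disk1/digoal/gpdata/gpseg-1
```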

What remains is to bring the old master up and then remove the standby master that cannot start.
$ gpstart -m
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Starting gpstart with args: -m
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Gathering information and validating the environment...
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.6.1 build 2'
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[INFO]:-Greenplum Catalog Version: '201310150'
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-Master-only start requested in a configuration with a standby master.
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-This is advisable only under the direct supervision of Greenplum support. 
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-This mode of operation is not supported in a production environment and 
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-may lead to a split-brain condition and possible unrecoverable data loss.
20151222:18:20:28:116706 gpstart:digoal_host:digoal-[WARNING]:-****************************************************************************

Continue with master-only startup Yy|Nn (default=N):
> y
20151222:18:20:29:116706 gpstart:digoal_host:digoal-[INFO]:-Starting Master instance in admin mode
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Obtaining Greenplum Master catalog information
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Obtaining Segment details from master...
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Setting new master era
20151222:18:20:32:116706 gpstart:digoal_host:digoal-[INFO]:-Master Started...

Remove the standby master:
$ gpinitstandby -r
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Warm master standby removal parameters
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:------------------------------------------------------
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master hostname               = digoal_host.sqa.zmf
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master data directory         = /disk1/digoal/gpdata/gpseg-1
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum master port                   = 1921
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master hostname       = digoal_host.sqa.zmf
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master port           = 1922
20151222:18:20:51:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Greenplum standby master data directory = /disk1/digoal/gpdata/gpseg-2
Do you want to continue with deleting the standby master? Yy|Nn (default=N):
> y
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby master from catalog...
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Database catalog updated successfully.
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby entry from gp_transaction_files_filespace flat file
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing standby entry from gp_temporary_files_filespace flat file
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Removing filespace directories on standby master...
20151222:18:20:52:116968 gpinitstandby:digoal_host:digoal-[INFO]:-Successfully removed standby master

The master can now be shut down and the whole cluster started normally.
$ gpstop -M fast -a
$ gpstart -a
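
After the restart it is worth confirming that the catalog no longer lists a standby. The sketch below was not part of the original session. It assumes the Greenplum environment has been loaded (for example by sourcing greenplum_path.sh) and that a database named postgres is reachable; the function name check_master_entries is hypothetical.

```shell
# Hypothetical helper: show what the cluster knows about master and standby.
check_master_entries() {
    if ! command -v gpstate >/dev/null 2>&1; then
        echo "gpstate not found; load the Greenplum environment first" >&2
        return 127
    fi
    # Standby master details, if a standby is configured
    gpstate -f
    # Master and standby both use content = -1 in gp_segment_configuration
    psql -d postgres -Atc "SELECT dbid, role, status, hostname, port
                           FROM gp_segment_configuration
                           WHERE content = -1"
}

# Run from the shell whose MASTER_DATA_DIRECTORY and PGPORT point at the
# old master:
# check_master_entries
```

With the standby removed, the query would be expected to return only the row for the master itself.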
