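I have a primary/standby PostgreSQL pair deployed under SUSE Linux HA.
One day the monitor operation on the primary timed out, so the cluster shut down the primary and started promoting the standby.
But the promote of the standby timed out as well, and the standby's pg_log files were deleted during the later recovery.
Is there any way to find out why the promote timed out, and why the monitor on the primary timed out?
Where should I start looking? The corosync log does not show the actual cause; it only shows which operations were triggered.
PS: The primary's pg_log from the time of the monitor timeout shows that a backup was being taken: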
01:28:06 [unknown] postgres NOTICE: pg_stop_backup cleanup done, waiting for required WAL segments to be archived
01:28:23 [unknown] postgres NOTICE: pg_stop_backup complete, all required WAL segments have been archived
01:28:25 LOG: received fast shutdown request
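PostgreSQL's promote has no notion of a timeout as such, so I suggest you go over the corosync flow again, including whether this backup is itself one step of the corosync switchover procedure.
One more piece of information: promote has two cases, one that needs to perform a checkpoint and one that does not.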
    if (fast_promote)
    {
        checkPointLoc = ControlFile->prevCheckPoint;

        /*
         * Confirm the last checkpoint is available for us to recover
         * from if we fail. Note that we don't check for the secondary
         * checkpoint since that isn't available in most base backups.
         */
        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, false);
        if (record != NULL)
        {
            fast_promoted = true;

            /*
             * Insert a special WAL record to mark the end of
             * recovery, since we aren't doing a checkpoint. That
             * means that the checkpointer process may likely be in
             * the middle of a time-smoothed restartpoint and could
             * continue to be for minutes after this. That sounds
             * strange, but the effect is roughly the same and it
             * would be stranger to try to come out of the
             * restartpoint and then checkpoint. We request a
             * checkpoint later anyway, just for safety.
             */
            CreateEndOfRecoveryRecord();
        }
    }

    if (!fast_promoted)
        RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
                          CHECKPOINT_IMMEDIATE |
                          CHECKPOINT_WAIT);
}   /* closes an enclosing block that is not shown in this excerpt */
If this is what made corosync report a timeout, I suggest you use fast promote.
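To make the two cases easier to see, here is a minimal sketch that only models the branch in the excerpt above. It is not PostgreSQL code: `promote_path`, `choose_promote_path` and the `last_checkpoint_readable` parameter are hypothetical names introduced for illustration, with `last_checkpoint_readable` standing in for `ReadCheckpointRecord(...)` returning a non-NULL record.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the decision in the excerpt; none of these
 * names exist in PostgreSQL.
 */
typedef enum
{
    END_WITH_RECOVERY_RECORD,   /* fast case: CreateEndOfRecoveryRecord() */
    END_WITH_CHECKPOINT         /* slow case: RequestCheckpoint(... | CHECKPOINT_WAIT) */
} promote_path;

/*
 * fast_promote:             a fast promote was requested
 * last_checkpoint_readable: assumption, stands in for the
 *                           ReadCheckpointRecord() result being non-NULL
 */
promote_path
choose_promote_path(bool fast_promote, bool last_checkpoint_readable)
{
    bool fast_promoted = false;

    /* The fast case needs both the request and a readable last checkpoint. */
    if (fast_promote && last_checkpoint_readable)
        fast_promoted = true;

    /* Otherwise recovery ends with an immediate checkpoint that is waited for. */
    return fast_promoted ? END_WITH_RECOVERY_RECORD : END_WITH_CHECKPOINT;
}
```

Only `choose_promote_path(true, true)` takes the fast case. In every other combination the end of recovery waits for the checkpoint to finish (`CHECKPOINT_WAIT`), and that wait is the part that could plausibly run longer than the promote timeout configured in the cluster.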