In pgpool-II, two configuration file parameters relate to health check:
health_check_period
health_check_timeout
At first glance, here is how their documentation explains them, quoting the official website:
http://pgpool.projects.postgresql.org/pgpool-II/doc/pgpool-en.html
health_check_period
This parameter specifies the interval between the health checks in seconds.
Default is 0, which means health check is disabled. You need to reload pgpool.conf if you change health_check_period.
health_check_timeout
pgpool-II periodically tries to connect to the backends to detect any error on the servers or networks. This error check procedure is called "health check".
If an error is detected, pgpool-II tries to perform failover or degeneration.
This parameter serves to prevent the health check from waiting for a long time in a case such as an unplugged network cable. The timeout value is in seconds. Default value is 20.
0 disables timeout (waits until TCP/IP timeout).
This health check requires one extra connection to each backend,
so max_connections in the postgresql.conf needs to be incremented as needed. You need to reload pgpool.conf if you change this value.
So what actually happens? Taking pgpool-II 3.1 as an example (some unimportant code has been removed to make it easier to read):
/*
 * pgpool main program
 */
int main(int argc, char **argv)
{
    ……
    /*
     * This is the main loop
     */
    for (;;)
    {
        CHECK_REQUEST;
        /* do we need health checking for PostgreSQL? */
        if (pool_config->health_check_period > 0)
        {
            ……
            if (pool_config->health_check_timeout > 0)
            {
                /*
                 * set health checker timeout. we want to detect
                 * communication path failure much earlier before
                 * TCP/IP stack detects it.
                 */
                pool_signal(SIGALRM, health_check_timer_handler);
                alarm(pool_config->health_check_timeout);
            }
            /*
             * do actual health check. trying to connect to the backend
             */
            errno = 0;
            health_check_timer_expired = 0;
            POOL_SETMASK(&UnBlockSig);
            sts = health_check();
            POOL_SETMASK(&BlockSig);
            if (pool_config->parallel_mode || pool_config->enable_query_cache)
                sys_sts = system_db_health_check();
            if ((sts > 0 || sys_sts < 0)
                && (errno != EINTR || (errno == EINTR && health_check_timer_expired)))
            {
                if (sts > 0)
                {
                    sts--;
                    if (!pool_config->parallel_mode)
                    {
                        if (POOL_DISALLOW_TO_FAILOVER(BACKEND_INFO(sts).flag))
                        {
                            pool_log("health_check: %d failover is canceld because failover is disallowed", sts);
                        }
                        else
                        {
                            pool_log("set %d th backend down status", sts);
                            Req_info->kind = NODE_DOWN_REQUEST;
                            Req_info->node_id[0] = sts;
                            failover();
                            /* need to distribute this info to children */
                        }
                    }
                    else
                    {
                        retrycnt++;
                        pool_signal(SIGALRM, SIG_IGN); /* Cancel timer */
                        if (retrycnt > NUM_BACKENDS)
                        {
                            /* retry count over */
                            pool_log("set %d th backend down status", sts);
                            Req_info->kind = NODE_DOWN_REQUEST;
                            Req_info->node_id[0] = sts;
                            failover();
                            retrycnt = 0;
                        }
                        else
                        {
                            /* continue to retry */
                            sleep_time = pool_config->health_check_period / NUM_BACKENDS;
                            pool_debug("retry sleep time: %d seconds", sleep_time);
                            pool_sleep(sleep_time);
                            continue;
                        }
                    }
                }
                ……
            }
            if (pool_config->health_check_timeout > 0)
            {
                /* seems ok. cancel health check timer */
                pool_signal(SIGALRM, SIG_IGN);
            }
            sleep_time = pool_config->health_check_period;
            pool_sleep(sleep_time);
        }
        else
        {
            for (;;)
            {
                int r;
                struct timeval t = {3, 0};
                POOL_SETMASK(&UnBlockSig);
                r = pool_pause(&t);
                POOL_SETMASK(&BlockSig);
                if (r > 0)
                    break;
            }
        }
    }
    pool_shmem_exit(0);
}
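Things are fairly clear now.
First, what health_check_period does: if it is non-zero, health checking can take place.
As far as switching the check on goes, any non-zero value behaves the same.
Second, what health_check_timeout does: if it is greater than 0, a timer is armed before the check; when the timer expires, health_check_timer_handler is invoked, interrupting the pending call to health_check().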
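The timer mechanism in the second point can be tried in isolation. The following is only a minimal sketch, not pgpool-II code: `blocked_check_with_timeout`, `on_alarm` and `timer_expired` are hypothetical names, a `read()` on an empty pipe stands in for a backend connection that hangs, and the handler is installed with plain `sigaction()` without `SA_RESTART` (an assumption about how the interrupted call should behave, not a statement about `pool_signal`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the SIGALRM handler; plays the role of health_check_timer_expired. */
static volatile sig_atomic_t timer_expired = 0;

static void on_alarm(int signo)
{
    (void) signo;
    timer_expired = 1;
}

/*
 * Hypothetical stand-in for a health check that hangs: read() on a pipe
 * that never receives data blocks until a signal interrupts it.
 * Returns 1 if the call was cut short by the timer, 0 if it ended for
 * any other reason, -1 if the setup failed.
 */
int blocked_check_with_timeout(unsigned int timeout_seconds)
{
    int fds[2];
    char buf;
    struct sigaction sa, old_sa;
    ssize_t n;
    int saved_errno;

    assert(timeout_seconds > 0);

    if (pipe(fds) != 0)
        return -1;

    /* No SA_RESTART: the blocked read() fails with EINTR instead of resuming. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    timer_expired = 0;
    errno = 0;
    alarm(timeout_seconds);        /* arm the timer before the blocking call */
    n = read(fds[0], &buf, 1);     /* blocks: nothing is ever written */
    saved_errno = errno;
    alarm(0);                      /* cancel the timer if it has not fired */

    sigaction(SIGALRM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);

    /* Same shape as the excerpt's test: EINTR only counts if our timer fired. */
    return (n < 0 && saved_errno == EINTR && timer_expired) ? 1 : 0;
}
```

Calling `blocked_check_with_timeout(1)` blocks for about one second and then reports that the timer cut the call short. This mirrors the `errno == EINTR && health_check_timer_expired` test in the excerpt, which separates a timeout from an interruption by some other signal.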
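Third, and this is the most frustrating part:
In the main loop, as long as health_check_period is non-zero, health_check() is run again and again, once on every pass through the loop.
Generally speaking, that is far more frequent than the default health_check_timeout of 20 seconds.
When you actually run the pgpool command with the -d option, you can see this for yourself: pgpool-II keeps calling health_check() to check the state of each node.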
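To put a number on how often the loop checks, here is a small sketch on a virtual clock. It assumes the loop shape shown in the excerpt: one health_check() per pass, followed by pool_sleep(health_check_period). The function `checks_within` and its `check_cost` parameter (an assumed duration of one check, in seconds) are hypothetical and exist only for illustration.

```c
#include <assert.h>

/*
 * Count how many health checks a loop of the shape
 *     for (;;) { health_check(); sleep(period); }
 * starts within `window` seconds, using a virtual clock.
 * `check_cost` is the assumed duration of one check, in seconds.
 */
int checks_within(int period, int check_cost, int window)
{
    int now = 0;      /* virtual time in seconds */
    int count = 0;    /* checks started so far */

    assert(period > 0 && check_cost >= 0 && window >= 0);

    while (now < window)
    {
        count++;              /* a check starts at time `now` */
        now += check_cost;    /* time spent inside the check */
        now += period;        /* sleep until the next pass */
    }
    return count;
}
```

For example, `checks_within(1, 0, 20)` counts the checks started during one default timeout interval of 20 seconds with a one-second period, and `checks_within(10, 0, 20)` does the same for a ten-second period.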
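It seems fair to say that, with the main loop churning through health_check() like this, health_check_timeout exists in name only.
I just do not know which version this started with. One could also say the pgpool-II developers were careless and did not keep the code and the documentation in step. Perhaps this is a common ailment of many open-source projects.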
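This article is reposted from the 健哥的数据花园 blog on cnblogs (博客园). Original link: http://www.cnblogs.com/gaojian/archive/2012/07/27/2611935.html. If you wish to repost it, please contact the original author yourself.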