使用pt-heartbeat监控主从复制延迟

本文涉及的产品
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS PostgreSQL,集群系列 2核4GB
简介:     MySQL主从复制是MySQL 高可用架构中重要的组成部分,该技术可以用于实现负载均衡,高可用和故障切换,以及提供备份等等。对于主从复制的监控,仅仅依赖于MySQL自身提供的show slave status并不可靠。

    MySQL主从复制是MySQL 高可用架构中重要的组成部分,该技术可以用于实现负载均衡,高可用和故障切换,以及提供备份等等。对于主从复制的监控,仅仅依赖于MySQL自身提供的show slave status并不可靠。pt-heartbeat是主从复制延迟监控的不错选择,本文描述了主从复制情形下的延迟监控并给出相应示例。

    pt-heartbeat为percona-toolkit工具包中的一个,因此使用前需要先安装percona-toolkit,请参考:percona-toolkit的安装及简介

 

1、pt-heartbeat的作用
    pt-heartbeat measures replication lag on a MySQL or PostgreSQL server.  You can use it to update a master or monitor a replica.  If possible, MySQL connection options are read from your .my.cnf file.  For more details, please use the --help option, or try 'perldoc /usr/bin/pt-heartbeat' for complete  documentation.

    pt-heartbeat is a two-part MySQL and PostgreSQL replication delay monitoring system that measures delay by looking at actual replicated data. This avoids reliance on the replication mechanism itself, which is unreliable. (For example, SHOW SLAVE STATUS on MySQL).

 

2、pt-heartbeat的原理
    The first part is an --update instance of pt-heartbeat that connects to a master and updates a timestamp (“heartbeat record”) every --interval seconds. Since the heartbeat table may contain records from multiple masters (see “MULTI-SLAVE HIERARCHY”), the server’s ID (@@server_id) is used to identify records.
    主库上存在一个用于检查延迟的表heartbeat,可手动或自动创建
    pt-heartbeat使用--update参数连接到主库上并持续(根据设定的--interval参数)使用一个时间戳更新到表heartbeat
   
    The second part is a --monitor or --check instance of pt-heartbeat that connects to a slave, examines the replicated heartbeat record from its immediate master or the specified --master-server-id, and computes the difference from the current system time. If replication between the slave and the master is delayed or broken, the computed difference will be greater than zero and otentially increase if --monitor is specified.
    pt-heartbeat使用--monitor 或--check连接到从库,检查从主库同步过来的时间戳,并与当前系统时间戳进行比对产生一个差值,
    该值则用于判断延迟。(注,前提条件是主库与从库应保持时间同步)

    You must either manually create the heartbeat table on the master or use --create-table. See --create-table for the proper heartbeat table structure. The MEMORY storage engine is suggested, but not re-quired of course, for MySQL.
    The heartbeat table must contain a heartbeat row. By default, a heartbeat row is inserted if it doesn’t exist. This feature can be disabled with the --[no]insert-heartbeat-row option in case the database user does not have INSERT privileges.

    pt-heartbeat depends only on the heartbeat record being replicated to the slave, so it works regardless of the replication mechanism (built-in replication, a system such as Continuent Tungsten, etc). It works at any depth in the replication hierarchy; for example, it will reliably report how far a slave lags its master’s master’s master. And if replication is   stopped, it will continue to work and report (accurately!) that the slave is falling further and further behind the master.
    pt-heartbeat has a maximum resolution of 0.01 second. The clocks on the master and slave servers must be closely synchronized via NTP. By default, --update checks happen on the edge of the second (e.g. 00:01) and --monitor checks happen halfway between seconds (e.g. 00:01.5). As long as the servers’ clocks are closely synchronized and replication events are propagating in less than half a second, pt-heartbeat will report zero seconds of delay.
    pt-heartbeat will try to reconnect if the connection has an error, but will not retry if it can’t get a connection when it first starts.
    The --dbi-driver option lets you use pt-heartbeat to monitor PostgreSQL as well. It is reported to work well  with Slony-1 replication.

 

3、获取pt-heartbeat帮助信息
a、获取帮助信息
  [root@DBMASTER01 ~]# pt-heartbeat #直接输入pt-heartbeat可获得一个简要描述,使用pt-heartbeat --help获得一个完整帮助信息
  Usage: pt-heartbeat [OPTIONS] [DSN] --update|--monitor|--check|--stop
  
  Errors in command-line arguments:
    * Specify at least one of --stop, --update, --monitor or --check
    * --database must be specified

b、几个重要的参数
  Specify at least one of --stop, --update, --monitor, or --check. #至少指定一个
  --update, --monitor, and --check are mutually exclusive.         #互斥参数
  --daemonize and --check are mutually exclusive.                  #互斥参数
  
  --check
  Check slave delay once and exit. If you also specify --recurse, the tool will try to discover slave’s of the
  given slave and check and print their lag, too. The hostname or IP and port for each slave is printed before its
  delay. --recurse only works with MySQL.
  
  --daemonize
  Fork to the background and detach from the shell. POSIX operating systems only.

  --frames
  type: string; default: 1m,5m,15m
  Timeframes for averages.
  Specifies the timeframes over which to calculate moving averages when --monitor is given. Specify as a
  comma-separated list of numbers with suffixes. The suf?x can be s for seconds, m for minutes, h for hours, or d
  for days. The size of the largest frame determines the maximum memory usage, as up to the specified number
  of per-second samples are kept in memory to calculate the averages. You can specify as many timeframes as
  you like.
  
  --monitor
  Monitor slave delay continuously.
  Specifies that pt-heartbeat should check the slave’s delay every second and report to STDOUT (or if --file
  is given, to the file instead). The output is the current delay followed by moving averages over the timeframe
  given in --frames. For example,
  5s [ 0.25s, 0.05s, 0.02s ]
  
  --stop
  Stop running instances by creating the sentinel file.
  
  --update
  Update a master’s heartbeat.  

 

4、演示使用pt-heartbeat

a、首先添加表
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --create-table --update 

MASTER> select * from heartbeat;
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| ts                         | server_id | file             | position  | relay_master_log_file | exec_master_log_pos |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+
| 2014-12-01T09:48:14.003020 |        11 | mysql-bin.000390 | 677136957 | mysql-bin.000179      |                 120 |
+----------------------------+-----------+------------------+-----------+-----------------------+---------------------+

b、更新主库上的heartbeat
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update &
[1] 31249

c、从库上监控延迟
[root@DBBAK01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --monitor --print-master-server-id
1.00s [  0.02s,  0.00s,  0.00s ] 11  #实时延迟,1分钟延迟,5分钟延迟,15分钟延迟
1.00s [  0.03s,  0.01s,  0.00s ] 11  # Author : Leshami
1.00s [  0.05s,  0.01s,  0.00s ] 11  # Blog   : http://blog.csdn.net/leshami
1.00s [  0.07s,  0.01s,  0.00s ] 11
1.00s [  0.08s,  0.02s,  0.01s ] 11
1.00s [  0.10s,  0.02s,  0.01s ] 11
1.00s [  0.12s,  0.02s,  0.01s ] 11
1.00s [  0.13s,  0.03s,  0.01s ] 11

d、其他操作示例
#将主库上的update使用守护进程方式调度
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update --daemonize

#修改主库上的更新间隔为2s 
[root@DBMASTER01 ~]# pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --update --daemonize --interval=2

#停止主库上的pt-heartbeat守护进程
[root@DBMASTER01 ~]# pt-heartbeat --stop
Successfully created file /tmp/pt-heartbeat-sentinel
[root@DBMASTER01 ~]# rm -rf /tmp/pt-heartbeat-sentinel

#单次查看从库上的延迟情况
[robin@DBBAK01 ~]$ pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
> --master-server-id=11 --check 
1.00

#使用守护进程监控从库并输出日志
[root@DBBAK01 ~]#  pt-heartbeat --user=root --password=xxx -S /tmp/mysql.sock -D test \
--master-server-id=11 --monitor --print-master-server-id --daemonize --log=/tmp/slave-lag.log

相关实践学习
如何快速连接云数据库RDS MySQL
本场景介绍如何通过阿里云数据管理服务DMS快速连接云数据库RDS MySQL,然后进行数据表的CRUD操作。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助     相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
弹性计算 网络协议 容灾
PostgreSQL 时间点恢复(PITR)在异步流复制主从模式下,如何避免主备切换后PITR恢复(备库、容灾节点、只读节点)走错时间线(timeline , history , partial , restore_command , recovery.conf)
标签 PostgreSQL , 恢复 , 时间点恢复 , PITR , restore_command , recovery.conf , partial , history , 任意时间点恢复 , timeline , 时间线 背景 政治正确非常重要,对于数据库来说亦如此,一个基于流复制的HA架构的集群,如果还有一堆只读节点,当HA集群发生了主备切换后,这些只读节点能否与新的主节点保持
1835 0
|
SQL
记一次不常见到主从延迟问题
Slave_SQL_Running_State: Waiting for dependent transaction to commit 导致的主从延迟
7697 1
|
网络安全
Heartbeat配置方案
Heartbeat配置方案
|
监控 数据库
主从延时问题的监控及处理建议
主从延时问题的监控及处理建议
|
SQL 关系型数据库 MySQL
只读实例(slave主从)延迟排查
本文分享的方法适用于实时查看只读延迟(主从延迟),即需要在延迟发生的时候查看才能确认问题,历史延迟不适用,以下环境已经开启并行复制。
只读实例(slave主从)延迟排查
|
监控 关系型数据库 MySQL
请不要用SECONDS_BEHIND_MASTER来衡量MYSQL主备的延迟时间
MySQL 本身通过 show slave status 提供了 Seconds_Behind_Master ,用于衡量主备之间的复制延迟,但是今天碰到了一个场景,发现 Seconds_Behind_Master 为 0 , 备库的 show slave status 显示 IO/SQL 线程都是正常的 , MySQL 的主库上的变更却长时间无法同步到备库上。
1996 0
|
缓存 关系型数据库 MySQL
MySQL|主从延迟问题排查(二)
主库执行insert  select 批量写入操作,主从复制通过row模式下转换为批量的insert大事务操作,导致只读实例CPU资源以及延迟上涨
2351 0
|
SQL 监控 关系型数据库
MySQL|主从延迟问题排查(一)
大查询长时间执行无法释放DML读锁,后续同步主库的DDL操作获取DML写锁资源被阻塞等待,导致后续同步主库的操作堆积,主从延迟增长严重。从同步延迟的监控来看,延迟从17:11开始,17:51:59进行kill大查询操作,直到17:53建议业务方将大查询kill掉后才结束。
3349 0
MySQL|主从延迟问题排查(一)