Using pt-table-sync to Fix Master-Replica Data Inconsistencies

2017-12-09

Introduction:

https://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html

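Reminders

Back up the tables you are about to modify before running the tool.
With --replicate or --sync-to-master, changes are made on the master and reach the replica through replication; the replica is not modified directly.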

Sync syntax

# Sync db.tbl on host1 to host2:
pt-table-sync --execute h=host1,D=db,t=tbl h=host2

# Sync all tables on host1 to host2 and host3:
pt-table-sync --execute host1 host2 host3

# Make slave1 have the same data as its replication master:
pt-table-sync --execute --sync-to-master slave1

# Resolve differences that pt-table-checksum found on all slaves of master1:
pt-table-sync --execute --replicate test.checksum master1

# Same as above but only resolve differences on slave1:
pt-table-sync --execute --replicate test.checksum --sync-to-master slave1

# Sync master2 in a master-master replication configuration, where master2’s copy of db.tbl is known or suspected to be incorrect:
pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl

# Note that in the master-master configuration, the following will NOT do what you want, because it will make changes directly on master2, which will then flow through replication and change master1’s data:
# Don't do this in a master-master setup!
pt-table-sync --execute h=master1,D=db,t=tbl master2

# The table has a primary key or unique key: run REPLACE INTO on the master
pt-table-sync --execute h=192.168.3.26,u=root,p=zhujie1986,D=working,t=department,P=3306 --sync-to-master --verbose --verbose --charset=utf8 --print

# The table has no primary key or unique key: write directly to the replica (requires the SUPER privilege)
pt-table-sync --execute h=192.168.3.25,u=root,p=zhujie1986,D=working,t=department,P=3306 h=192.168.3.26 --no-check-slave --verbose --verbose --charset=utf8 --print

Risks

FBI WARNING: pt-table-sync changes data! Before using this tool, please:

- Read the tool’s documentation
- Review the tool’s known “BUGS”
- Test the tool on a non-production server
- Back up your production server and verify the backups

pt-table-sync is mature, proven in the real world, and well tested, but if used improperly it can have adverse consequences. Always test syncing first with --dry-run and --print.
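
The preview-then-apply habit recommended above can be sketched in Python. This is only an illustration: build_sync_command is a hypothetical helper (it is not part of Percona Toolkit), the host and table names are placeholders, and the only tool options it emits are --dry-run, --print and --execute.

```python
import shlex


def build_sync_command(dsns, mode="print", extra_options=()):
    """Return the argv list for one pt-table-sync run.

    mode is "dry-run", "print" or "execute"; only "execute" changes data.
    Hypothetical helper for illustration, not part of the tool.
    """
    if mode not in ("dry-run", "print", "execute"):
        raise ValueError(f"unknown mode: {mode}")
    if not dsns:
        raise ValueError("at least one DSN is required")
    return ["pt-table-sync", f"--{mode}", *extra_options, *dsns]


# Placeholder DSNs: review the printed SQL first, then rerun with --execute.
dsns = ["h=host1,D=db,t=tbl", "h=host2"]
preview = build_sync_command(dsns, mode="print")
apply_ = build_sync_command(dsns, mode="execute")
print(shlex.join(preview))
print(shlex.join(apply_))
```

Building the two commands from the same DSN list keeps the previewed run and the applied run pointed at the same servers and table.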

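Features

Syncs data one-way or bidirectionally
Does not sync table structures, indexes, or any other schema objects

One-way sync

The purpose of --replicate:
find the differences

Matching the master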

if DSN has a t part, sync only that table:
   if 1 DSN:
      if --sync-to-master:
         The DSN is a slave.  Connect to its master and sync.
   if more than 1 DSN:
      The first DSN is the source.  Sync each DSN in turn.
else if --replicate:
   if --sync-to-master:
      The DSN is a slave.  Connect to its master, find records
      of differences, and fix.
   else:
      The DSN is the master.  Find slaves and connect to each,
      find records of differences, and fix.
else:
   if only 1 DSN and --sync-to-master:
      The DSN is a slave.  Connect to its master, find tables and
      filter with --databases etc, and sync each table to the master.
   else:
      find tables, filtering with --databases etc, and sync each
      DSN to the first.
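
The pseudocode above can be written as a small runnable Python function. This is a sketch of the decision tree only, under stated assumptions: plan_sync is a hypothetical name, DSNs are modelled as plain dicts, the "t part" is looked up on the first DSN, and the returned strings are informal descriptions rather than anything the tool prints.

```python
def plan_sync(dsns, sync_to_master=False, replicate=False):
    """Describe which branch of the pseudocode applies.

    dsns is a list of dicts such as {"h": "host1", "D": "db", "t": "tbl"}.
    """
    if not dsns:
        raise ValueError("at least one DSN is required")
    if "t" in dsns[0]:
        # A table was named, so only that table is synced.
        if len(dsns) == 1:
            if sync_to_master:
                return "single table: DSN is a replica, sync it to its master"
            # The pseudocode does not cover this combination.
            return "unspecified: one DSN with a table needs --sync-to-master"
        return "single table: first DSN is the source, sync each other DSN in turn"
    if replicate:
        if sync_to_master:
            return "replicate: DSN is a replica, fix recorded differences via its master"
        return "replicate: DSN is the master, fix recorded differences on every replica"
    if len(dsns) == 1 and sync_to_master:
        return "all tables: DSN is a replica, sync each filtered table to its master"
    return "all tables: sync each DSN to the first"


print(plan_sync([{"h": "slave1"}], sync_to_master=True))
print(plan_sync([{"h": "master1"}], replicate=True))
```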

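By default pt-table-sync does not use --replicate; it finds the differences between the tables itself and repairs them.
When --replicate is given, pt-table-sync reads the differences that pt-table-checksum has already recorded.
You must tell the tool which servers to sync:
- With --sync-to-master, give the replica's DSN; the tool discovers and connects to its master automatically
- When differences are found, the changes are made on the master and reach the replica through replication
- In a one-master, multi-replica setup, every replica of that master receives the changes
- Without --sync-to-master, give at least two DSNs: the first is the source and the others are destinations
- If a destination turns out to be a replica, the tool stops with an error, because changes are written directly to destinations and writing directly to a replica is unsafe
With --replicate but without --sync-to-master, a single DSN for the master is enough; the tool discovers all of its replicas and repairs the differing tables on each of them.
DSNs after the first inherit any values they omit from the first DSN, for example: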

pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2

host2 connects using the u and p values given for host1.
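
The inheritance rule can be illustrated with a few lines of Python. This is a simplified sketch, not the tool's parser: parse_dsn and resolve_dsns are hypothetical names, every missing key is copied from the first DSN, and values that contain commas are not handled.

```python
def parse_dsn(text):
    """Parse 'h=host1,u=user' into a dict; a bare word is treated as h=word."""
    if "=" not in text:
        return {"h": text}
    parts = {}
    for item in text.split(","):
        key, _, value = item.partition("=")
        parts[key.strip()] = value
    return parts


def resolve_dsns(dsn_strings):
    """Fill keys missing from later DSNs with the first DSN's values."""
    parsed = [parse_dsn(s) for s in dsn_strings]
    first = parsed[0]
    # Values given explicitly on a later DSN win over the inherited ones.
    return [first] + [{**first, **dsn} for dsn in parsed[1:]]


resolved = resolve_dsns(["h=host1,u=msandbox,p=msandbox", "h=host2"])
print(resolved[1])
```

The second DSN keeps its own h and picks up u and p from the first, which matches the behaviour described above.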

Limitations

Replicas using row-based replication

pt-table-sync requires statement-based replication when used with the --sync-to-master or --replicate option. Therefore it will set binlog_format=STATEMENT on the master for its session if required. To do this, the user must have the SUPER privilege.

Output with --verbose --print --charset=utf8

pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --replicate=percona.checksums --set-vars innodb_lock_wait_timeout=50 --host=192.168.3.25 --port=3306 --user=root --password=zhujie1986 --databases working --tables department --replicate-check
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
01-18T14:58:11      0      1        7       1       0   0.009 working.department

pt-table-sync --execute h=192.168.3.25,u=root,p=zhujie1986,D=working,t=department,P=3306 h=192.168.3.26 --no-check-slave --verbose --charset=utf8
# Syncing A=utf8,D=working,P=3306,h=192.168.3.26,p=...,t=department,u=root
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      7      0 GroupBy   14:59:28 14:59:28 2    working.department
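
To read the --verbose summary programmatically, a row like the one above can be split on whitespace. The Python below is a minimal sketch that assumes the nine columns shown in the header line; parse_summary_line is a hypothetical helper, not something the tool provides.

```python
COLUMNS = ["DELETE", "REPLACE", "INSERT", "UPDATE", "ALGORITHM",
           "START", "END", "EXIT", "DATABASE.TABLE"]


def parse_summary_line(line):
    """Turn one '#'-prefixed summary row into a dict keyed by column name."""
    fields = line.lstrip("#").split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"unexpected summary row: {line!r}")
    row = dict(zip(COLUMNS, fields))
    for key in ("DELETE", "REPLACE", "INSERT", "UPDATE", "EXIT"):
        row[key] = int(row[key])
    return row


row = parse_summary_line(
    "#      0       0      7      0 GroupBy   14:59:28 14:59:28 2    working.department"
)
changed = row["DELETE"] + row["REPLACE"] + row["INSERT"] + row["UPDATE"]
print(row["DATABASE.TABLE"], row["ALGORITHM"], changed)  # working.department GroupBy 7
```

For the run shown above, the row says that 7 rows were inserted into working.department using the GroupBy algorithm.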

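Sync process

When the master and replica tables have the same structure and a unique index or primary key exists, the tool prefers INSERT, UPDATE and DELETE statements to resolve the differences.
When the table structures differ, but the master table has a primary key and the replica table has a unique index, the tool repairs the data with DELETE and REPLACE.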

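Options

- --verbose: print how the differences in each table were handled; --verbose --verbose also prints per-chunk information
- --print: print the SQL statements used to resolve the differences
- --charset=utf8: set the connection character set; this matters mainly for inserted data
- --no-check-slave: skip the check that the destination is a replica, so changes are written directly to it; requires the SUPER privilege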

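Algorithms

The tool uses different algorithms to find data differences.
It chooses the best one based on the available indexes, the column types, and the preference given with --algorithms.
Chunk
- Needs an index whose first column is numeric (date and time types count as numeric); the size and number of chunks follow from --chunk-size
- Checks one chunk at a time, computing a single checksum for the whole chunk
- If the checksums of a chunk do not match, the rows in that chunk are checked individually
- Each chunk is relatively small, so the system resources and bandwidth it consumes are negligible
- While the rows of a chunk are being checked, only the primary key and the checksum travel over the network for comparison
- Full rows are transferred only when a checksum turns out to differ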
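
The chunk-then-row idea can be sketched in Python on two in-memory tables. This is an illustration of the technique only and not how pt-table-sync is implemented: the function names are hypothetical, CRC32 combined with XOR stands in for whatever checksum the tool computes in SQL, and chunk_size here is a width of the key range rather than a row count.

```python
import zlib


def row_checksum(row):
    """Checksum one row, given as a (key, value) tuple."""
    return zlib.crc32(repr(row).encode("utf-8"))


def chunk_checksum(table, low, high):
    """Combine the checksums of all rows whose key lies in [low, high)."""
    total = 0
    for key in sorted(k for k in table if low <= k < high):
        total ^= row_checksum((key, table[key]))
    return total


def find_differences(source, dest, chunk_size):
    """Return the keys whose rows differ, comparing chunk by chunk.

    source and dest map an integer key to the rest of the row.
    """
    keys = set(source) | set(dest)
    if not keys:
        return []
    differing = []
    for low in range(min(keys), max(keys) + 1, chunk_size):
        high = low + chunk_size
        if chunk_checksum(source, low, high) == chunk_checksum(dest, low, high):
            continue  # the whole chunk matches, no row-level work needed
        # The chunk differs: compare per-row checksums inside it only.
        for key in sorted(k for k in keys if low <= k < high):
            src = row_checksum((key, source[key])) if key in source else None
            dst = row_checksum((key, dest[key])) if key in dest else None
            if src != dst:
                differing.append(key)
    return differing


source = {1: "a", 2: "b", 3: "c", 11: "x", 12: "y"}
dest = {1: "a", 2: "B", 3: "c", 11: "x"}
print(find_differences(source, dest, chunk_size=10))
```

Chunks whose combined checksum matches are skipped entirely, so row-level comparison only happens where a difference is already known to exist.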
