TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

2023-10-20 575

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

RDS MySQL Serverless 基础系列，0.5-2RCU 50GB

云数据库 RDS MySQL，集群系列 2核4GB

云原生数据库 PolarDB PostgreSQL 版，标准版 2核4GB 50GB

简介： 使用PostgreSQL数据迁移神器pgloader从TiDB迁移数据到PostgreSQL，同时说明如何在最新的Rocky Linux 9（CentOS 9 stream也适用）上通过源码编译安装pgloader。

pgloader是一款为PostgreSQL开发的数据迁移神器，支持从sqlite、MySQL、SQL Server等数据库往PostgreSQL迁移数据，也支持PostgreSQL到PostgreSQL、PostgreSQL到Citus。可自定义迁移规则，非常方便灵活。表结构、索引等都可以完整的迁移过来。

pgloader采用Lisp开发，ubuntu下可使用apt install pgloader安装，CentOS默认安装源里没有pgloader，这里以Rocky Linux 9为例，介绍如何通过源码安装。

安装Lisp

# 下载并安装最新的sbclwget https://sourceforge.net/projects/sbcl/files/sbcl/2.3.9/sbcl-2.3.9-x86-64-linux-binary.tar.bz2
tar jxvf sbcl-2.3.9-x86-64-linux-binary.tar.bz2
cd sbcl-2.3.9-x86-64-linux-binary
./install.sh
# sbcl默认会安装到/usr/local目录

编译pgloader

# dnf install freetds-develgit clone https://github.com/dimitri/pgloader.git
cd pgloader
make save
make pgloader
# 检查是否编译成功build/bin/pgloader --help# 如果编译成功，会显示以下内容pgloader [ option ... ] command-file ...
pgloader [ option ... ] SOURCE TARGET
--help-h                       boolean  Show usage and exit.
--version-V                    boolean  Displays pgloader version and exit.
--quiet-q                      boolean  Be quiet
--verbose-v                    boolean  Be verbose
--debug-d                      boolean  Display debug level information.
--client-min-messages           string   Filter logs seen at the console (default: "warning")
--log-min-messages              string   Filter logs seen in the logfile (default: "notice")
--summary-S                    string   Filename where to copy the summary
--root-dir-D                   string   Output root directory. (default: #P"/tmp/pgloader/")--upgrade-config-U             boolean  Output the command(s) corresponding to .conf file for                                           v2.x
--list-encodings-E             boolean  List pgloader known encodings and exit.
--logfile-L                    string   Filename where to send the logs.
--load-lisp-file-l             string   Read user code from files
--dry-run                       boolean  Only check database connections, don't load anything.  --on-error-stop                 boolean  Refrain from handling errors properly.  --no-ssl-cert-verification      boolean  Instruct OpenSSL to bypass verifying certificates.  --context -C                    string   Command Context Variables  --with                          string   Load options  --set                           string   PostgreSQL options  --field                         string   Source file fields specification  --cast                          string   Specific cast rules  --type                          string   Force input source type  --encoding                      string   Source expected encoding  --before                        string   SQL script to run before loading the data  --after                         string   SQL script to run after loading the data  --self-upgrade                  string   Path to pgloader newer sources  --regress                       boolean  Drive regression testing

迁移TiDB数据到PostgreSQL

如果数据量比较小，可以直接命令行，但本次迁移数据量比较大，还有几张数据量过亿的大表，接命令行做全库迁移，遇到大表，TiDB就挂了，迁移无法正常完成。无奈之下，只好对每张大表单独处理，这就需要定义load文件，load文件里设置线程数量、每次迁移数据的数量，数据类型转换等。

以下是命令行一次性迁移的例子：

pgloader --cast"type date to date drop not null drop default using zero-dates-to-null"--cast"type datetime to timestamp drop not null drop default using zero-dates-to-null"--cast"type tinyint to smallint drop typemod"--with"prefetch rows=50000"--with"workers=8"--with"concurrency=1"--with"multiple readers per thread"--with"rows per range = 50000"-v mysql://test:111111@192.168.0.1:4000/test postgresql://test:123456@192.168.0.2/test

此命令把mysql的test库迁移到postgresql中

--cast 类型转换设定，用于将mysql的某种类型转换为指定的pg类型（pgloader对MySQL的0日期处理的很好）

--with 用于指定一些迁移中的参数，比如每次迁移多少行、多少并发、多少迁移工人线程等，还可以排除一些不想迁移的表，比如本例就排除了表名中带有_log的表。

由于本次迁移生产环境的数据库，数据量较大，一次性迁移，生产库会不堪重负，因为采用了分批迁移的方式，对记录超过1000万条的表逐张迁移，其余表根据表名一部分一部分的迁移，因此需要采用编写.load文件的方式。load文件跟直接命令行相比可支持更多的指令，举例如下：

LOAD DATABASE
    FROM mysql://root@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
    workers =8, concurrency =4,
    multiple readers per thread, rows per range =50000,
    prefetch rows=50000,
    batch rows =50000, batch concurrency =1, batch size = 4GB
SET MySQL PARAMETERS
    net_read_timeout  ='14400',
    net_write_timeout ='14400',
    MAX_EXECUTION_TIME ='0'CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING 'big_table1'; --仅迁移big_table1

批量迁移的例子

LOAD DATABASE
    FROM mysql://user@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
    workers =4, concurrency =2,
    multiple readers per thread, rows per range =20000,
    prefetch rows=20000,
    batch rows =20000, batch concurrency =4SET MySQL PARAMETERS
    net_read_timeout  ='14400',
    net_write_timeout ='14400',
    MAX_EXECUTION_TIME ='0'CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING ~/aa_log*/; --只迁移匹配名字的表
-- 排除表的写法,排除符合条件的表
-- EXCLUDING TABLE NAMES MATCHING ~/aa_log*/,'big_table1','big_table';

写好load文件后，用如下的方法执行：

pgloader -v xxxx.load # -v参数可省略，加上可以看到执行的进度

以上就是基本的用法，掌握以上用法也基本够用了！

更详细的用法可参考pgloader官文文档

TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

安装Lisp

编译pgloader

迁移TiDB数据到PostgreSQL

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

安装Lisp

编译pgloader

迁移TiDB数据到PostgreSQL

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像