TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

2023-10-20 1086

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： 使用PostgreSQL数据迁移神器pgloader从TiDB迁移数据到PostgreSQL，同时说明如何在最新的Rocky Linux 9（CentOS 9 stream也适用）上通过源码编译安装pgloader。

pgloader是一款为PostgreSQL开发的数据迁移神器，支持从sqlite、MySQL、SQL Server等数据库往PostgreSQL迁移数据，也支持PostgreSQL到PostgreSQL、PostgreSQL到Citus。可自定义迁移规则，非常方便灵活。表结构、索引等都可以完整的迁移过来。

pgloader采用Lisp开发，ubuntu下可使用apt install pgloader安装，CentOS默认安装源里没有pgloader，这里以Rocky Linux 9为例，介绍如何通过源码安装。

安装Lisp

# 下载并安装最新的sbclwget https://sourceforge.net/projects/sbcl/files/sbcl/2.3.9/sbcl-2.3.9-x86-64-linux-binary.tar.bz2
tar jxvf sbcl-2.3.9-x86-64-linux-binary.tar.bz2
cd sbcl-2.3.9-x86-64-linux-binary
./install.sh
# sbcl默认会安装到/usr/local目录

编译pgloader

# dnf install freetds-develgit clone https://github.com/dimitri/pgloader.git
cd pgloader
make save
make pgloader
# 检查是否编译成功build/bin/pgloader --help# 如果编译成功，会显示以下内容pgloader [ option ... ] command-file ...
pgloader [ option ... ] SOURCE TARGET
--help-h                       boolean  Show usage and exit.
--version-V                    boolean  Displays pgloader version and exit.
--quiet-q                      boolean  Be quiet
--verbose-v                    boolean  Be verbose
--debug-d                      boolean  Display debug level information.
--client-min-messages           string   Filter logs seen at the console (default: "warning")
--log-min-messages              string   Filter logs seen in the logfile (default: "notice")
--summary-S                    string   Filename where to copy the summary
--root-dir-D                   string   Output root directory. (default: #P"/tmp/pgloader/")--upgrade-config-U             boolean  Output the command(s) corresponding to .conf file for                                           v2.x
--list-encodings-E             boolean  List pgloader known encodings and exit.
--logfile-L                    string   Filename where to send the logs.
--load-lisp-file-l             string   Read user code from files
--dry-run                       boolean  Only check database connections, don't load anything.  --on-error-stop                 boolean  Refrain from handling errors properly.  --no-ssl-cert-verification      boolean  Instruct OpenSSL to bypass verifying certificates.  --context -C                    string   Command Context Variables  --with                          string   Load options  --set                           string   PostgreSQL options  --field                         string   Source file fields specification  --cast                          string   Specific cast rules  --type                          string   Force input source type  --encoding                      string   Source expected encoding  --before                        string   SQL script to run before loading the data  --after                         string   SQL script to run after loading the data  --self-upgrade                  string   Path to pgloader newer sources  --regress                       boolean  Drive regression testing

迁移TiDB数据到PostgreSQL

如果数据量比较小，可以直接命令行，但本次迁移数据量比较大，还有几张数据量过亿的大表，接命令行做全库迁移，遇到大表，TiDB就挂了，迁移无法正常完成。无奈之下，只好对每张大表单独处理，这就需要定义load文件，load文件里设置线程数量、每次迁移数据的数量，数据类型转换等。

以下是命令行一次性迁移的例子：

pgloader --cast"type date to date drop not null drop default using zero-dates-to-null"--cast"type datetime to timestamp drop not null drop default using zero-dates-to-null"--cast"type tinyint to smallint drop typemod"--with"prefetch rows=50000"--with"workers=8"--with"concurrency=1"--with"multiple readers per thread"--with"rows per range = 50000"-v mysql://test:111111@192.168.0.1:4000/test postgresql://test:123456@192.168.0.2/test

此命令把mysql的test库迁移到postgresql中

--cast 类型转换设定，用于将mysql的某种类型转换为指定的pg类型（pgloader对MySQL的0日期处理的很好）

--with 用于指定一些迁移中的参数，比如每次迁移多少行、多少并发、多少迁移工人线程等，还可以排除一些不想迁移的表，比如本例就排除了表名中带有_log的表。

由于本次迁移生产环境的数据库，数据量较大，一次性迁移，生产库会不堪重负，因为采用了分批迁移的方式，对记录超过1000万条的表逐张迁移，其余表根据表名一部分一部分的迁移，因此需要采用编写.load文件的方式。load文件跟直接命令行相比可支持更多的指令，举例如下：

LOAD DATABASE
    FROM mysql://root@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
    workers =8, concurrency =4,
    multiple readers per thread, rows per range =50000,
    prefetch rows=50000,
    batch rows =50000, batch concurrency =1, batch size = 4GB
SET MySQL PARAMETERS
    net_read_timeout  ='14400',
    net_write_timeout ='14400',
    MAX_EXECUTION_TIME ='0'CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING 'big_table1'; --仅迁移big_table1

批量迁移的例子

LOAD DATABASE
    FROM mysql://user@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
    workers =4, concurrency =2,
    multiple readers per thread, rows per range =20000,
    prefetch rows=20000,
    batch rows =20000, batch concurrency =4SET MySQL PARAMETERS
    net_read_timeout  ='14400',
    net_write_timeout ='14400',
    MAX_EXECUTION_TIME ='0'CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING ~/aa_log*/; --只迁移匹配名字的表
-- 排除表的写法,排除符合条件的表
-- EXCLUDING TABLE NAMES MATCHING ~/aa_log*/,'big_table1','big_table';

写好load文件后，用如下的方法执行：

pgloader -v xxxx.load # -v参数可省略，加上可以看到执行的进度

以上就是基本的用法，掌握以上用法也基本够用了！

更详细的用法可参考pgloader官文文档

TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

安装Lisp

编译pgloader

迁移TiDB数据到PostgreSQL

热门文章

最新文章

相关课程

相关电子书

推荐镜像

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

TiDB实时同步数据到PostgreSQL(三) ---- 使用pgloader迁移数据

安装Lisp

编译pgloader

迁移TiDB数据到PostgreSQL

热门文章

最新文章

相关课程

相关电子书

推荐镜像