pgloader is a powerful data migration tool for PostgreSQL. It can load data into PostgreSQL from SQLite, MySQL, SQL Server and other databases, and it also supports PostgreSQL to PostgreSQL and PostgreSQL to Citus migrations. Migration rules can be customised, which makes it very flexible, and table structures, indexes and so on are carried over completely.
pgloader is written in Lisp. On Ubuntu it can be installed with apt install pgloader, but the default CentOS repositories do not ship pgloader, so this post uses Rocky Linux 9 as an example and shows how to install it from source.
Installing Lisp
# Download and install the latest SBCL
wget https://sourceforge.net/projects/sbcl/files/sbcl/2.3.9/sbcl-2.3.9-x86-64-linux-binary.tar.bz2
tar jxvf sbcl-2.3.9-x86-64-linux-binary.tar.bz2
cd sbcl-2.3.9-x86-64-linux
./install.sh   # by default SBCL is installed under /usr/local
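Before moving on, a quick sanity check that SBCL ended up on the PATH:

sbcl --version   # should print the installed version, e.g. SBCL 2.3.9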
Building pgloader
# Install the build dependency, then fetch and build pgloader
dnf install freetds-devel
git clone https://github.com/dimitri/pgloader.git
cd pgloader
make save
make pgloader

# Check whether the build succeeded
build/bin/pgloader --help

# If the build succeeded, the usage below is printed
pgloader [ option ... ] command-file ...
pgloader [ option ... ] SOURCE TARGET
  --help -h                       boolean   Show usage and exit.
  --version -V                    boolean   Displays pgloader version and exit.
  --quiet -q                      boolean   Be quiet
  --verbose -v                    boolean   Be verbose
  --debug -d                      boolean   Display debug level information.
  --client-min-messages           string    Filter logs seen at the console (default: "warning")
  --log-min-messages              string    Filter logs seen in the logfile (default: "notice")
  --summary -S                    string    Filename where to copy the summary
  --root-dir -D                   string    Output root directory. (default: #P"/tmp/pgloader/")
  --upgrade-config -U             boolean   Output the command(s) corresponding to .conf file for v2.x
  --list-encodings -E             boolean   List pgloader known encodings and exit.
  --logfile -L                    string    Filename where to send the logs.
  --load-lisp-file -l             string    Read user code from files
  --dry-run                       boolean   Only check database connections, don't load anything.
  --on-error-stop                 boolean   Refrain from handling errors properly.
  --no-ssl-cert-verification      boolean   Instruct OpenSSL to bypass verifying certificates.
  --context -C                    string    Command Context Variables
  --with                          string    Load options
  --set                           string    PostgreSQL options
  --field                         string    Source file fields specification
  --cast                          string    Specific cast rules
  --type                          string    Force input source type
  --encoding                      string    Source expected encoding
  --before                        string    SQL script to run before loading the data
  --after                         string    SQL script to run after loading the data
  --self-upgrade                  string    Path to pgloader newer sources
  --regress                       boolean   Drive regression testing
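The build leaves the binary at build/bin/pgloader. If you prefer to invoke it as plain pgloader, one option (a common convention rather than a required step) is to copy it somewhere on the PATH:

cp build/bin/pgloader /usr/local/bin/
pgloader --version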
Migrating TiDB data to PostgreSQL
If the data volume is small, the command line alone is enough. This migration, however, involved a large amount of data, including several tables with more than 100 million rows; when a whole-database migration was run straight from the command line, TiDB crashed on the big tables and the migration could not finish. The only way out was to handle each big table separately, which means writing a load file that sets the number of worker threads, how many rows are moved per batch, type casts, and so on.
Here is an example of a one-shot command-line migration:
pgloader --cast"type date to date drop not null drop default using zero-dates-to-null"--cast"type datetime to timestamp drop not null drop default using zero-dates-to-null"--cast"type tinyint to smallint drop typemod"--with"prefetch rows=50000"--with"workers=8"--with"concurrency=1"--with"multiple readers per thread"--with"rows per range = 50000"-v mysql://test:111111@192.168.0.1:4000/test postgresql://test:123456@192.168.0.2/test
This command migrates the test database on the MySQL side into PostgreSQL; TiDB is accessed over the MySQL protocol, which is why the source URL uses mysql://.
--cast defines type-conversion rules that map a given MySQL type to the specified PostgreSQL type (pgloader handles MySQL zero dates very well; the rules above turn them into NULL).
--with specifies migration options such as how many rows to move per batch, the concurrency and the number of worker threads. Excluding tables you do not want to migrate (for example tables whose names contain _log) is done with an EXCLUDING TABLE NAMES MATCHING clause in a load file rather than on the command line; see the batched example further down.
Because this migration was against a production database with a lot of data, a single all-at-once migration would have put too much load on the production system. The migration was therefore done in batches: every table with more than 10 million rows was migrated on its own, and the remaining tables were migrated group by group based on their names. That requires writing a .load file, which supports more directives than the plain command line. For example:
LOAD DATABASE
    FROM mysql://root@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
     workers = 8, concurrency = 4,
     multiple readers per thread, rows per range = 50000,
     prefetch rows = 50000, batch rows = 50000,
     batch concurrency = 1, batch size = 4GB
SET MySQL PARAMETERS
    net_read_timeout = '14400',
    net_write_timeout = '14400',
    MAX_EXECUTION_TIME = '0'
CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING 'big_table1';   -- migrate only big_table1
An example of batch migration
LOAD DATABASE
    FROM mysql://user@tidb_server:4000/dbname
    INTO postgresql://user:password@pg_server/dbname
WITH include drop, create tables, create indexes, reset sequences,
     workers = 4, concurrency = 2,
     multiple readers per thread, rows per range = 20000,
     prefetch rows = 20000, batch rows = 20000,
     batch concurrency = 4
SET MySQL PARAMETERS
    net_read_timeout = '14400',
    net_write_timeout = '14400',
    MAX_EXECUTION_TIME = '0'
CAST type date to date drop not null drop default using zero-dates-to-null,
     type datetime to timestamp drop not null drop default using zero-dates-to-null,
     type tinyint to smallint drop typemod
INCLUDING ONLY TABLE NAMES MATCHING ~/aa_log*/;   -- migrate only the tables whose names match

-- To exclude the matching tables instead, use:
-- EXCLUDING TABLE NAMES MATCHING ~/aa_log*/, 'big_table1', 'big_table';
Once the load file is written, run it like this:
pgloader -v xxxx.load   # -v can be omitted; with it you can watch the progress
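Since the migration here is split into several batches with one .load file per batch, a small wrapper script can run them in sequence and keep a separate log for each. This is only a sketch; the .load file names are made up and the -L log path is just an example:

#!/usr/bin/env bash
# Run each batch's load file in turn, writing a separate log per file.
set -e
for f in big_table1.load big_table2.load rest_of_tables.load; do
    echo "=== running $f ==="
    pgloader -v -L "/tmp/pgloader_${f%.load}.log" "$f"
done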
That is the basic usage, and for most migrations it is all you need.
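If you want an extra sanity check after each batch, comparing row counts on both sides is a cheap way to spot missing data. The sketch below reuses the connection details and the big_table1 name from the examples above; adjust them to your environment:

# Row count on the TiDB side (MySQL protocol)
mysql -h 192.168.0.1 -P 4000 -u test -p111111 test -e "SELECT COUNT(*) FROM big_table1;"
# Row count on the PostgreSQL side
psql "postgresql://test:123456@192.168.0.2/test" -c "SELECT COUNT(*) FROM big_table1;"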