最近有个项目启用了mongoDB主要用作查询缓存。
需要将部分PostgreSQL中的数据导入到mongoDB中,写程序来处理的话确实是比较烦。
对应格式如下:
还好mongoDB有一个比较好的工具mongoimport可以导入格式csv , tsc , json 等格式的文件。
1. 使用csv格式导入,这里遇到了数据类型的问题,比如某字段需要导入为string类型,进去变成了数值类型。(这里的话应该是可以搞的,暂时没找到好的办法)
首先从PostgreSQL导出到文件,由于"createTime","cardName"导入到mongoDB是string类型,所以使用"括起来,结果还是很悲剧,mongoimport不认为这样会是string。凡是看起来是数字的都变成了数字.
copy (select skyid as "skyId",real_amount as "realAmount",pay_type as "payType",to_char(create_time,'yyyymmddhh24miss') as "createTime",card_type as "cardType",card_name as "cardName",app_id as "appId",amount from tbl_digoal) to '/home/postgres/to_mongo.csv' with csv HEADER quote '"' force quote "createTime","cardName";
取出一条作为范例:(下面一行范例中createTime已经使用双引号了,cardName为空)
99999999,0,1,"20101111230027",99,,999999,9999
导入的写法如下:
mongoimport -h 127.0.0.1:5281 -d 库名 -c collection名 -u 用户 -p 密码 -f skyId,realAmount,payType,createTime,cardType,cardName,appId,amount --ignoreBlanks --type csv --headerline --file ./to_mongo.csv
导入后到mongoDB中查看结果如下:
{ "_id" : ObjectId("4d26fd83f96edba6b10e3e5d"), "skyId" : NumberLong(99999999), "realAmount" : 0, "payType" : 1, "createTime" : NumberLong("20101111230027"), "cardType" : 99, "appId" : 999999, "amount" : 9999 }
很明显,createTime变成NumberLong类型了.
cardName达到预期没有出现在行中.如果不使用--ignoreBlanks,那样的话会导入"cardName" : "" 的值。
2. 使用json格式导入,由于key和输入的值都是指定的,没有类型的问题了。就是从PostgreSQL到处的文件大了3倍。
下面调整一下PostgreSQL的导出SQL:
copy (select case when skyid is null then '{"skyId" : null,' else '{"skyId" : '||skyid||',' end || case when real_amount is null then '"realAmount" : null,' else '"realAmount" : '||real_amount||',' end || case when pay_type is null then '"payType" : null,' else '"payType" : '||pay_type||',' end || case when to_char(create_time,'yyyymmddhh24miss') is null then '"createTime" : null,' else '"createTime" : "'||to_char(create_time,'yyyymmddhh24miss')||'",' end || case when card_type is null then '"cardType" : null,' else '"cardType" : '||card_type||',' end || case when card_name is null then '"cardName" : null,' else '"cardName" : "'||card_name||'",' end || case when app_id is null then '"appId" : null,' else '"appId" : '||app_id||',' end || case when amount is null then '"amount" : null}' else '"amount" : '||amount||'}' end from tbl_digoal ) to '/home/postgres/to_mongo.json'
取出一条作为范例:(下面一行范例中createTime已经使用双引号了,cardName为空)
{"skyId" : 9999999999,"realAmount" : 0,"payType" : 1,"createTime" : "20101111230027","cardType" : 99,"cardName" : null,"appId" : 9999999,"amount" : 9999}
导入的写法如下:
mongoimport -h 127.0.0.1:5281 -d 库名 -c collection名 -u 用户名 -p 密码 --type json --file ./to_mongo.json
导入后取出记录如下,和对应的类型一致.达到预期:
{ "_id" : ObjectId("4d26ff29f96edba6b10e3e60"), "skyId" : NumberLong("9999999999"), "realAmount" : 0, "payType" : 1, "createTime" : "20101111230027", "cardType" : 99, "cardName" : null, "appId" : 9999999, "amount" : 9999 }
> db.test.find({"skyId" : 9999999999})
{ "_id" : ObjectId("4d26ff29f96edba6b10e3e60"), "skyId" : NumberLong("9999999999"), "realAmount" : 0, "payType" : 1, "createTime" : "20101111230027", "cardType" : 99, "cardName" : null, "appId" : 9999999, "amount" : 9999 }
csv导入和json导入的区别:
1. csv导入的null可以直接把key忽略掉,而json的话显示为"$key" : null.当然你可以在json中删除掉"$key" : null这一节,那就和csv一样了.
2. 导入速度 没啥分别.
参考:
mongo@db-192-168-169-90-> mongoimport --help
options:
--help produce help message
-v [ --verbose ] be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
-h [ --host ] arg mongo host to connect to ("left,right" for pairs)
--port arg server port. Can also use --host hostname:port
-d [ --db ] arg database to use
-c [ --collection ] arg collection to use (some commands)
-u [ --username ] arg username
-p [ --password ] arg password
--ipv6 enable IPv6 support (disabled by default)
--dbpath arg directly access mongod database files in the given
path, instead of connecting to a mongod server -
needs to lock the data directory, so cannot be used
if a mongod is currently accessing the same path
--directoryperdb if dbpath specified, each db is in a separate
directory
-f [ --fields ] arg comma separated list of field names e.g. -f name,age
--fieldFile arg file with fields names - 1 per line
--ignoreBlanks if given, empty fields in csv and tsv will be ignored
--type arg type of file to import. default: json (json,csv,tsv)
--file arg file to import from; if not specified stdin is used
--drop drop collection first
--headerline CSV,TSV only - use first line as headers
--upsert insert or update objects that already exist
--upsertFields arg comma-separated fields for the query part of the
upsert. You should make sure this is indexed
--stopOnError stop importing at first error rather than continuing
--jsonArray load a json array, not one item per line. Currently
limited to 4MB.
需要将部分PostgreSQL中的数据导入到mongoDB中,写程序来处理的话确实是比较烦。
对应格式如下:
还好mongoDB有一个比较好的工具mongoimport可以导入格式csv , tsc , json 等格式的文件。
1. 使用csv格式导入,这里遇到了数据类型的问题,比如某字段需要导入为string类型,进去变成了数值类型。(这里的话应该是可以搞的,暂时没找到好的办法)
首先从PostgreSQL导出到文件,由于"createTime","cardName"导入到mongoDB是string类型,所以使用"括起来,结果还是很悲剧,mongoimport不认为这样会是string。凡是看起来是数字的都变成了数字.
copy (select skyid as "skyId",real_amount as "realAmount",pay_type as "payType",to_char(create_time,'yyyymmddhh24miss') as "createTime",card_type as "cardType",card_name as "cardName",app_id as "appId",amount from tbl_digoal) to '/home/postgres/to_mongo.csv' with csv HEADER quote '"' force quote "createTime","cardName";
取出一条作为范例:(下面一行范例中createTime已经使用双引号了,cardName为空)
99999999,0,1,"20101111230027",99,,999999,9999
导入的写法如下:
mongoimport -h 127.0.0.1:5281 -d 库名 -c collection名 -u 用户 -p 密码 -f skyId,realAmount,payType,createTime,cardType,cardName,appId,amount --ignoreBlanks --type csv --headerline --file ./to_mongo.csv
导入后到mongoDB中查看结果如下:
{ "_id" : ObjectId("4d26fd83f96edba6b10e3e5d"), "skyId" : NumberLong(99999999), "realAmount" : 0, "payType" : 1, "createTime" : NumberLong("20101111230027"), "cardType" : 99, "appId" : 999999, "amount" : 9999 }
很明显,createTime变成NumberLong类型了.
cardName达到预期没有出现在行中.如果不使用--ignoreBlanks,那样的话会导入"cardName" : "" 的值。
2. 使用json格式导入,由于key和输入的值都是指定的,没有类型的问题了。就是从PostgreSQL到处的文件大了3倍。
下面调整一下PostgreSQL的导出SQL:
copy (select case when skyid is null then '{"skyId" : null,' else '{"skyId" : '||skyid||',' end || case when real_amount is null then '"realAmount" : null,' else '"realAmount" : '||real_amount||',' end || case when pay_type is null then '"payType" : null,' else '"payType" : '||pay_type||',' end || case when to_char(create_time,'yyyymmddhh24miss') is null then '"createTime" : null,' else '"createTime" : "'||to_char(create_time,'yyyymmddhh24miss')||'",' end || case when card_type is null then '"cardType" : null,' else '"cardType" : '||card_type||',' end || case when card_name is null then '"cardName" : null,' else '"cardName" : "'||card_name||'",' end || case when app_id is null then '"appId" : null,' else '"appId" : '||app_id||',' end || case when amount is null then '"amount" : null}' else '"amount" : '||amount||'}' end from tbl_digoal ) to '/home/postgres/to_mongo.json'
取出一条作为范例:(下面一行范例中createTime已经使用双引号了,cardName为空)
{"skyId" : 9999999999,"realAmount" : 0,"payType" : 1,"createTime" : "20101111230027","cardType" : 99,"cardName" : null,"appId" : 9999999,"amount" : 9999}
导入的写法如下:
mongoimport -h 127.0.0.1:5281 -d 库名 -c collection名 -u 用户名 -p 密码 --type json --file ./to_mongo.json
导入后取出记录如下,和对应的类型一致.达到预期:
{ "_id" : ObjectId("4d26ff29f96edba6b10e3e60"), "skyId" : NumberLong("9999999999"), "realAmount" : 0, "payType" : 1, "createTime" : "20101111230027", "cardType" : 99, "cardName" : null, "appId" : 9999999, "amount" : 9999 }
> db.test.find({"skyId" : 9999999999})
{ "_id" : ObjectId("4d26ff29f96edba6b10e3e60"), "skyId" : NumberLong("9999999999"), "realAmount" : 0, "payType" : 1, "createTime" : "20101111230027", "cardType" : 99, "cardName" : null, "appId" : 9999999, "amount" : 9999 }
csv导入和json导入的区别:
1. csv导入的null可以直接把key忽略掉,而json的话显示为"$key" : null.当然你可以在json中删除掉"$key" : null这一节,那就和csv一样了.
2. 导入速度 没啥分别.
参考:
mongo@db-192-168-169-90-> mongoimport --help
options:
--help produce help message
-v [ --verbose ] be more verbose (include multiple times for more
verbosity e.g. -vvvvv)
-h [ --host ] arg mongo host to connect to ("left,right" for pairs)
--port arg server port. Can also use --host hostname:port
-d [ --db ] arg database to use
-c [ --collection ] arg collection to use (some commands)
-u [ --username ] arg username
-p [ --password ] arg password
--ipv6 enable IPv6 support (disabled by default)
--dbpath arg directly access mongod database files in the given
path, instead of connecting to a mongod server -
needs to lock the data directory, so cannot be used
if a mongod is currently accessing the same path
--directoryperdb if dbpath specified, each db is in a separate
directory
-f [ --fields ] arg comma separated list of field names e.g. -f name,age
--fieldFile arg file with fields names - 1 per line
--ignoreBlanks if given, empty fields in csv and tsv will be ignored
--type arg type of file to import. default: json (json,csv,tsv)
--file arg file to import from; if not specified stdin is used
--drop drop collection first
--headerline CSV,TSV only - use first line as headers
--upsert insert or update objects that already exist
--upsertFields arg comma-separated fields for the query part of the
upsert. You should make sure this is indexed
--stopOnError stop importing at first error rather than continuing
--jsonArray load a json array, not one item per line. Currently
limited to 4MB.