前面一篇简单的对比了一下PostgreSQL和MonetDB在bulk load和简单的统计查询的性能.
因为测试数据比较单一, 可能没有什么说服力.
本文测试环境与之前的一致, 但是使用宽表, 随机离散字符串进行测试. 5000万测试数据, 60个字段, 单表, 容量149GB, (PG96GB)
[注意]
PostgreSQL注重的是高并发, 而MonetDB注重数据分析, 所以两者没有什么可比性, 这里拿MonetDB的强项来VS PostgreSQL的软肋, 主要是给一些在使用PostgreSQL做数据分析并且遇到瓶颈的朋友一些启发, 换个产品试试, 不要使用一个产品的软肋(这里指PostgreSQL在统计方面目前略差, 未来9.4+版本出来单SQL可以用到多核再来谈统计效率)
PostgreSQL支持的列存储插件cstore_fdw性能, 如果要玩的话可以
参考 :
对于定长类型, 可以提升较高的性能.
[测试结果]
SQL / 时间(毫秒) |
MonetDB | PostgreSQL |
select count(*) from b; 结果 : 50000000 |
0.8 | 158000 |
select count(distinct c1) from b; 结果 : 50000000 |
63000 | 212000 |
select count(distinct c2) from b; 结果 : 49422662 |
201000 | 436000 |
select count(distinct c60) from b; 结果 : 49423470 |
204000 | 476000 |
select count(*) from (select c1,c11,c21,c31,c41,c51,c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t; 结果 : 50000000 |
54300 | 364000 |
select count(*) from (select c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t; 结果 : 50000000 |
54100 | 367000 |
select count(c1) from b where c11=c2 or c2=c3 or c4=c5 or c6=c7 or c8=c9; 结果 : 1 |
1900 | 186000 |
select count(*) from (select c2,c11,c21,c31,c41,c51,c60 from b group by c2,c11,c21,c31,c41,c51,c60) as t; 结果 : 50000000 不带C1条件, 非常慢. 而且MonetDB会消耗大量的磁盘空间, 这里消耗了1TB空间. PG不会消耗额外的大量空间. |
621000 | 835079 |
|
[测试详情]
本文测试数据变一下, 使用宽表进行测试. 60个字段的宽表.
(包含1个id, int类型, primary key,
其他59个字段为变长字符串.)
字符串的数据使用PostgreSQL产生, md5(random()::text), 每行, 每个字段的值都完全随机, .
为了对比导入速度, 先使用PostgreSQL生成文件, 再把文件导入MonetDB和PostgreSQL表.
导入MonetDB可以直接导入或者使用管道例如 :
postgres@150-> cat 1.sql | mclient test -s "copy 3125000 records into b from stdin" -
3125000 affected row
postgres@150-> mclient test -s "copy 3125000 records into b from '/data01/postgres/2.sql'"
3125000 affected row
PostgreSQL 使用16个并行导出数据. 注意修改起始值和文件名. 同时还要注意oom的问题, 需要调整一下内核参数.
generate_series(1到16,50000000,16)
/data01/postgres/1到16.sql
数据导出SQL如下 :
导出后文本占用97GB.
查看几条测试数据, 非常随机 :
MonetDB建表.
配置mclient配置文件, 这样的话mclient无需输入密码.
导入数据到MonetDB, 注意串行导入, 因为单表有写冲突.
psql -c 'copy (
select generate_series(1,50000000,16) as c1,
md5(random()::text) as c2,
md5(random()::text) as c3,
md5(random()::text) as c4,
md5(random()::text) as c5,
md5(random()::text) as c6,
md5(random()::text) as c7,
md5(random()::text) as c8,
md5(random()::text) as c9,
md5(random()::text) as c10,
md5(random()::text) as c11,
md5(random()::text) as c12,
md5(random()::text) as c13,
md5(random()::text) as c14,
md5(random()::text) as c15,
md5(random()::text) as c16,
md5(random()::text) as c17,
md5(random()::text) as c18,
md5(random()::text) as c19,
md5(random()::text) as c20,
md5(random()::text) as c21,
md5(random()::text) as c22,
md5(random()::text) as c23,
md5(random()::text) as c24,
md5(random()::text) as c25,
md5(random()::text) as c26,
md5(random()::text) as c27,
md5(random()::text) as c28,
md5(random()::text) as c29,
md5(random()::text) as c30,
md5(random()::text) as c31,
md5(random()::text) as c32,
md5(random()::text) as c33,
md5(random()::text) as c34,
md5(random()::text) as c35,
md5(random()::text) as c36,
md5(random()::text) as c37,
md5(random()::text) as c38,
md5(random()::text) as c39,
md5(random()::text) as c40,
md5(random()::text) as c41,
md5(random()::text) as c42,
md5(random()::text) as c43,
md5(random()::text) as c44,
md5(random()::text) as c45,
md5(random()::text) as c46,
md5(random()::text) as c47,
md5(random()::text) as c48,
md5(random()::text) as c49,
md5(random()::text) as c50,
md5(random()::text) as c51,
md5(random()::text) as c52,
md5(random()::text) as c53,
md5(random()::text) as c54,
md5(random()::text) as c55,
md5(random()::text) as c56,
md5(random()::text) as c57,
md5(random()::text) as c58,
md5(random()::text) as c59,
md5(random()::text) as c60
) to $$/data01/postgres/1.sql$$ with
(format csv,
delimiter $$|$$,
quote $$"$$,
force_quote (c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18,c19,c20,c21,c22,c23,c24,c25,c26,c27,c28,c29,c30,c31,c32,c33,c34,c35,c36,c37,c38,c39,c40,c41,c42,c43,c44,c45,c46,c47,c48,c49,c50,c51,c52,c53,c54,c55,c56,c57,c58,c59,c60)
)';
导出后文本占用97GB.
[root@150 postgres]# ll -h
total 97G
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:16 10.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:24 11.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:32 12.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:40 13.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:48 14.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:56 15.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 15:03 16.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 12:56 1.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:07 2.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:15 3.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:22 4.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:31 5.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:43 6.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 13:51 7.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:00 8.sql
-rw-r--r-- 1 postgres postgres 6.1G Aug 14 14:08 9.sql
查看几条测试数据, 非常随机 :
1|"6b495127ec426c60e1e4f66bea29e5f0"|"020909484fbefd11c059330bef0f44b8"|"c59d029b1ad4b0c29d571bbacc338041"|"f1e6753f5ac3cfd886290aa1756233e6"|"f2d2f3d3ee6d08dadfe56e71a4857b12"|"fc11f28cdbc474d32a5321b5412cd4e5"|"b431c19a58520812195a3217920e7ea6"|"7c1bbb0477af3e90a225187f7d6fb466"|"a1d5c497ff82e4a22805ab00c60521dd"|"c0be2511e742fca12c9929b46e5d03f4"|"c6ddb35898321dd0f4f475d27d70bdfa"|"5e0f8a2b8040c5edb9e30c349ea1aa08"|"542f55776366b5efb210936e07679425"|"dc4a1ee09d065d0dfbdcd7e3457f38e7"|"6896190252998bbf74cd8087ed041afd"|"50e87880f9dfff89c82ba75880fb6b71"|"218390986173ff335e1b9a520b4f68a1"|"a68cfa103b15e47966e663c493450e4c"|"83d0410bc9a645bef166a18dcb155013"|"d86683995fd2e95204b4427ebdeefa63"|"3d0d5a1a2d1aa296c015a7aa97a1ce58"|"956fa78d1dde685206a5a0aa4298542e"|"4f3e337d5ffec164f60c1d572251387e"|"e683d90ca4f3cf5233dfbebe6e0c757e"|"bdaa103929668f49944f66a1fa23c796"|"a83512b2b815bcf3492c1b905555961f"|"3f0baa432c7d47b6e93353c3fe4267a8"|"2457998f3911bdf52a4afe553ce133ea"|"70ce2a65fb6c5a66eca586e8072f28c3"|"14ce930d4f9706a00b03fa97ab031405"|"90c314b2b184544c7110bd8a2b09815c"|"f2423b95d694a0f85712698f77e1944f"|"15289cf134a064c4a749ff1feff81125"|"f8eaf6f552f1e4f8f64c02f32491fe73"|"fc2fa0b686894d16010ecc0221510591"|"f26f7546e35a3ffae57917090c28e42f"|"beea5ff050e1ce61d10c6ef611cc02ff"|"f7a33420d831f84bbb7782ffbaa9efed"|"15f12088a0933a620432285412af7823"|"f7d3f813f972f05e7e5d38cf831be116"|"7a6f1694407e2c614feda3b1a49f352f"|"6b1378fcf263defa65394373df8e1632"|"3b7a40e3631e64802108952127744134"|"f5a1af63d049900810eb4e5bdc496ff5"|"b44e575ea3d247c2f3f4e20189aadf6d"|"e2563f6b379d49e6755074b874b7b27d"|"81cda978c87c9382a4d5d1c38a49819c"|"545ec147915f0947cdebb89ee3f591c8"|"db8eb48bedfdf9d809f785b10444d3c9"|"4a90dd3dbebae82ab7f14faafb9a3bc0"|"e6127534fc4a580cf966925d2132b698"|"ec3a516fa63dde62a309512d96dc4c3f"|"c2f9b9d4b0527fdf1be73c6bc4a5f31f"|"9ae6840fb9ae4b8f379db0ae8c840df2"|"59c588434d922564da7fc122adda9d0f"|"8cc6b404dc03da5a081ab35cb3e2faf5"|"879422d0633783ff976a244b36c41564"|"2dce0ce871aeae678ad48131e2d0ad64"|"9898840d1c85a22292767aef319c679a"
......
2049|"18b8893c2c8f362d1042fadae097debf"|"277da6dc014f982f4b190c56d4b13b61"|"2bb1d5ce110976c5dd1e926890586ca0"|"469785c1c793cd650d9590a2425ff336"|"a36945aaf730e919d462c787dc527b68"|"3e1b11d48f853ce8f44cd3a57b228898"|"2ffc6b2ae55ffd1d9eedbc597e41d41e"|"9624e61cbcb8e3d4d226632a87c575ef"|"9563918b5e10685e5862f429d85a418e"|"747b0bbdbf560a787313f172d37d4ec7"|"a5a3f74debda6f2c1a3fd1dfb956e619"|"77a88ab046ecbfdce17a1b2e9dc92aed"|"e22929b234342cb6606626f61887d97a"|"c0ee045410d79fbae1c465f7c71d37df"|"808a16397cf3e87bdabb4cdd3bbc1a84"|"803dcbddb1f5adb934767f0b60dd179a"|"ff360cd12228e8d2b25a997e6513894f"|"86796d54cdaf1ce6385162da9c048591"|"aae527a8dc3d518c033992e8313b8275"|"6a421d136c399288459c9bd9fc8af22e"|"a3b3b767a4a5645eff4f19da0e41f8dd"|"3666f5c0165ab00d43218fec88aebe00"|"c7ee020a939464bc5e8ed8c034c9f332"|"45a221fc08213d006d2a16d1bb41db6a"|"d8e3dd02893590180c94c1aec6a76895"|"31ef9b785f08b0c2f44d30a4926e1c36"|"a0c998be919166f5edad03f7a73eca11"|"bc520372d8c5e7b6d10970aa825eb055"|"d19cd0d11bf28a0c38b7f617d90891a8"|"74ef2eefe33754a61b1b607b460bcc84"|"e84286489ea2daed35003457c24b33dc"|"81e390de812b57914c67755df5b6343a"|"aaa35f9ea00ceadd0b72ccd3e8e37f57"|"8e7893dc751c78b95905fb0f026cc021"|"01a921258a4ed6a980ae51665970350a"|"06a40933329dcfb0e0e02b277b784c0b"|"f0f6a680123b8e149cc036bfcecb0aee"|"bc3585c35d28f02324a8b3e886585b36"|"98c07805a5c9d246b4174e7ff2a4b082"|"1b92fffe358086cb291da31a739deefe"|"6dad26a67438b01fa2634ac5a39f3448"|"3dc0211ebcffc91d05b1591014d07e0a"|"7132009424ef332e13e56b982e920118"|"f9b2c9eec01b9c140fd0f5aa6b22f4a9"|"d094ae011b95232249956666484ee6c0"|"610c85e1bf9f8c1ec831b2d2fb6dd352"|"121f6c2f749bef14eeb9e46c75cc3587"|"29c55d993def74d7a6fd7f2f88648552"|"c203b1ac0069426a0113fa8a56ec2e4b"|"fed702df88b6b26a38cca158f184d6bd"|"2c5475df4496fd9b601178e974268d58"|"e0e0723c44a8c043b28170c9dbfa371f"|"5d872da32364aeaebef93c41a7f48f9e"|"8ffb1c390c8377399fa3bf668c78f052"|"077c694164ea161fb512b8f958fc5df9"|"36156cfe1412ccca8cb97749bd3ec2c7"|"b24380748099410545f7f2ccaefb9d22"|"7ec098121bf5db976e7567efb4caac25"|"fd439abb98d850f60434ef9a3fd4393a"
MonetDB建表.
create table b(c1 int primary key,
c2 varchar(64),
c3 varchar(64),
c4 varchar(64),
c5 varchar(64),
c6 varchar(64),
c7 varchar(64),
c8 varchar(64),
c9 varchar(64),
c10 varchar(64),
c11 varchar(64),
c12 varchar(64),
c13 varchar(64),
c14 varchar(64),
c15 varchar(64),
c16 varchar(64),
c17 varchar(64),
c18 varchar(64),
c19 varchar(64),
c20 varchar(64),
c21 varchar(64),
c22 varchar(64),
c23 varchar(64),
c24 varchar(64),
c25 varchar(64),
c26 varchar(64),
c27 varchar(64),
c28 varchar(64),
c29 varchar(64),
c30 varchar(64),
c31 varchar(64),
c32 varchar(64),
c33 varchar(64),
c34 varchar(64),
c35 varchar(64),
c36 varchar(64),
c37 varchar(64),
c38 varchar(64),
c39 varchar(64),
c40 varchar(64),
c41 varchar(64),
c42 varchar(64),
c43 varchar(64),
c44 varchar(64),
c45 varchar(64),
c46 varchar(64),
c47 varchar(64),
c48 varchar(64),
c49 varchar(64),
c50 varchar(64),
c51 varchar(64),
c52 varchar(64),
c53 varchar(64),
c54 varchar(64),
c55 varchar(64),
c56 varchar(64),
c57 varchar(64),
c58 varchar(64),
c59 varchar(64),
c60 varchar(64));
配置mclient配置文件, 这样的话mclient无需输入密码.
man mclient
Before mclient starts parsing command line options, it reads a .monetdb file. If the environment variable DOT-
MONETDBFILE is set, it reads the file pointed to by that variable instead. When unset, mclient searches for a
.monetdb file in the current working directory, and if that doesn’t exist, in the current user’s home direc-
tory. This file can contain defaults for the flags user, password, language, save_history, format, and width.
For example, an entry in a .monetdb file that sets the default language for mclient to mal looks like this:
language=mal. To disable reading the .monetdb file, set the variable DOTMONETDBFILE to the empty string in the
environment.
[root@150 ~]# vi ~/.monetdb
user=monetdb
password=monetdb
[root@150 ~]# chmod 400 ~/.monetdb
导入数据到MonetDB, 注意串行导入, 因为单表有写冲突.
参考
导入耗时约3703秒.
存储空间, 主键200M, 字符串2.5G, 一共149G左右. 比PostgreSQL存储更占空间.
[测试]
mclient test -s "copy 3125000 records into b from '/data01/postgres/1.sql';"
...
mclient test -s "copy 3125000 records into b from '/data01/postgres/16.sql';"
存储空间, 主键200M, 字符串2.5G, 一共149G左右. 比PostgreSQL存储更占空间.
sql>select * from "sys"."storage";
| sys | b | c1 | int | 31/3114 | 50000000 | 4 | 200000000 | 0 | 0 | false |
| sys | b | c2 | varchar | 03/331 | 50000000 | 34 | 200000000 | 2400321536 | 468451328 | false |
| sys | b | c3 | varchar | 17/1720 | 50000000 | 34 | 200000000 | 2400321536 | 468451328 | false |
| sys | b | c4 | varchar | 05/545 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c5 | varchar | 6 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c6 | varchar | 30/3065 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c7 | varchar | 17/1702 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c8 | varchar | 31/3113 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c9 | varchar | 05/510 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c10 | varchar | 31/3115 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c11 | varchar | 31/3111 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c12 | varchar | 13/1314 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c13 | varchar | 31/3112 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c14 | varchar | 30/3064 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c15 | varchar | 31/3110 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c16 | varchar | 31/3107 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c17 | varchar | 31/3106 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c18 | varchar | 15/1575 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c19 | varchar | 31/3105 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c20 | varchar | 31/3104 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c21 | varchar | 31/3103 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c22 | varchar | 30/3056 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c23 | varchar | 31/3102 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c24 | varchar | 14/1453 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c25 | varchar | 30/3055 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c26 | varchar | 11/1114 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c27 | varchar | 31/3100 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c28 | varchar | 31/3101 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c29 | varchar | 13/1335 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c30 | varchar | 12/1275 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c31 | varchar | 30/3067 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c32 | varchar | 14/1433 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c33 | varchar | 30/3063 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c34 | varchar | 30/3077 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c35 | varchar | 12/1233 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c36 | varchar | 30/3045 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c37 | varchar | 56 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c38 | varchar | 15/1552 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c39 | varchar | 30/3050 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c40 | varchar | 30/3076 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c41 | varchar | 12/1215 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c42 | varchar | 30/3075 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c43 | varchar | 11/1133 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c44 | varchar | 15/1524 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c45 | varchar | 30/3073 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c46 | varchar | 30/3074 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c47 | varchar | 13/1317 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c48 | varchar | 30/3057 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c49 | varchar | 04/454 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c50 | varchar | 30/3066 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c51 | varchar | 07/752 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c52 | varchar | 10/1007 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c53 | varchar | 30/3072 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c54 | varchar | 04/413 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c55 | varchar | 30/3071 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c56 | varchar | 06/667 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c57 | varchar | 30/3070 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c58 | varchar | 30/3012 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c59 | varchar | 20/2002 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
| sys | b | c60 | varchar | 05/563 | 50000000 | 34 | 200000000 | 2400321536 | 0 | false |
[测试]
查询
sql>select count(*) from b;
+----------+
| L1 |
+==========+
| 50000000 |
+----------+
1 tuple (0.829ms) -- PostgreSQL 158秒
sql>select count(distinct c1) from b;
+----------+
| L1 |
+==========+
| 50000000 |
+----------+
1 tuple (1m 3s) -- PostgreSQL 212秒
sql>select count(distinct c2) from b; -- 字符串唯一值处理速度比INT较慢.
+----------+
| L1 |
+==========+
| 49422662 |
+----------+
1 tuple (3m 21s) -- PostgreSQL 436秒
sql>select count(distinct c60) from b;
+----------+
| L1 |
+==========+
| 49423470 |
+----------+
1 tuple (3m 24s) -- PostgreSQL 476秒
sql>select count(*) from (select c1,c11,c21,c31,c41,c51,c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t; -- 因为用到了PK C1字段, 速度更快
+----------+
| L1 |
+==========+
| 50000000 |
+----------+
1 tuple (54.3s) -- PostgreSQL 364秒
sql>select count(*) from (select c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t;
+----------+
| L1 |
+==========+
| 50000000 |
+----------+
1 tuple (54.1s) -- PostgreSQL 367秒
sql>select count(c1) from b where c11=c2 or c2=c3 or c4=c5 or c6=c7 or c8=c9;
+------+
| L1 |
+======+
| 1 |
+------+
1 tuple (1.9s) -- PostgreSQL 186秒
未完继续, 数据留几天, 有测试需求大家可以回复本文.
______________________________________________________________________________________
PostgreSQL
PostgreSQL支持并行导入, 但是因为扩展数据块是排他锁, 所以为了提高bulk导入速度, 可以重新编译数据库, 使用32K的block, 原因详见 :
导入耗时 : 3678秒
存储空间 : 96GB
[查询]
数据全部加载到内存后的效率.
[小结]
postgres=# select count(*) from b;
count
----------
50000000
(1 row)
Time: 158240.448 ms
postgres=# select count(distinct c1) from b;
count
----------
50000000
(1 row)
Time: 212004.243 ms
postgres=# select count(distinct c2) from b;
count
----------
49422662
(1 row)
Time: 436279.588 ms
postgres=# select count(distinct c60) from b;
count
----------
49423470
(1 row)
Time: 475820.666 ms
postgres=# select count(*) from (select c1,c11,c21,c31,c41,c51,c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t;
count
----------
50000000
(1 row)
Time: 364431.313 ms
postgres=# select count(*) from (select c60 from b group by c1,c11,c21,c31,c41,c51,c60) as t;
count
----------
50000000
(1 row)
Time: 366643.180 ms
postgres=# select count(c1) from b where c11=c2 or c2=c3 or c4=c5 or c6=c7 or c8=c9;
count
-------
1
(1 row)
Time: 185984.760 ms
[小结]
1. MonetDB适合流式数据实时统计, 由于表级写锁的问题, 程序需要做出一些变更, 例如原来一个表处理的, 使用monetdb可能需要建立多个表来实现并行写.
这方面如果使用PostgreSQL来实现的话, 可以参考
2. MonetDB并不是为高并发设计的, 从man mserver5可以看出端倪.
[参考]
max_clients
Controls how many client slots are allocated for clients to connect. This settings limits the maximum number of connected clients at the same time. Note that MonetDB is not designed to handle massive amounts of connected clients. The funnel capability from monetdbd(1) might be a more suitable solution for such workloads.
Default 64.
[参考]