MonetDB vs PostgreSQL a lite case

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介:
本文对比一下MonetDB 和 PostgreSQL在某些SQL下的性能. 
(MonetDB 使用列存储, 支持单SQL使用多核, 线程模式, 支持数据压缩, 还有较多的OLAP相关的插件, 更适合OLAP的应用场景)

注意, PostgreSQL注重的是高并发, 而MonetDB注重数据分析, 所以两者没有什么可比性, 这里拿MonetDB的强项来VS PostgreSQL的软肋, 主要是给一些在使用PostgreSQL做数据分析并且遇到瓶颈的朋友一些启发, 换个产品试试, 不要使用一个产品的软肋(这里指PostgreSQL在统计方面目前略差, 未来9.4+版本出来单SQL可以用到多核再来谈统计效率)
PostgreSQL支持的列存储插件cstore_fdw性能, 如果要玩的话可以 参考 : 
对于定长类型, 可以提升较高的性能.

测试环境
DELL R610
CPU 1.6G 8核
CentOS 6.5 x64
PostgreSQL 9.3.5 (主要参数 shared_buffer=1G, block_size=8KB, checkpoint_segment=32, wal_segsize=16MB, synchronous_commit=off, autovacuum=on)
MonetDB 11.17.21
块设备 OCZ RevoDrive3X2 240G
文件系统 ext4
内存 96GB
测试数据2.6亿
MonetDB 占用 1.9G. (含PK)
PostgreSQL 占用 22G.  (含PK)

插入性能对比, PostgreSQL比MonetDB几乎差了9倍性能.
 插入条数 
/ 时间(毫秒)
 MonetDB  PostgreSQL
 1024  6  11
 2048  7  21
 4096  10  42
 8192  17  84
 16384  29  166
 32768  51  332
 65536  96  661
 131072  184  1361
 262144  361  2760
 524288  701  5550
 1048576  1300  11221
 2097152  2800  22581
 4194304  5500  45018
 8388608  11200  91126
 16777216  22100  183895
 33554432  45300  376365
 67108864  97000  764031
 134217728  178000  1533433
 平均每秒插入条数  754000/s  87000/s
   
查询性能对比, MonetDB完胜, 完全不是一个数量级的差别. 
当然PostgreSQL count(*)性能本来就不咋地, 更适合OLTP, 曾经用了大量的篇幅来写使用PostgreSQL实现准实时数据统计, 可参阅
 SQL
/
时间(毫秒)
 MonetDB  PostgreSQL
 select count(*) from a;  1  46484
 select count(*) from a where id=1;  1  1
 select count(*) from a where id>1;  2  68813
 select count(*) from a where id<1;  2  0.6
 select count(*) from a where id<>1;  3500  68710
 select count(*) from a where id is null;  752  0.5
 select count(*) from a where id is not null;  710  64771
 select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';  0.4  68006
 select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';  0.8  91155
 select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';  1.7  72517
 select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';  2800  86810
 select count(*) from a where info is null;  773  43593
 select count(*) from a where info is not null;  748  65161
 select count(*) from (select info from a group by info having count(*) >1) as t;  4700  144180

空间占用对比,
2.6亿数据(含索引), MonetDB 1.9G, PostgreSQL 22G.
当然这里和测试数据有关, 因为字符串大量重复了.
后期做一下宽 表的测试. 对比GreenPlum.
MonetDB还有很多强大的功能等待挖掘, 大家一起来玩吧.

详细数据如下 : 
MonetDB
插入性能
sql>create table a(id int auto_increment primary key, info varchar(32));
operation successful (10.112ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 1 (3.274ms)
sql>select * from a;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
+------+----------------------------------+
1 tuple (2.448ms)
sql>insert into a (info) values (now());
1 affected rows, last generated key: 2 (3.023ms)
sql>insert into a (info) select now() from a;
2 affected row (6.033ms)
sql>insert into a (info) select now() from a;
4 affected row (3.798ms)
sql>insert into a (info) select now() from a;
8 affected row (2.893ms)
sql>insert into a (info) select now() from a;
16 affected row (2.700ms)
sql>insert into a (info) select now() from a;
32 affected row (5.064ms)
sql>insert into a (info) select now() from a;
64 affected row (4.054ms)
sql>insert into a (info) select now() from a;
128 affected row (4.138ms)
sql>insert into a (info) select now() from a;
256 affected row (4.304ms)
sql>insert into a (info) select now() from a;
512 affected row (2.563ms)
sql>insert into a (info) select now() from a;
1024 affected row (6.094ms)
sql>insert into a (info) select now() from a;
2048 affected row (6.956ms)
sql>insert into a (info) select now() from a;
4096 affected row (9.618ms)
sql>insert into a (info) select now() from a;
8192 affected row (17.475ms)
sql>insert into a (info) select now() from a;
16384 affected row (29.347ms)
sql>insert into a (info) select now() from a;
32768 affected row (50.941ms)
sql>insert into a (info) select now() from a;
65536 affected row (95.736ms)
sql>insert into a (info) select now() from a;
131072 affected row (183.726ms)
sql>insert into a (info) select now() from a;
262144 affected row (361.436ms)
sql>insert into a (info) select now() from a;
524288 affected row (701.202ms)
sql>insert into a (info) select now() from a;
1048576 affected row (1.3s)
sql>insert into a (info) select now() from a;
2097152 affected row (2.8s)
sql>insert into a (info) select now() from a;
4194304 affected row (5.5s)
sql>insert into a (info) select now() from a;
8388608 affected row (11.2s)
sql>insert into a (info) select now() from a;
16777216 affected row (22.1s)
sql>insert into a (info) select now() from a;
33554432 affected row (45.3s)
sql>insert into a (info) select now() from a;
67108864 affected row (1m 27s)
sql>insert into a (info) select now() from a;
134217728 affected row (2m 58s)


空间占用大约1.9G.
[root@150 ~]# cd /data02/mdb1/
[root@150 mdb1]# ll
total 16
-rw------- 1 root root 7196 Aug 13 21:08 merovingian.log
-rw-r--r-- 1 root root    6 Aug 13 20:08 merovingian.pid
drwx------ 5 root root 4096 Aug 13 20:09 test
[root@150 mdb1]# du -sh *
8.0K    merovingian.log
4.0K    merovingian.pid
1.9G    test
[root@150 mdb1]# cd test/
[root@150 test]# ll
total 12
-rw-------  1 root root    0 Aug 13 20:09 58ec944a-5cff-dc51-4873-986923c64567
drwx------ 14 root root 4096 Aug 13 20:33 bat
drwx------  2 root root 4096 Aug 13 20:09 box
drwx------  3 root root 4096 Aug 13 20:09 sql_logs
[root@150 test]# du -sh *
0       58ec944a-5cff-dc51-4873-986923c64567
1.9G    bat
4.0K    box
12K     sql_logs
[root@150 test]# cd bat
[root@150 bat]# du -sh *
492K    01
288K    02
92K     04
1.1G    05
513M    07
572K    10
4.0K    11
4.0K    12
65M     13
323M    14
772K    15
4.0K    3.head
4.0K    3.tail
12K     3.theap
4.0K    4.head
4.0K    4.tail
12K     4.theap
28K     BACKUP

[root@150 bat]# ll
total 88
drwx------ 2 root root 4096 Aug 13 21:27 01
drwx------ 2 root root 4096 Aug 13 20:21 02
drwx------ 2 root root 4096 Aug 13 21:15 04
drwx------ 2 root root 4096 Aug 13 21:12 05
drwx------ 2 root root 4096 Aug 13 21:27 07
drwx------ 2 root root 4096 Aug 13 21:27 10
drwx------ 2 root root 4096 Aug 13 21:15 11
drwx------ 2 root root 4096 Aug 13 21:27 12
drwx------ 2 root root 4096 Aug 13 21:27 13
drwx------ 2 root root 4096 Aug 13 21:27 14
drwx------ 2 root root 4096 Aug 13 21:27 15
-rw------- 1 root root    8 Aug 13 20:09 3.head
-rw------- 1 root root    1 Aug 13 20:09 3.tail
-rw------- 1 root root 8216 Aug 13 20:09 3.theap
-rw------- 1 root root    8 Aug 13 20:09 4.head
-rw------- 1 root root    1 Aug 13 20:09 4.tail
-rw------- 1 root root 8343 Aug 13 20:09 4.theap
drwx------ 2 root root 4096 Aug 13 21:27 BACKUP

通过函数storage可以查看详细大小. 这里包含了count和columnsize. info字段占用500mb左右, id字段占用1G左右.
sql>select * from storage();
+-------+-------------------+----------------+-----------+----------+-----------+-------+------------+----------+---------+--------+
| schem | table             | column         | type      | location | count     | typew | columnsize | heapsize | indices | sorted |
: a     :                   :                :           :          :           : idth  :            :          :         :        :
+=======+===================+================+===========+==========+===========+=======+============+==========+=========+========+
| sys   | a                 | id             | int       | 05/573   | 268435456 |     4 | 1073741824 |        0 |       0 | true   |
| sys   | a                 | info           | varchar   | 07/734   | 268435456 |    32 |  536870912 |    16384 |       0 | true   |
| sys   | a                 | a_id_pkey      | oid       | 05/567   |         0 |     8 |          0 |        0 |       0 | true   |


查询测试
sql>select count(*) from a;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (0.979ms)
sql>select count(*) from a where id=1;
+------+
| L1   |
+======+
|    1 |
+------+
1 tuple (0.920ms)
sql>select count(*) from a where id>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (2.347ms)
sql>select count(*) from a where id<1;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (2.270ms)
sql>select count(*) from a where id<>1;
+-----------+
| L1        |
+===========+
| 268435455 |
+-----------+
1 tuple (3.5s)
sql>select count(*) from a where id is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (751.665ms)
sql>select count(*) from a where id is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (710.477ms)
sql>select * from a limit 10;
+------+----------------------------------+
| id   | info                             |
+======+==================================+
|    1 | 2014-08-13 20:18:53.000000+08:00 |
|    2 | 2014-08-13 20:18:59.000000+08:00 |
|    3 | 2014-08-13 20:20:06.000000+08:00 |
|    4 | 2014-08-13 20:20:06.000000+08:00 |
|    5 | 2014-08-13 20:20:14.000000+08:00 |
|    6 | 2014-08-13 20:20:14.000000+08:00 |
|    7 | 2014-08-13 20:20:14.000000+08:00 |
|    8 | 2014-08-13 20:20:14.000000+08:00 |
|    9 | 2014-08-13 20:20:14.000000+08:00 |
|   10 | 2014-08-13 20:20:14.000000+08:00 |
+------+----------------------------------+
10 tuples (1.581ms)
sql>select count(*) from a where info='2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|   12 |
+------+
1 tuple (0.442ms)
sql>select count(*) from a where info>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435440 |
+-----------+
1 tuple (0.844ms)
sql>select count(*) from a where info<'2014-08-13 20:20:14.000000+08:00';
+------+
| L1   |
+======+
|    4 |
+------+
1 tuple (1.734ms)
sql>select count(*) from a where info<>'2014-08-13 20:20:14.000000+08:00';
+-----------+
| L1        |
+===========+
| 268435444 |
+-----------+
1 tuple (2.8s)
sql>select count(*) from a where info is null;
+------+
| L1   |
+======+
|    0 |
+------+
1 tuple (772.987ms)
sql>select count(*) from a where info is not null;
+-----------+
| L1        |
+===========+
| 268435456 |
+-----------+
1 tuple (747.712ms)
sql>select count(*) from (select info from a group by info having count(*) >1) as t;
+------+
| L2   |
+======+
|   18 |
+------+
1 tuple (4.7s)


PostgreSQL
插入测试
postgres=# create table a(id serial primary key, info text);
CREATE TABLE
postgres=# insert into a(info) values (now());
INSERT 0 1
postgres=# insert into a(info) select now() from a;
INSERT 0 1
Time: 0.954 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2
Time: 0.606 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4
Time: 0.352 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8
Time: 0.393 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16
Time: 0.476 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32
Time: 0.840 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 64
Time: 1.166 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 128
Time: 1.632 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 256
Time: 3.214 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 512
Time: 5.519 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1024
Time: 10.924 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2048
Time: 21.212 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4096
Time: 41.930 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8192
Time: 84.242 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16384
Time: 165.705 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 32768
Time: 332.269 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 65536
Time: 661.579 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 131072
Time: 1361.281 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 262144
Time: 2760.690 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 524288
Time: 5550.096 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 1048576
Time: 11221.544 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 2097152
Time: 22581.858 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 4194304
Time: 45018.124 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 8388608
Time: 91126.619 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 16777216
Time: 183895.102 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 33554432
Time: 376365.578 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 67108864
Time: 764031.565 ms
postgres=# insert into a(info) select now() from a;
INSERT 0 134217728
Time: 1533433.507 ms

postgres=# select pg_total_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 22GB
(1 row)
Time: 1.027 ms
postgres=# select pg_relation_size('a')/1024/1024/1024||'GB';
 ?column? 
----------
 17GB
(1 row)
Time: 0.702 ms


查询测试
postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 60818.321 ms
postgres=# select count(*) from a;
   count   
-----------
 268435456
(1 row)

Time: 46484.165 ms
postgres=# select count(*) from a where id=1;
 count 
-------
     1
(1 row)

Time: 0.996 ms
postgres=# select count(*) from a where id>1;
   count   
-----------
 268435455
(1 row)

Time: 68812.883 ms
postgres=# select count(*) from a where id<1;
 count 
-------
     0
(1 row)

Time: 0.647 ms
postgres=# select count(*) from a where id<>1;
   count   
-----------
 268435455
(1 row)

Time: 68710.001 ms
postgres=# select count(*) from a where id is null;
 count 
-------
     0
(1 row)

Time: 0.537 ms
postgres=# select count(*) from a where id is not null;
   count   
-----------
 268435456
(1 row)

Time: 64771.858 ms
postgres=# select * from a limit 10;
 id |             info              
----+-------------------------------
  1 | 2014-08-13 20:37:52.165199+08
  2 | 2014-08-13 20:38:06.945296+08
  3 | 2014-08-13 20:38:07.965092+08
  4 | 2014-08-13 20:38:07.965092+08
  5 | 2014-08-13 20:38:08.502181+08
  6 | 2014-08-13 20:38:08.502181+08
  7 | 2014-08-13 20:38:08.502181+08
  8 | 2014-08-13 20:38:08.502181+08
  9 | 2014-08-13 20:38:08.966175+08
 10 | 2014-08-13 20:38:08.966175+08
(10 rows)

Time: 0.629 ms
postgres=# select count(*) from a where info='2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 68005.864 ms
postgres=# select count(*) from a where info>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435440
(1 row)

Time: 91154.757 ms
postgres=# select count(*) from a where info<'2014-08-13 20:38:08.966175+08';
 count 
-------
     8
(1 row)

Time: 72516.503 ms
postgres=# select count(*) from a where info<>'2014-08-13 20:38:08.966175+08';
   count   
-----------
 268435448
(1 row)

Time: 86810.115 ms
postgres=# select count(*) from a where info is null;
 count 
-------
     0
(1 row)

Time: 43593.035 ms
postgres=# select count(*) from a where info is not null;
   count   
-----------
 268435456
(1 row)

Time: 65161.258 ms
postgres=# select count(*) from (select info from a group by info having count(*) >1) as t;
 count 
-------
    27
(1 row)
Time: 144180.678 ms


[参考]
相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
1月前
|
关系型数据库 MySQL 数据库
探究数据库开源协议:PostgreSQL vs MySQL
探究数据库开源协议:PostgreSQL vs MySQL
|
4月前
|
存储 关系型数据库 MySQL
【专栏】在众多开源数据库中,MySQL和PostgreSQL无疑是最受欢迎的两个
【4月更文挑战第27天】MySQL与PostgreSQL是两大主流开源数据库,各有特色。MySQL注重简单、便捷和高效,适合读操作密集场景,而PostgreSQL强调灵活、强大和兼容,擅长并发写入与复杂查询。MySQL支持多种存储引擎和查询缓存,PostgreSQL则具备扩展性、强事务支持和高可用特性。选择时应考虑项目需求、团队技能和预期收益。
76 2
|
4月前
|
关系型数据库 MySQL 数据处理
MySQL vs. PostgreSQL:选择适合你的开源数据库
在当今信息时代,开源数据库成为许多企业和开发者的首选。本文将比较两个主流的开源数据库——MySQL和PostgreSQL,分析它们的特点、优势和适用场景,以帮助读者做出明智的选择。
|
11月前
|
存储 关系型数据库 MySQL
如何选择最适合你的数据库解决方案:PostgreSQL VS MySQL 技术选型对比
如何选择最适合你的数据库解决方案:PostgreSQL VS MySQL 技术选型对比
288 1
|
存储 算法 Oracle
PostgreSQL的MVCC vs InnoDB的MVCC
PostgreSQL的MVCC vs InnoDB的MVCC
83 0
PostgreSQL的MVCC vs InnoDB的MVCC
|
存储 SQL 关系型数据库
【数据库选型】ClickHouse vs PostgreSQL vs TimescaleDB
在过去的一年里,我们不断听到的一个数据库是ClickHouse,这是一个由Yandex最初构建并开源的面向列的OLAP数据库。
|
Web App开发 SQL Oracle
PostgreSQL vs PPAS 差异 - Oracle评估、迁移、验证、性能调优
标签 PostgreSQL , PPAS , 阿里云 背景 1、ppas手册(高度兼容Oracle): https://www.enterprisedb.com/docs/en/11.0/EPAS_Oracompat_Ref_Guide_v11/toc.
1882 0
|
关系型数据库 分布式数据库 PolarDB
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
《阿里云产品手册2022-2023 版》——PolarDB for PostgreSQL
341 0
|
存储 缓存 关系型数据库
|
存储 SQL 并行计算
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍(中)
PolarDB for PostgreSQL 开源必读手册-开源PolarDB for PostgreSQL架构介绍
392 0