zedstore开发版初体验-阿里云开发者社区

1. 概述

zedstore是开发中的一个PostgreSQL的行列混合存储引擎，
其设计目标偏OLAP场景，但是又能支持所有OLTP的操作，包括MVCC，索引等。
在设计上当OLAP和OLTP的目标发生冲突时，会优先OLAP，所以OLTP的性能会差一点。

zedstore的设计目标参考下面的说明

https://www.postgresql.org/message-id/CALfoeiuF-m5jg51mJUPm5GN8u396o5sA2AF5N97vTRAEDYac7w%40mail.gmail.com

Motivations / Objectives

* Performance improvement for queries selecting subset of columns
  (reduced IO).
* Reduced on-disk footprint compared to heap table. Shorter tuple
  headers and also leveraging compression of similar type data
* Be first-class citizen in the Postgres architecture (tables data can
  just independently live in columnar storage)
* Fully MVCC compliant
* All Indexes supported
* Hybrid row-column store, where some columns are stored together, and
  others separately. Provide flexibility of granularity on how to
  divide the columns. Columns accessed together can be stored
  together.
* Provide better control over bloat (similar to zheap)
* Eliminate need for separate toast tables
* Faster add / drop column or changing data type of column by avoiding
  full rewrite of the table.

zedstore内部page中可以存储未压缩的单个tuple，也可以存储压缩过的多个tuple的集合。
每个tuple用TID标识，TID是个逻辑标识，不同于heap中TID代表物理位置。
在zedstore的整个数据文件中，表被切分成很多列族（类似hbase，当前的开发版本固定每列都是一个列族），
每个列族都是一个btree，按TID的顺序组织，整个zedstore数据文件就是一个btree的森林（和gin类似）。

+-----------------------------
| Fixed-size page header:
|
|   LSN
|   TID low and hi key (for Lehman & Yao B-tree operations)
|   left and right page pointers
|
| Items:
|
|   TID | size | flags | uncompressed size | lastTID | payload (containeritem)
|   TID | size | flags | uncompressed size | lastTID | payload (containeritem)
|   TID | size | flags | undo pointer | payload (plain item)
|   TID | size | flags | undo pointer | payload (plain item)
|   ...
|
+----------------------------

zedstore虽然在OLTP场景下的性能不是最优，由于zedstore支持数据压缩，将来可以用来存放OLTP库的冷数据。

下面做个简单的测试体验一下。

2. 测试环境

CentOS 7.3(16核128G SSD)
zedstore

3. 编译安装

3.1 下载zedstore源码

https://github.com/greenplum-db/postgres/tree/zedstore

3.2 安装lz4

yum install lz4,lz4-devel

也可以下载lz4源码安装，但源码安装后要执行一次ldconfig。否则编译时configure可能出错。

3.3 编译

cd postgres-zedstore/
./configure --prefix=/usr/pgzedstore --with-lz4
make -j 16
make install
cd contrib/
make -j 16
make install

编译debug版，可以在configure上添加CFLAGS="-O0 -DOPTIMIZER_DEBUG -g3"参数

3.4 初始化实例

su - postgres
/usr/pgzedstore/bin/initdb /pgsql/datazedstore -E UTF8 --no-locale
/usr/pgzedstore/bin/pg_ctl -D /pgsql/datazedstore -l logfile restart -o'-p 5444'

4 测试

4.1 初始化测试库

/usr/pgzedstore/bin/pgbench -i -s 100 -p5444

4.2 heap表测试

[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 61833
latency average = 0.162 ms
tps = 6183.241453 (including connections establishing)
tps = 6184.485276 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.161  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 158597
latency average = 0.063 ms
tps = 15859.550657 (including connections establishing)
tps = 15862.665007 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.062  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r  -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 18910
latency average = 0.529 ms
tps = 1890.901809 (including connections establishing)
tps = 1891.305759 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.021  BEGIN;
         0.113  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.053  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.104  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.091  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.049  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.096  END;

4.2 创建zedstore表

create table pgbench_accounts2(like pgbench_accounts including all) using zedstore;
insert into pgbench_accounts2 select * from pgbench_accounts;
alter table pgbench_accounts rename to pgbench_accounts_old;
alter table pgbench_accounts2 rename to pgbench_accounts;

改造后，可以发现zedstore表的size小了很多。

postgres=# \d+
                                   List of relations
 Schema |         Name         | Type  |  Owner   | Persistence |  Size   | Description 
--------+----------------------+-------+----------+-------------+---------+-------------
 public | pgbench_accounts     | table | postgres | permanent   | 61 MB   | 
 public | pgbench_accounts_old | table | postgres | permanent   | 1283 MB | 
 public | pgbench_branches     | table | postgres | permanent   | 40 kB   | 
 public | pgbench_history      | table | postgres | permanent   | 992 kB  | 
 public | pgbench_tellers      | table | postgres | permanent   | 104 kB  | 
(5 rows)

压缩效果这么好和pgbench_accounts表中重复值非常多有关。

我们再构造一些随机的数据对比zedstore的压缩效果。

create table tb1(id int,c1 text);
insert into tb1 select id,md5(id::text) from generate_series(1,1000000)id;
create table tb2(id int,c1 text) using zedstore;
insert into tb2 select id,md5(id::text) from generate_series(1,1000000)id;

这个的压缩效果就差了很多，lz4压缩速度比较快，但其本身的压缩率比较低。

postgres=# select * from tb2 limit 5;
 id |                c1                
----+----------------------------------
  1 | c4ca4238a0b923820dcc509a6f75849b
  2 | c81e728d9d4c2f636f067f89cc14862c
  3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
  4 | a87ff679a2f3e71d9181a67b7542122c
  5 | e4da3b7fbbce2345d7772b0674a318d5
(5 rows)
postgres=# \d+
                                   List of relations
 Schema |         Name         | Type  |  Owner   | Persistence |  Size   | Description 
--------+----------------------+-------+----------+-------------+---------+-------------
...
 public | tb1                  | table | postgres | permanent   | 65 MB   | 
 public | tb2                  | table | postgres | permanent   | 38 MB   | 
(7 rows)

4.3 zedstore表测试

[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3663
latency average = 2.730 ms
tps = 366.280735 (including connections establishing)
tps = 366.360837 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         2.729  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r -S -M prepared
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3907
latency average = 2.560 ms
tps = 390.614280 (including connections establishing)
tps = 390.692250 (excluding connections establishing)
statement latencies in milliseconds:
         0.000  \set aid random(1, 100000 * :scale)
         2.559  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
[postgres@host10372181 ~]$/usr/pgzedstore/bin/pgbench -n -c 1 -j 1 -T 10 -p5444 -r  -M prepared
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 811
latency average = 12.340 ms
tps = 81.036743 (including connections establishing)
tps = 81.053603 (excluding connections establishing)
statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.000  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.023  BEGIN;
        11.888  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.084  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.081  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.095  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.052  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         0.112  END;

zedstore表和heap表的测试结果汇总如下（单位tps）

测试模式	预编译模式	heap表	zedstore表
selectonly	simple	6184	366
selectonly	prepared	15862	390
normal	prepared	1891	81

4.4 聚合查询的性能对比

对前面创建的有100万记录的tb1和tb2表执行聚合查询，比对执行时间,单位毫秒。

postgres=# \d+ tb1
                                    Table "public.tb1"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              | 
 c1     | text    |           |          |         | extended |              | 
Access method: heap

postgres=# \d+ tb2
                                    Table "public.tb2"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              | 
 c1     | text    |           |          |         | extended |              | 
Access method: zedstore

SQL	heap表	zedstore表
count(*)	77	356
count(1)	82	356
count(id)	98	294
max(id)	95	285
avg(id)	99	307

小结

从测试可以看出，目前的zedstore还没发挥出其行列混合存储应有的潜质，可能zedstore当前主要工作重心是确保逻辑正确性上，还没开始做性能上的优化。

zedstore开发版初体验

1. 概述

2. 测试环境

3. 编译安装

3.1 下载zedstore源码

3.2 安装lz4

3.3 编译

3.4 初始化实例

4 测试

4.1 初始化测试库

4.2 heap表测试

4.2 创建zedstore表

4.3 zedstore表测试

4.4 聚合查询的性能对比

小结

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

zedstore开发版初体验

1. 概述

2. 测试环境

3. 编译安装

3.1 下载zedstore源码

3.2 安装lz4

3.3 编译

3.4 初始化实例

4 测试

4.1 初始化测试库

4.2 heap表测试

4.2 创建zedstore表

4.3 zedstore表测试

4.4 聚合查询的性能对比

小结

热门文章

最新文章

相关课程

相关电子书

相关实验场景