背景信息
TPC-H(商业智能计算测试)是美国交易处理效能委员会(TPC,Transaction Processing Performance Council)组织制定的用来模拟决策支持类应用的一个测试集。目前在学术界和工业界普遍采用它来评价决策支持技术方面应用的性能。TPC-H 是根据真实的生产运行环境来建模,模拟了一套销售系统的数据仓库。其共包含 8 张表,数据量可设定从 1G~3T 不等。其基准测试共包含了22个查询,主要评价指标各个查询的响应时间,即从提交查询到结果返回所需时间。其测试结果可综合反映系统处理查询时的能力。详情参考TPCH 文档。
数据集介绍
该数据集包含如下 8 张表,互相间的关系如下图所示。
测试详情
测试数据量说明
测试数据量会直接影响测试结果,TPC-H 的生成工具中使用 SF ( scale factor ) 控制生成数据的数据量的大小,1 SF 对应 1 GB。
注意:以上提及的数据量仅仅为 原始数据的数据量,不包括索引等空间占用,所以准备环境时,需要预留更多的空间。
测试环境
本次测试使用了独享实例(按量付费)的实例,由于仅为测试示意使用,所以计算资源配置选择了8核32G。
测试场景
本测试场景主要包含3部分:
- OLAP查询场景测试,主要使用列存表,直接使用TPCH测试中的22条查询;
- Key/Value点查场景测试,主要使用行存表,针对orders使用行存表后,进行主键过滤的点查;
基础环境准备
- 该步骤主要用于准备OLAP查询场景和Key/Value点查场景所需的数据;
基础环境准备
1. 创建 ECS 实例
登陆阿里云,创建一个 ECS 实例,用于数据生成、向 Hologres 导入数据、客户端测试。建议规格:
- ecs.g6.4xlarge 规格
- CentOS 7.9 系统
- ESSD 数据盘,具体数据容量根据需要测试的数据量大小决定
- 建议 ECS 与 Hologres 实例用相同 Region 和 VPC 网络
2. 创建 Hologres 实例
- 登陆阿里云,进入 Hologres 产品控制台,点击新增引擎实例
- 选择配置,并填写实例名称,详细说明请参考官方文档。
3. 创建测试数据库
- 在创建实例后,您需要登陆您创建的 Hologres 实例,创建一个数据库,本测试中命名数据库为
tpch_1sf
,详细操作步骤请参考官方文档
生成 TPC-H 数据
1. 准备数据生成工具
- 远程链接 ECS 实例
- 更新所有库
yum update
- 安装 git
yum install git
- 安装gcc
yum install gcc
- 下载 TPC-H 数据生成代码
git clone https://github.com/gregrahn/tpch-kit.git
- 进入数据生成工具代码目录
cd tpch-kit/dbgen
- 编译数据生成工具代码
make
2. 生成数据
- 编译成功后,您可以使用如下代码查看代码生成工具的相关参数。
./dbgen --help
- 本次测试仅生成 1 GB 数据,所以运行如下代码生成数据。
./dbgen -vf -s 1
如您需要生成更多数据量的数据,可以调整 SF 的参数,例如您可以使用如下代码生成 1 T 数据
./dbgen -vf -s 1000
- 一般情况下,32CU 可以跑 TPCH SF10,256CU 可以跑 TPCH SF50
- 数据生成后,您可以使用如下代码查看生成的文件。可以看到生成工具生成了 8 个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'
OLAP查询场景测试
准备数据
1. 创建表
- 由于本文主要使用 psql 进行数据导入操作,需要先在 ECS 中运行如下命令安装 psql
yum install postgresql-server
- 安装 psql 后,您可以使用如下命令登陆 Hologres 实例
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
- 使用psql连接Hologres后,您可以使用如下建表语句创建数据库表
DROP TABLE IF EXISTS LINEITEM;
BEGIN;
CREATE TABLE LINEITEM
(
L_ORDERKEY INT NOT NULL,
L_PARTKEY INT NOT NULL,
L_SUPPKEY INT NOT NULL,
L_LINENUMBER INT NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG TEXT NOT NULL,
L_LINESTATUS TEXT NOT NULL,
L_SHIPDATE TIMESTAMPTZ NOT NULL,
L_COMMITDATE TIMESTAMPTZ NOT NULL,
L_RECEIPTDATE TIMESTAMPTZ NOT NULL,
L_SHIPINSTRUCT TEXT NOT NULL,
L_SHIPMODE TEXT NOT NULL,
L_COMMENT TEXT NOT NULL,
PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS ORDERS;
BEGIN;
CREATE TABLE ORDERS
(
O_ORDERKEY INT NOT NULL PRIMARY KEY,
O_CUSTKEY INT NOT NULL,
O_ORDERSTATUS TEXT NOT NULL,
O_TOTALPRICE DECIMAL(15,2) NOT NULL,
O_ORDERDATE timestamptz NOT NULL,
O_ORDERPRIORITY TEXT NOT NULL,
O_CLERK TEXT NOT NULL,
O_SHIPPRIORITY INT NOT NULL,
O_COMMENT TEXT NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PARTSUPP;
BEGIN;
CREATE TABLE PARTSUPP
(
PS_PARTKEY INT NOT NULL,
PS_SUPPKEY INT NOT NULL,
PS_AVAILQTY INT NOT NULL,
PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
PS_COMMENT TEXT NOT NULL,
PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS PART;
BEGIN;
CREATE TABLE PART
(
P_PARTKEY INT NOT NULL PRIMARY KEY,
P_NAME TEXT NOT NULL,
P_MFGR TEXT NOT NULL,
P_BRAND TEXT NOT NULL,
P_TYPE TEXT NOT NULL,
P_SIZE INT NOT NULL,
P_CONTAINER TEXT NOT NULL,
P_RETAILPRICE DECIMAL(15,2) NOT NULL,
P_COMMENT TEXT NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
C_CUSTKEY INT NOT NULL PRIMARY KEY,
C_NAME TEXT NOT NULL,
C_ADDRESS TEXT NOT NULL,
C_NATIONKEY INT NOT NULL,
C_PHONE TEXT NOT NULL,
C_ACCTBAL DECIMAL(15,2) NOT NULL,
C_MKTSEGMENT TEXT NOT NULL,
C_COMMENT TEXT NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS SUPPLIER;
BEGIN;
CREATE TABLE SUPPLIER
(
S_SUPPKEY INT NOT NULL PRIMARY KEY,
S_NAME TEXT NOT NULL,
S_ADDRESS TEXT NOT NULL,
S_NATIONKEY INT NOT NULL,
S_PHONE TEXT NOT NULL,
S_ACCTBAL DECIMAL(15,2) NOT NULL,
S_COMMENT TEXT NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS NATION;
BEGIN;
CREATE TABLE NATION(
N_NATIONKEY INT NOT NULL PRIMARY KEY,
N_NAME text NOT NULL,
N_REGIONKEY INT NOT NULL,
N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;
DROP TABLE IF EXISTS REGION;
BEGIN;
CREATE TABLE REGION
(
R_REGIONKEY INT NOT NULL PRIMARY KEY,
R_NAME TEXT NOT NULL,
R_COMMENT TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
- 创建完毕后,您能在 psql 中使用如下代码查看是否创建成功
tpch_1sf=# \dt
- 若成功,现实效果如下
tpch_1sf=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------+-------+--------------------
public | customer | table | tpch_1sf_developer
public | lineitem | table | tpch_1sf_developer
public | nation | table | tpch_1sf_developer
public | orders | table | tpch_1sf_developer
public | part | table | tpch_1sf_developer
public | partsupp | table | tpch_1sf_developer
public | region | table | tpch_1sf_developer
public | supplier | table | tpch_1sf_developer
(8 rows)
2. 导入数据
- 本测试方案主要使用
COPY FROM STDIN
的方式导入数据详细可以参考官方文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创建的表中。 - 您可以在数据生成工具的目录中参考如下
shell
脚本导入数据
for i in `ls *.tbl`; do
echo $i;
name=`echo $i| cut -d'.' -f1`;
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;
done
- 至此您已完成数据导入
3. 收集统计信息
- 为了更好的执行查询,可以在 psql 中使用如下语句,使 Hologres 收集各张表特征信息。
vacuum region;
vacuum nation;
vacuum supplier;
vacuum customer;
vacuum part;
vacuum partsupp;
vacuum orders;
vacuum lineitem;
analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;
执行查询
- 为了方便统计查询信息,需要使用pgbench工具,您可以使用如下命令安装pgbench(如果测试机上已有pgbench,请确保版本大于9.6以上,最好大版本是13以上,否则以下测试会遇到各种不兼容)
yum install postgresql-contrib
- 为了方便查询,您可以直接通过以下连接,下载所需的22条SQL
- 然后上传至ECS
- 进入ECS,并进入上传文件的目录,使用如下shell命令解压缩文件
unzip tpch_data_tpch_query
- 至此,您已经完成了准备工作,即可使用pgbench进行测试,您可以使用如下命令执行单条查询
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
- 参数解释
配置项 | 参数 | 说明 |
---|---|---|
-h | Hologres实例的endpoint | 在Hologres管控台查看 |
-p | Hologres实例的端口地址 | 在Hologres管控台查看 |
-d | Hologres指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:1,由于该测试仅测试查询性能,不测试并发,所以并发度置为1即可 |
-t | 每个客户端需要执行的压测query数目 | 50 |
-f | 压测的sql | 示例:6.sql |
- 也可以直接执行如下
shell
脚本,直接批量执行22条查询,并将结果输出到文件hologres_tpch_test.out中
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
查看hologres_tpch_test.out即可得到查询结果,样例如下
- transaction type:说明了执行的具体的SQL文件
- latency average:记录了对应SQL文件的3次查询的平均时间
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...
TPCH 22条查询语句
Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
Q2
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
part,
supplier,
partsupp,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and ps_supplycost = (
select
min(ps_supplycost)
from
partsupp,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;
Q3
select
l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
customer,
orders,
lineitem
where
c_mktsegment = 'BUILDING'
and c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;
Q4
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
Q5
select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
and o_orderdate < date '1994-01-01' + interval '1' year
group by
n_name
order by
revenue desc;
Q6
select
sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between 6 - 1 and 6 + 1
and l_quantity < 2400
Q7
set hg_experimental_enable_double_equivalent=on;
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
Q8
set hg_experimental_enable_double_equivalent=on;
select
o_year,
sum(case
when nation = 'BRAZIL' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'STANDARD POLISHED TIN'
) as all_nations
group by
o_year
order by
o_year;
Q9
set hg_experimental_enable_double_equivalent=on;
select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
Q10
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
and o_orderdate < date '1993-10-01' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;
Q11
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.0000010000
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'GERMANY'
)
order by
value desc
limit 100;
Q12
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'SHIP')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
Q13
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc;
Q14
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;
Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as
(
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1995-12-01'
and l_shipdate < date '1995-12-01' + interval '3' month
group by
l_suppkey
)
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;
Q16
select
p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
Q17
select
sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem,
part
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem
where
l_partkey = p_partkey
and l_partkey in(
select p_partkey
from part
where p_brand = 'Brand#23' and p_container = 'MED BOX')
);
Q18
select
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
from
customer,
orders,
lineitem
where
o_orderkey in (
select
l_orderkey
from
lineitem
group by
l_orderkey having
sum(l_quantity) > 300
)
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice
order by
o_totalprice desc,
o_orderdate
limit 100;
Q19
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 1 and l_quantity <= 1 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#34'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
Q20
select
s_name,
s_address
from
supplier,
nation
where
s_suppkey in (
select
ps_suppkey
from
partsupp
where
ps_partkey in (
select
p_partkey
from
part
where
p_name like 'forest%'
)
and ps_availqty > (
select
0.5 * sum(l_quantity)
from
lineitem
where
l_partkey = ps_partkey
and l_suppkey = ps_suppkey
and l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
)
)
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by
s_name;
Q21
select
s_name,
count(*) as numwait
from
supplier,
lineitem l1,
orders,
nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
group by
s_name
order by
numwait desc,
s_name
limit 100;
Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone from 1 for 2) as cntrycode,
c_acctbal
from
customer
where
substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substring(c_phone from 1 for 2) in
('13', '31', '23', '29', '30', '18', '17')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) as custsale
group by
cntrycode
order by
cntrycode;
Key/Value点查场景测试
准备数据
1. 创建表
- 继续使用OLAP查询场景创建的数据库,我们会使用TPCH数据集中的orders表进行测试,使用psql连接Hologres后,您可以使用如下建表语句创建数据库表;
注意:点查场景需要使用行存表,所以需要创建一张新表,不能使用OLAP查询场景中使用的表
DROP TABLE IF EXISTS orders_row;
BEGIN;
CREATE TABLE public.orders_row (
"o_orderkey" int8 NOT NULL,
"o_custkey" int8,
"o_orderstatus" bpchar(1),
"o_totalprice" numeric(15,2),
"o_orderdate" date,
"o_orderpriority" bpchar(15),
"o_clerk" bpchar(15),
"o_shippriority" int8,
"o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;
2. COPY方式导入数据
- 本测试方案主要使用
COPY FROM STDIN
的方式导入数据详细可以参考官方文档。此处会将此前生成的 tbl 数据文件导入 Hologres 中创建的表中。 - 您可以在数据生成工具的目录中参考如下命令导入数据
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl
3. INSERT INTO方式导入数据
- 由于OLAP场景时您已经导入了orders表的数据,您可以运行如下SQL语句导入数据
INSERT INTO public.orders_row
SELECT *
FROM public.orders;
查询
1. 生成查询语句
- Key/Value点查场景主要的查询语句特征如下
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk = value_x
;
或
SELECT column_a
,column_b
,...
,column_x
FROM table_x
WHERE pk IN ( value_a, value_b,..., value_x )
;
您可以使用如下脚本生成所需的sql,该脚本会生成2条sql
- kv_query_single.sql 针对单值筛选的SQL
- kv_query_in.sql 针对多值筛选的SQL,该脚本会随机生成一个针对10个值筛选的SQL
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql
2. 进行查询
- 查询需要使用pgbench,您可以使用如下命令安装pgbench
yum install postgresql-contrib
- 之后您即可使用pgbench进行压测,针对单值筛选的场景
注意,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
- 针对多值筛选的场景
注意,请在生成SQL的目录执行如下命令
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
- 参数解释
配置项 | 参数 | 说明 |
---|---|---|
-h | Hologres实例的endpoint | 在Hologres管控台查看 |
-p | Hologres实例的端口地址 | 在Hologres管控台查看 |
-d | Hologres指定实例中的数据库名 | |
-c | 客户端数目(并发度) | 示例:8 |
-t | 每个客户端需要执行的压测query数目 | 50 |
-f | 压测的sql | 示例:6.sql |
测试结果参考
测试数据量:
- 本测试基于TPCH 100G的数据集进行测试,具体数据量如下表所示
表名 | 行数 |
---|---|
LINEITEM | 600,037,902 |
ORDERS | 150,000,000 |
PARTSUPP | 80,000,000 |
PART | 20,000,000 |
CUSTOMER | 15,000,000 |
SUPPLIER | 1,000,000 |
NATION | 25 |
REGION | 5 |
集群规格
计算资源 | 存储容量 | 软件版本 | 备注 | |
---|---|---|---|---|
64 CU | ||||
(CPU:64 Core 内存:256 GB) | 100 GB | r0.10.20 | 使用集群默认配置,Shard数量:40 | |
128 CU | ||||
(CPU:128 Core 内存:512 GB) | 100 GB | r0.10.20 | 使用集群默认配置,Shard数量:80 |
测试时间:2021年6月
测试结果
数据导入时间
- 数据导入执行时间以秒(s)为单位。
- 导入时间指将数据导入Hologres内表
- 在使用COPY方法导入数据时,一张表对应一个数据文件,并未使用并发导入方式。
- 具体数值如下表所示
说明:使用COPY方式导入时一张表对应一个数据文件,并未使用并发导入方式
表名 | 行数 | 数据量 | Hologres 64CU | ||
---|---|---|---|---|---|
使用COPY方式导入(公网网络) | 使用COPY方式导入(VPC网络导入) | 使用MaxCompute外表导入 | |||
LINEITEM | 600,037,902 | 73.6GB | 3,070.453 | 694.364 | 148.165 |
ORDERS | 150,000,000 | 16.4GB | 691.060 | 172.529 | 37.741 |
PARTSUPP | 80,000,000 | 2.3GB | 468.560 | 107.092 | 18.488 |
PART | 20,000,000 | 11.3GB | 96.342 | 24.020 | 8.083 |
CUSTOMER | 15,000,000 | 2.3GB | 95.190 | 22.937 | 10.363 |
SUPPLIER | 1,000,000 | 132MB | 5.057 | 1.803 | 1.503 |
NATION | 25 | 2KB | 0.580 | 0.584 | 0.747 |
REGION | 5 | 0.375KB | 0.168 | 0.153 | 0.430 |
Total | 106G | 4427.410 | 1023.482 | 225.52 |
- 下图中蓝色为使用COPY方式在公网条件下导入数据的时间,绿色为使用COPY方式在VPC网络条件下导入数据的时间,灰色为使用MaxCompute外表方式导入的时间
- 纵坐标数值越低,表示导入速度越快
- 横轴:表名。纵轴:数据导入时间(s)
- 可以看出,由于网络带宽影响,使用COPY方式导入本地文件数据时,使用VPC网络导入数据时间明显短于使用公网导入数据时间;使用MaxCompute导入数据时间明显短于使用COPY方式导入本地文件数据时间。
查询时间
- 查询执行时间以秒(s)为单位。
- 查询结果均基于Hologres内表
- 具体数值如下表所示
TPCH Query编号 | Hologres 64CU | Hologres 128CU |
---|---|---|
1 | 3.120 | 2.150 |
2 | 0.581 | 0.467 |
3 | 1.735 | 1.005 |
4 | 1.558 | 0.836 |
5 | 2.921 | 1.917 |
6 | 0.297 | 0.096 |
7 | 2.006 | 1.029 |
8 | 2.674 | 1.679 |
9 | 5.298 | 2.796 |
10 | 1.944 | 0.924 |
11 | 0.397 | 0.297 |
12 | 1.531 | 0.852 |
13 | 1.741 | 0.971 |
14 | 0.286 | 0.160 |
15 | 0.293 | 0.177 |
16 | 1.223 | 1.020 |
17 | 1.405 | 0.607 |
18 | 3.817 | 2.169 |
19 | 1.400 | 0.622 |
20 | 1.358 | 0.868 |
21 | 4.164 | 2.047 |
22 | 1.121 | 0.654 |
Total | 40.870 | 23.343 |
- 下图中蓝色为64CU的实例的查询结果,绿色为128CU实例的查询结果
- 纵坐标数值越低,表示 TPC-H 性能越好。
- 可以看出随着实例规模的成本增长,查询时间也在成线性下降趋势
- 横轴:query在文档中的编号。纵轴:query执行时间(s)