Introduction to TPC-H Performance Testing on Hologres

Summary: This article describes how to run performance tests on Hologres with the TPC-H dataset, and provides reference results to help you choose an appropriate product specification.

Background

TPC-H is a decision-support benchmark defined by the Transaction Processing Performance Council (TPC). It is widely used in both academia and industry to evaluate the performance of decision-support systems. TPC-H models a realistic production environment, simulating the data warehouse of a sales system. It consists of 8 tables, with data volumes configurable from 1 GB to 3 TB. The benchmark comprises 22 queries, and the key metric is each query's response time, i.e., the time from submitting the query to receiving the result. Together, the results reflect the system's overall query-processing capability. For details, see the TPC-H documentation.

Dataset overview

The dataset contains the following 8 tables; their relationships are shown in the figure below.
tpch.png

Test details

Test data volume

The amount of test data directly affects the results. The TPC-H data generator uses a scale factor (SF) to control the size of the generated data; 1 SF corresponds to 1 GB.

Note: the data volumes above refer only to the raw data and do not include the space used by indexes and other structures, so reserve extra capacity when preparing your environment.
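As a quick illustration of the sizing rule above (1 SF ≈ 1 GB of raw data), the headroom calculation can be sketched as follows; the 1.5× overhead factor is an assumed placeholder for illustration, not an official Hologres figure:

```shell
# Estimate disk space for a given TPC-H scale factor.
# 1 SF generates roughly 1 GB of raw .tbl data; the 1.5x factor for
# indexes and other overhead is an illustrative assumption only.
sf=100
raw_gb=$sf                      # raw data: ~1 GB per SF
need_gb=$(( raw_gb * 3 / 2 ))   # reserve ~1.5x the raw size
echo "SF=$sf -> raw ~${raw_gb} GB, reserve at least ${need_gb} GB"
```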

Test environment

This test used an exclusive (pay-as-you-go) instance. Since it is only for demonstration purposes, the compute configuration chosen was 8 cores and 32 GB of memory.

Test scenarios

The test covers two scenarios:

  1. OLAP query scenario: column-store tables are used, running the 22 TPC-H queries directly;
  2. Key/Value point-query scenario: a row-store version of the orders table is used, running point queries that filter on the primary key;

Preparing the base environment

  • This step prepares the data used by both the OLAP query scenario and the Key/Value point-query scenario;

1. Create an ECS instance

Log on to Alibaba Cloud and create an ECS instance for generating data, importing it into Hologres, and running the test client. Recommended configuration:

  • ecs.g6.4xlarge instance type
  • CentOS 7.9
  • ESSD data disk, sized according to the amount of data you plan to test
  • Place the ECS and Hologres instances in the same region and VPC

2. Create a Hologres instance

  • Log on to Alibaba Cloud, open the Hologres console, and click Create Instance
  • Choose a configuration and enter an instance name; see the official documentation for details

3. Create a test database

  • After the instance is created, log on to it and create a database. In this test the database is named tpch_1sf; see the official documentation for detailed steps

Generating TPC-H data

1. Prepare the data generator

  • Connect to the ECS instance remotely
  • Update all packages
yum update
  • Install git
yum install git
  • Install gcc
yum install gcc
  • Download the TPC-H data generator
git clone https://github.com/gregrahn/tpch-kit.git
  • Enter the generator's source directory
cd tpch-kit/dbgen
  • Build the generator
make

2. Generate data

  • After the build succeeds, you can list the generator's options with:
./dbgen --help
  • This test generates only 1 GB of data, so run:
./dbgen -vf -s 1
If you need a larger dataset, increase the SF parameter; for example, the following generates 1 TB:
./dbgen -vf -s 1000
  • As a rough guide, a 32 CU instance can handle TPC-H SF10, and a 256 CU instance can handle TPC-H SF50
  • After generation, you can list the output files with the command below. The generator produces 8 data files, one per table in the dataset.
ls | grep '.*.tbl'
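Before importing, it can help to sanity-check the generated files. The sketch below runs against a one-line dummy orders row created in a temp directory; in practice, run the same loop inside tpch-kit/dbgen against the real .tbl files:

```shell
# Sanity-check .tbl files: report row count and field count per file.
# dbgen writes pipe-delimited rows with a trailing '|', so a field
# count of NF-1 compensates for the trailing delimiter.
cd "$(mktemp -d)"
printf '1|370|O|172799.49|1996-01-02|5-LOW|Clerk#000000951|0|sample comment|\n' > orders.tbl
for f in *.tbl; do
    rows=$(wc -l < "$f")
    fields=$(head -1 "$f" | awk -F'|' '{print NF-1}')
    echo "$f: $rows rows, $fields fields per row"
done
```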

OLAP query scenario

Preparing data

1. Create tables

  • This article uses psql for the data import, so first install it on the ECS instance with:
yum install postgresql-server
  • After installation, you can connect to the Hologres instance with:
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
  • Once connected to Hologres through psql, create the tables with the following DDL:
DROP TABLE IF EXISTS LINEITEM;

BEGIN;
CREATE TABLE LINEITEM
(
    L_ORDERKEY      INT         NOT NULL,
    L_PARTKEY       INT         NOT NULL,
    L_SUPPKEY       INT         NOT NULL,
    L_LINENUMBER    INT         NOT NULL,
    L_QUANTITY      DECIMAL(15,2) NOT NULL,
    L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
    L_DISCOUNT      DECIMAL(15,2) NOT NULL,
    L_TAX           DECIMAL(15,2) NOT NULL,
    L_RETURNFLAG    TEXT        NOT NULL,
    L_LINESTATUS    TEXT        NOT NULL,
    L_SHIPDATE      TIMESTAMPTZ NOT NULL,
    L_COMMITDATE    TIMESTAMPTZ NOT NULL,
    L_RECEIPTDATE   TIMESTAMPTZ NOT NULL,
    L_SHIPINSTRUCT  TEXT        NOT NULL,
    L_SHIPMODE      TEXT        NOT NULL,
    L_COMMENT       TEXT        NOT NULL,
    PRIMARY KEY (L_ORDERKEY,L_LINENUMBER)
);
CALL set_table_property('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
CALL set_table_property('LINEITEM', 'segment_key', 'L_SHIPDATE');
CALL set_table_property('LINEITEM', 'distribution_key', 'L_ORDERKEY');
CALL set_table_property('LINEITEM', 'bitmap_columns', 'L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'dictionary_encoding_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT');
CALL set_table_property('LINEITEM', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS ORDERS;

BEGIN;
CREATE TABLE ORDERS
(
    O_ORDERKEY      INT         NOT NULL PRIMARY KEY,
    O_CUSTKEY       INT         NOT NULL,
    O_ORDERSTATUS   TEXT        NOT NULL,
    O_TOTALPRICE    DECIMAL(15,2) NOT NULL,
    O_ORDERDATE     timestamptz NOT NULL,
    O_ORDERPRIORITY TEXT        NOT NULL,
    O_CLERK         TEXT        NOT NULL,
    O_SHIPPRIORITY  INT         NOT NULL,
    O_COMMENT       TEXT        NOT NULL
);
CALL set_table_property('ORDERS', 'segment_key', 'O_ORDERDATE');
CALL set_table_property('ORDERS', 'distribution_key', 'O_ORDERKEY');
CALL set_table_property('ORDERS', 'colocate_with', 'LINEITEM');
CALL set_table_property('ORDERS', 'bitmap_columns', 'O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT');
CALL set_table_property('ORDERS', 'dictionary_encoding_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_COMMENT');
CALL set_table_property('ORDERS', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PARTSUPP;

BEGIN;
CREATE TABLE PARTSUPP
(
    PS_PARTKEY    INT    NOT NULL,
    PS_SUPPKEY    INT    NOT NULL,
    PS_AVAILQTY   INT    NOT NULL,
    PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
    PS_COMMENT    TEXT   NOT NULL,
    PRIMARY KEY(PS_PARTKEY,PS_SUPPKEY)
);
CALL set_table_property('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
CALL set_table_property('PARTSUPP', 'colocate_with', 'LINEITEM');
CALL set_table_property('PARTSUPP', 'bitmap_columns', 'PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY,PS_COMMENT');
CALL set_table_property('PARTSUPP', 'dictionary_encoding_columns', 'PS_COMMENT');
CALL set_table_property('PARTSUPP', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS PART;

BEGIN;
CREATE TABLE PART
(
    P_PARTKEY     INT    NOT NULL PRIMARY KEY,
    P_NAME        TEXT   NOT NULL,
    P_MFGR        TEXT   NOT NULL,
    P_BRAND       TEXT   NOT NULL,
    P_TYPE        TEXT   NOT NULL,
    P_SIZE        INT    NOT NULL,
    P_CONTAINER   TEXT   NOT NULL,
    P_RETAILPRICE DECIMAL(15,2) NOT NULL,
    P_COMMENT     TEXT   NOT NULL
);
CALL set_table_property('PART', 'distribution_key', 'P_PARTKEY');
CALL set_table_property('PART', 'colocate_with', 'LINEITEM');
CALL set_table_property('PART', 'bitmap_columns', 'P_PARTKEY,P_SIZE,P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'dictionary_encoding_columns', 'P_NAME,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER,P_COMMENT');
CALL set_table_property('PART', 'time_to_live_in_seconds', '31536000');
COMMIT;



DROP TABLE IF EXISTS CUSTOMER;
BEGIN;
CREATE TABLE CUSTOMER
(
    C_CUSTKEY    INT    NOT NULL PRIMARY KEY,
    C_NAME       TEXT   NOT NULL,
    C_ADDRESS    TEXT   NOT NULL,
    C_NATIONKEY  INT    NOT NULL,
    C_PHONE      TEXT   NOT NULL,
    C_ACCTBAL    DECIMAL(15,2) NOT NULL,
    C_MKTSEGMENT TEXT   NOT NULL,
    C_COMMENT    TEXT   NOT NULL
);
CALL set_table_property('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
CALL set_table_property('CUSTOMER', 'colocate_with', 'LINEITEM');
CALL set_table_property('CUSTOMER', 'bitmap_columns', 'C_CUSTKEY,C_NATIONKEY,C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'dictionary_encoding_columns', 'C_NAME,C_ADDRESS,C_PHONE,C_MKTSEGMENT,C_COMMENT');
CALL set_table_property('CUSTOMER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS SUPPLIER;

BEGIN;
CREATE TABLE SUPPLIER
(
    S_SUPPKEY   INT    NOT NULL PRIMARY KEY,
    S_NAME      TEXT   NOT NULL,
    S_ADDRESS   TEXT   NOT NULL,
    S_NATIONKEY INT    NOT NULL,
    S_PHONE     TEXT   NOT NULL,
    S_ACCTBAL   DECIMAL(15,2) NOT NULL,
    S_COMMENT   TEXT   NOT NULL
);
CALL set_table_property('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
CALL set_table_property('SUPPLIER', 'colocate_with', 'LINEITEM');
CALL set_table_property('SUPPLIER', 'bitmap_columns', 'S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'dictionary_encoding_columns', 'S_NAME,S_ADDRESS,S_PHONE,S_COMMENT');
CALL set_table_property('SUPPLIER', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS NATION;

BEGIN;
CREATE TABLE NATION(
  N_NATIONKEY INT NOT NULL PRIMARY KEY,
  N_NAME text NOT NULL,
  N_REGIONKEY INT NOT NULL,
  N_COMMENT text NOT NULL
);
CALL set_table_property('NATION', 'distribution_key', 'N_NATIONKEY');
CALL set_table_property('NATION', 'colocate_with', 'LINEITEM');
CALL set_table_property('NATION', 'bitmap_columns', 'N_NATIONKEY,N_NAME,N_REGIONKEY,N_COMMENT');
CALL set_table_property('NATION', 'dictionary_encoding_columns', 'N_NAME,N_COMMENT');
CALL set_table_property('NATION', 'time_to_live_in_seconds', '31536000');
COMMIT;

DROP TABLE IF EXISTS REGION;

BEGIN;
CREATE TABLE REGION
(
    R_REGIONKEY INT  NOT NULL PRIMARY KEY,
    R_NAME      TEXT NOT NULL,
    R_COMMENT   TEXT
);
CALL set_table_property('REGION', 'distribution_key', 'R_REGIONKEY');
CALL set_table_property('REGION', 'colocate_with', 'LINEITEM');
CALL set_table_property('REGION', 'bitmap_columns', 'R_REGIONKEY,R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'dictionary_encoding_columns', 'R_NAME,R_COMMENT');
CALL set_table_property('REGION', 'time_to_live_in_seconds', '31536000');
COMMIT;
  • After creation, you can check in psql that the tables exist:
tpch_1sf=# \dt
  • On success, the output looks like this:
tpch_1sf=# \dt
               List of relations
 Schema |   Name   | Type  |       Owner        
--------+----------+-------+--------------------
 public | customer | table | tpch_1sf_developer
 public | lineitem | table | tpch_1sf_developer
 public | nation   | table | tpch_1sf_developer
 public | orders   | table | tpch_1sf_developer
 public | part     | table | tpch_1sf_developer
 public | partsupp | table | tpch_1sf_developer
 public | region   | table | tpch_1sf_developer
 public | supplier | table | tpch_1sf_developer
(8 rows)

2. Import data

  • This test imports data using COPY FROM STDIN; for details, see the official documentation. The .tbl files generated earlier are loaded into the tables created in Hologres.
  • From the data generator's directory, you can import the data with a shell script like the following:
for i in `ls *.tbl`; do
    echo $i;
    name=`echo $i| cut -d'.' -f1`;
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;
done
  • The data import is now complete
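The script above derives each target table name by stripping everything after the first dot, so lineitem.tbl is loaded into the lineitem table. That mapping can be checked in isolation:

```shell
# Reproduce the file-name -> table-name mapping used by the import
# script (cut -d'.' -f1), here via shell parameter expansion.
for i in lineitem.tbl orders.tbl partsupp.tbl nation.tbl; do
    name=${i%%.*}
    echo "$i -> $name"
done
```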

3. Collect statistics

  • So that queries run with good plans, run the following statements in psql to have Hologres collect statistics for each table.
vacuum  region;
vacuum  nation;
vacuum  supplier;
vacuum  customer;
vacuum  part;
vacuum  partsupp;
vacuum  orders;
vacuum  lineitem;

analyze nation;
analyze region;
analyze lineitem;
analyze orders;
analyze customer;
analyze part;
analyze partsupp;
analyze supplier;

Running the queries

  • pgbench is used to collect query statistics conveniently. You can install it with the command below (if pgbench is already present on the test machine, make sure its version is later than 9.6, ideally major version 13 or later; otherwise the tests below may hit various incompatibilities)
yum install postgresql-contrib
  • For convenience, you can download the 22 query SQL files from the following link

tpch_data_tpch_query.zip

  • Then upload the archive to the ECS instance
  • On the ECS instance, go to the directory containing the upload and unzip it:
unzip tpch_data_tpch_query
  • Preparation is now complete and you can run the tests with pgbench. A single query can be executed with:
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f xxx.sql
  • Parameters
Option | Value | Description
-h | Endpoint of the Hologres instance | Available in the Hologres console
-p | Port of the Hologres instance | Available in the Hologres console
-d | Name of the database in the Hologres instance |
-c | Number of clients (concurrency) | Example: 1. This test measures single-query performance rather than concurrency, so 1 is sufficient
-t | Number of benchmark queries each client runs | Example: 50
-f | SQL file to benchmark | Example: 6.sql
  • Alternatively, run the shell script below to execute all 22 queries in one batch and write the results to the file hologres_tpch_test.out
rm -f hologres_tpch_test.out
echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
for i in {1..22}
do
    PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f ./tpch_data_tpch_query/${i}.sql >> ./hologres_tpch_test.out
done
  • The results are in hologres_tpch_test.out; a sample follows

    • transaction type: the SQL file that was executed
    • latency average: the average latency over the 3 runs of that SQL file
2021-03-23 03:50:54 begin
pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 3
number of transactions actually processed: 3/3
latency average = 76.936 ms
tps = 12.997850 (including connections establishing)
tps = 15.972757 (excluding connections establishing)
...
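The per-query results scattered through hologres_tpch_test.out can be collapsed into a one-line-per-query summary with awk. A sketch against a fragment of the sample output above (in practice, feed hologres_tpch_test.out to the same awk program):

```shell
# Pair each "transaction type" line with the following
# "latency average" line and print them together.
awk -F': ' '
    /^transaction type/ { q = $2 }
    /^latency average/  { split($0, a, "= "); print q, a[2] }
' <<'EOF'
transaction type: ./tpch_data_tpch_query/1.sql
scaling factor: 1
latency average = 76.936 ms
EOF
```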

The 22 TPC-H query statements

Q1

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '90' day 
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

Q2

select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 15
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and ps_supplycost = (
        select
            min(ps_supplycost)
        from
            partsupp,
            supplier,
            nation,
            region
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
limit 100;

Q3

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'BUILDING'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-15'
    and l_shipdate > date '1995-03-15'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
limit 10;

Q4

select
    o_orderpriority,
    count(*) as order_count
from
    orders
where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    and exists (
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;

Q5

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
group by
    n_name
order by
    revenue desc;

Q6

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    and l_discount between .06 - 0.01 and .06 + 0.01
    and l_quantity < 24;

Q7

set hg_experimental_enable_double_equivalent=on;
select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and (
                (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            )
            and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

Q8

set hg_experimental_enable_double_equivalent=on;
select
    o_year,
    sum(case
        when nation = 'BRAZIL' then volume
        else 0
    end) / sum(volume) as mkt_share
from
    (
        select
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) as volume,
            n2.n_name as nation
        from
            part,
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2,
            region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'AMERICA'
            and s_nationkey = n2.n_nationkey
            and o_orderdate between date '1995-01-01' and date '1996-12-31'
            and p_type = 'STANDARD POLISHED TIN'
    ) as all_nations
group by
    o_year
order by
    o_year;

Q9

set hg_experimental_enable_double_equivalent=on;
select
    nation,
    o_year,
    sum(amount) as sum_profit
from
    (
        select
            n_name as nation,
            extract(year from o_orderdate) as o_year,
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
        from
            part,
            supplier,
            lineitem,
            partsupp,
            orders,
            nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%'
    ) as profit
group by
    nation,
    o_year
order by
    nation,
    o_year desc;

Q10

select
    c_custkey,
    c_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
from
    customer,
    orders,
    lineitem,
    nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '1993-10-01'
    and o_orderdate < date '1993-10-01' + interval '3' month
    and l_returnflag = 'R'
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc
limit 20;

Q11

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'GERMANY'
group by
    ps_partkey having
        sum(ps_supplycost * ps_availqty) > (
            select
                sum(ps_supplycost * ps_availqty) * 0.0000010000
            from
                partsupp,
                supplier,
                nation
            where
                ps_suppkey = s_suppkey
                and s_nationkey = n_nationkey
                and n_name = 'GERMANY'
        )
order by
    value desc
    limit 100;

Q12

select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
            then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
            then 1
        else 0
    end) as low_line_count
from
    orders,
    lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
group by
    l_shipmode
order by
    l_shipmode;

Q13

select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                c_custkey = o_custkey
                and o_comment not like '%special%requests%'
        group by
            c_custkey
    ) as c_orders (c_custkey, c_count)
group by
    c_count
order by
    custdist desc,
    c_count desc;

Q14

select
    100.00 * sum(case
        when p_type like 'PROMO%'
            then l_extendedprice * (1 - l_discount)
        else 0
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-09-01' + interval '1' month;

Q15

with revenue0(SUPPLIER_NO, TOTAL_REVENUE)  as
    (
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '1995-12-01'
        and l_shipdate < date '1995-12-01' + interval '3' month
    group by
        l_suppkey
    )
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue0
    )
order by
    s_suppkey;

Q16

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

Q17

select
    sum(l_extendedprice) / 7.0 as avg_yearly
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX'
    and l_quantity < (
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
            and l_partkey in(
        select p_partkey 
        from part 
        where p_brand = 'Brand#23' and p_container = 'MED BOX')
    );

Q18

select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 300
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

Q19

select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    lineitem,
    part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#12'
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and l_quantity >= 1 and l_quantity <= 1 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#34'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 20 and l_quantity <= 20 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

Q20

select
    s_name,
    s_address
from
    supplier,
    nation
where
    s_suppkey in (
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in (
                select
                    p_partkey
                from
                    part
                where
                    p_name like 'forest%'
            )
            and ps_availqty > (
                select
                    0.5 * sum(l_quantity)
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date '1994-01-01'
                    and l_shipdate < date '1994-01-01' + interval '1' year
            )
    )
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
order by
    s_name;

Q21

select
    s_name,
    count(*) as numwait
from
    supplier,
    lineitem l1,
    orders,
    nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists (
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
group by
    s_name
order by
    numwait desc,
    s_name
limit 100;

Q22

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
    (
        select
            substring(c_phone from 1 for 2) as cntrycode,
            c_acctbal
        from
            customer
        where
            substring(c_phone from 1 for 2) in
                ('13', '31', '23', '29', '30', '18', '17')
            and c_acctbal > (
                select
                    avg(c_acctbal)
                from
                    customer
                where
                    c_acctbal > 0.00
                    and substring(c_phone from 1 for 2) in
                        ('13', '31', '23', '29', '30', '18', '17')
            )
            and not exists (
                select
                    *
                from
                    orders
                where
                    o_custkey = c_custkey
            )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

Key/Value point-query scenario

Preparing data

1. Create a table

  • Continue using the database created for the OLAP scenario; this test uses the orders table from the TPC-H dataset. After connecting to Hologres with psql, create the table with the DDL below;
Note: the point-query scenario requires a row-store table, so a new table must be created; the column-store table from the OLAP scenario cannot be reused
DROP TABLE IF EXISTS orders_row;

BEGIN;
CREATE TABLE public.orders_row (
 "o_orderkey" int8 NOT NULL,
 "o_custkey" int8,
 "o_orderstatus" bpchar(1),
 "o_totalprice" numeric(15,2),
 "o_orderdate" date,
 "o_orderpriority" bpchar(15),
 "o_clerk" bpchar(15),
 "o_shippriority" int8,
 "o_comment" varchar(79),
PRIMARY KEY (o_orderkey)
);
CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
CALL SET_TABLE_PROPERTY('public.orders_row', 'time_to_live_in_seconds', '3153600000');
CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
COMMIT;

2. Import data with COPY

  • This test imports data using COPY FROM STDIN; for details, see the official documentation. The orders.tbl file generated earlier is loaded into the new table.
  • From the data generator's directory, you can import the data with:
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl

3. Import data with INSERT INTO

  • Since the orders data was already imported for the OLAP scenario, you can instead populate the table with the following SQL
INSERT INTO public.orders_row
SELECT  *
FROM    public.orders;

Querying

1. Generate the query statements

  • The Key/Value point-query scenario uses queries of the following two shapes
SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk = value_x
;

SELECT  column_a
        ,column_b
        ,...
        ,column_x
FROM    table_x
WHERE   pk IN ( value_a, value_b,..., value_x )
;
  • You can generate the required SQL with the script below, which produces 2 SQL files

    • kv_query_single.sql: filters on a single primary-key value
    • kv_query_in.sql: filters on multiple values; the generated SQL filters on 10 random values
rm -rf kv_query
mkdir kv_query
cd kv_query
echo '\set column_values random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;' >> kv_query_single.sql
echo '\set column_values1 random(1,99999999)
\set column_values2 random(1,99999999)
\set column_values3 random(1,99999999)
\set column_values4 random(1,99999999)
\set column_values5 random(1,99999999)
\set column_values6 random(1,99999999)
\set column_values7 random(1,99999999)
\set column_values8 random(1,99999999)
\set column_values9 random(1,99999999)
\set column_values10 random(1,99999999)
select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9,:column_values10);' >> kv_query_in.sql

2. Run the queries

  • The queries are run with pgbench, which you can install with:
yum install postgresql-contrib
  • You can then use pgbench to benchmark the single-value scenario.
Note: run the following command from the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_single.sql
  • For the multi-value scenario:
Note: run the following command from the directory where the SQL files were generated
PGUSER=<AccessID> PGPASSWORD=<AccessKey> pgbench -h <Endpoint> -p <Port> -d <Database> -c <Client_Num> -t <Query_Num> -n -f kv_query_in.sql
  • Parameters
Option | Value | Description
-h | Endpoint of the Hologres instance | Available in the Hologres console
-p | Port of the Hologres instance | Available in the Hologres console
-d | Name of the database in the Hologres instance |
-c | Number of clients (concurrency) | Example: 8
-t | Number of benchmark queries each client runs | Example: 50
-f | SQL file to benchmark | Example: kv_query_single.sql

Reference test results

Test data volume:

  • The test was run on the TPC-H 100 GB dataset; the row counts are shown below

Table | Rows
LINEITEM | 600,037,902
ORDERS | 150,000,000
PARTSUPP | 80,000,000
PART | 20,000,000
CUSTOMER | 15,000,000
SUPPLIER | 1,000,000
NATION | 25
REGION | 5

Cluster configuration

Compute | Storage capacity | Software version | Notes
64 CU (CPU: 64 cores, memory: 256 GB) | 100 GB | r0.10.20 | Default cluster configuration; shard count: 40
128 CU (CPU: 128 cores, memory: 512 GB) | 100 GB | r0.10.20 | Default cluster configuration; shard count: 80

Test date: June 2021

Test results

Data import time

  • Import times are in seconds (s).
  • Import time is the time to load the data into Hologres internal tables.
  • When importing with COPY, each table corresponds to one data file; no parallel import was used.
  • The figures below were measured on Hologres 64 CU

Table | Rows | Raw size | COPY (public network) | COPY (VPC) | MaxCompute foreign table
LINEITEM | 600,037,902 | 73.6 GB | 3,070.453 | 694.364 | 148.165
ORDERS | 150,000,000 | 16.4 GB | 691.060 | 172.529 | 37.741
PARTSUPP | 80,000,000 | 2.3 GB | 468.560 | 107.092 | 18.488
PART | 20,000,000 | 11.3 GB | 96.342 | 24.020 | 8.083
CUSTOMER | 15,000,000 | 2.3 GB | 95.190 | 22.937 | 10.363
SUPPLIER | 1,000,000 | 132 MB | 5.057 | 1.803 | 1.503
NATION | 25 | 2 KB | 0.580 | 0.584 | 0.747
REGION | 5 | 0.375 KB | 0.168 | 0.153 | 0.430
Total | | 106 GB | 4,427.410 | 1,023.482 | 225.520
  • In the chart below, blue is COPY import over the public network, green is COPY import over a VPC, and gray is import via a MaxCompute foreign table
  • Lower values mean faster import
  • X axis: table name. Y axis: import time (s)

tpch1.png

  • Because of network bandwidth limits, importing local files with COPY is clearly faster over a VPC than over the public network, and importing via MaxCompute foreign tables is clearly faster than COPY-importing local files.
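Using the LINEITEM row of the table above (73.6 GB of raw data), a rough throughput for each import path can be derived; the conversion below treats 1 GB as 1024 MB:

```shell
# Approximate LINEITEM import throughput per path, from the table above:
# 73.6 GB in 3070.453 s (COPY, public), 694.364 s (COPY, VPC),
# 148.165 s (MaxCompute foreign table).
size_mb=$(( 736 * 1024 / 10 ))   # 73.6 GB expressed in MB
for secs in 3070.453 694.364 148.165; do
    awk -v mb="$size_mb" -v s="$secs" \
        'BEGIN { printf "%s s -> about %.0f MB/s\n", s, mb / s }' </dev/null
done
```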

Query time

  • Query execution times are in seconds (s).
  • All results were measured on Hologres internal tables.
  • The figures are listed below

TPC-H query | Hologres 64 CU | Hologres 128 CU
1 | 3.120 | 2.150
2 | 0.581 | 0.467
3 | 1.735 | 1.005
4 | 1.558 | 0.836
5 | 2.921 | 1.917
6 | 0.297 | 0.096
7 | 2.006 | 1.029
8 | 2.674 | 1.679
9 | 5.298 | 2.796
10 | 1.944 | 0.924
11 | 0.397 | 0.297
12 | 1.531 | 0.852
13 | 1.741 | 0.971
14 | 0.286 | 0.160
15 | 0.293 | 0.177
16 | 1.223 | 1.020
17 | 1.405 | 0.607
18 | 3.817 | 2.169
19 | 1.400 | 0.622
20 | 1.358 | 0.868
21 | 4.164 | 2.047
22 | 1.121 | 0.654
Total | 40.870 | 23.343
  • In the chart below, blue is the 64 CU instance and green the 128 CU instance
  • Lower values mean better TPC-H performance.
  • As the instance size grows, query time drops roughly linearly
  • X axis: query number as used in this article. Y axis: query execution time (s)

tpch2.png
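The scaling claim can be checked directly from the totals in the table: going from 64 CU to 128 CU cuts total query time from 40.870 s to 23.343 s, a speedup of roughly 1.75×, close to (though below) the ideal 2× for a compute doubling:

```shell
# Speedup of the 128 CU instance over the 64 CU instance, computed
# from the totals in the query-time table above.
awk 'BEGIN { printf "speedup: %.2fx\n", 40.870 / 23.343 }' </dev/null
# prints: speedup: 1.75x
```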
