通过 PolarDB for PostgreSQL 实现一体化的 HTAP 能力

本文涉及的产品
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云数据库 RDS SQL Server,基础系列 2核4GB
云原生数据库 PolarDB 分布式版,标准版 2核8GB
简介: 阿里云 PolarDB for PostgreSQL作为一款领先的云原生关系型数据库,利用向量化引擎+列存索引等技术实现了 OLTP 和 OLAP 的一体化。本方案为您展示如何通过 PolarDB for PostgreSQL 来实现一体化的 HTAP 能力。

背景


在当今快速发展的数字化时代,企业对实时数据分析和高效事务处理的需求日益增长。传统的数据库架构往往将在线交易处理(OLTP)与在线分析处理(OLAP)分离,导致系统复杂性增加、维护成本高昂,并且难以满足实时决策支持的要求。随着技术的进步,混合事务与分析处理(HTAP)逐渐成为主流趋势,它允许在同一数据库实例内同时进行高效的交易处理和复杂的数据分析。

阿里云 PolarDB for PostgreSQL作为一款领先的云原生关系型数据库,在 PostgreSQL 的高性能 OLTP 基础上,通过向量化引擎+列存索引等技术实现了 OLTP 和 OLAP 的一体化。在一个数据库实例内,仅仅需要为表构建一个列存索引,用户即可享受到高并发事务处理的卓越性能,以及即时获取最新业务数据的深度洞察,大大缩短了从数据到价值的时间周期。

本方案将为您展示如何通过 PolarDB for PostgreSQL 来实现一体化的 HTAP 能力。

更多文档参考:什么是向量化引擎?_云原生数据库 PolarDB(PolarDB)-阿里云帮助中心


TPC-H 性能测试


TPC-H是业界常用的性能标准测试,由TPC委员会制定发布,用于评测数据库的分析能力。包含8张数据表、22条复杂的SQL查询,包含单表统计、多表 Join、子查询、聚合、排序等。

本次测试数据量:10GB。

测试环境:

  • ECS:规格为 2C2GB(ecs.e-c1m1.large),挂载的磁盘 40GB
  • PolarDB PostgreSQL:
  • 计费类型:按量付费
  • 引擎版本:PostgreSQL 14
  • 规格为 4C 16GB(标准版,通用规格,X86)
  • 节点个数:1 个读写节点,0 个只读节点

注:ECS实例和PolarDB集群需保证在同一个VPC中。


准备工作

开通 ECS

按照要求选择规格以及配置    开通入口


开通 PolarDB PostgreSQL


image.png


修改 PolarDB PostgreSQL 配置参数



添加白名单

在左侧导航栏中,选择配置与管理 > 集群白名单。在集群白名单页面,单击default分组右侧的配置

配置白名单面板中,将白名单内IP地址区域的IP修改为ECS实例的私网IP地址。


测试步骤

  • 登录到 ECS 服务器,下载 TPC-H 工具:dbgen.tar.gz,解压后进入到目录编译 dbgen,并生成 10GB 的测试数据。约 5 分钟。
mkdir /data
cd /data
## 下载dbgen工具包,并编译
wget https://ganos-hz.oss-cn-hangzhou.aliyuncs.com/csi/dbgen.tar.gz
tar -zxvf dbgen.tar.gz
cd dbgen
make -f makefile.suite
## 生成10GB的测试数据
./dbgen -s 10 -f


  • 在 ECS 上通过 psql 连接到 PolarDB PostgreSQL 实例。
## 下载polardb tools工具箱,里面包含psql命令
sudo yum install https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241113/phrrjc/PolarDB-Tools-2.0.14.26.0-20241023104506.al8.x86_64.rpm
## 执行psql命令,连接到数据库
## {}是需要根据实际连接信息来替换
/u01/polardb_pg_tools/bin/psql -h {host} -p {port} -U{user} postgres


其中:

host 和 port 的获取方式如下,请将{host}替换为实际的域名,请将{port}替换为实际的端口。


{user}替换为一个新创建的高权限账号


  • 连接到数据库后,在 psql 中执行如下命令, 导入生成的 TPC-H 数据。 约 10 分钟
--创建tpchdb
CREATE DATABASE tpchdb
WITH 
    ENCODING = 'UTF8'
    LC_COLLATE = 'C'
    LC_CTYPE = 'C'
    TEMPLATE template0;
--切换到tpchdb
\c tpchdb
--在 tpchdb 中创建 polar_csi 插件
CREATE EXTENSION polar_csi;
--执行dbgen目录中的dss.ddl文件,创建tpch 8张表结构,这里是相对路径,也可以指定绝对路径
\i ./dss.ddl
--将dbgen目录中生成的8张表(.tbl文件)的数据导入到tpchdb中,这里是相对路径,也可以指定绝对路径
\copy part from ./part.tbl with delimiter as '|' NULL '';
\copy region from ./region.tbl with delimiter as '|' NULL '';
\copy nation from ./nation.tbl with delimiter as '|' NULL '';
\copy orders from ./orders.tbl with delimiter as '|' NULL '';
\copy customer from ./customer.tbl with delimiter as '|' NULL '';
\copy lineitem from ./lineitem.tbl with delimiter as '|' NULL '';
\copy partsupp from ./partsupp.tbl with delimiter as '|' NULL '';
\copy supplier from ./supplier.tbl with delimiter as '|' NULL '';


  • 设置向量化引擎和列存索引相关的参数,用于后续查询
SET polar_csi.enable_pk to on;
set polar_csi.enable_query to on;
set polar_csi.exec_parallel to 4;
set polar_csi.cost_threshold = 0;
set polar_csi.memory_limit = 2048;


  • 数据导入后,为每张表创建列存索引。约 10 分钟
--设置参数,加快导入速度
SET polar_csi.enable_pk to on;
set polar_csi.memory_limit = 2048;

--为partsupp表创建列存索引
ALTER TABLE PARTSUPP ADD CONSTRAINT partsupp_pkey PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY);
CREATE INDEX imps ON partsupp USING csi(ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment);

--为part表创建列存索引
ALTER TABLE PART ADD CONSTRAINT part_kpey PRIMARY KEY (P_PARTKEY);
CREATE INDEX im_p ON part     USING csi(p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment);

--为supplier表创建列存索引
ALTER TABLE SUPPLIER ADD CONSTRAINT supplier_pkey PRIMARY KEY (S_SUPPKEY);
CREATE INDEX im_s ON supplier USING csi(s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment);

--为customer表创建列存索引
ALTER TABLE CUSTOMER ADD CONSTRAINT customer_pkey PRIMARY KEY (C_CUSTKEY);
CREATE INDEX im_c ON customer USING csi(c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment);

--为orders表创建列存索引
ALTER TABLE ORDERS ADD CONSTRAINT orders_pkey PRIMARY KEY (O_ORDERKEY);
CREATE INDEX im_o ON orders   USING csi(o_orderkey, o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment);

--为lineitem表创建列存索引
ALTER TABLE LINEITEM ADD CONSTRAINT lineitem_pkey PRIMARY KEY (L_ORDERKEY, L_LINENUMBER);
CREATE INDEX im_l ON lineitem USING csi(l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment);

--为nation表创建列存索引
ALTER TABLE NATION ADD CONSTRAINT nation_pkey PRIMARY KEY (N_NATIONKEY);
CREATE INDEX im_n ON nation   USING csi(n_nationkey, n_name, n_regionkey, n_comment);

--为region表创建列存索引
ALTER TABLE REGION ADD CONSTRAINT region_pkey PRIMARY KEY (R_REGIONKEY);
CREATE INDEX im_r ON region   USING csi(r_regionkey, r_name, r_comment);


  • 退出 psql
\q


  • 开启向量化引擎,执行 22 条 SQL 语句统计耗时。具体步骤为:
-- 下载已经准备好的22条sql文件(参数配置已经在文件中设置完毕)
wget https://ganos-hz.oss-cn-hangzhou.aliyuncs.com/csi/tpch_csi.sql

-- 通过psql命令,执行tpch_csi.sql文件,并将结果写入到tpch_csi.result文件中
-- {}是需要根据实际连接信息来替换
-- 该步骤耗时约20秒左右,请耐心等待
/u01/polardb_pg_tools/bin/psql -h {host} -p {port} -U{user} tpchdb -f tpch_csi.sql > tpch_csi.result

-- 查看tpch_csi.result文件内容
cat tpch_csi.result


  • 关闭向量化引擎,使用 PG 的行存引擎,执行 22 条语句统计耗时。具体步骤为:

注:PG 的行存引擎有几条会耗费较长时间,因此设置了超时时间,statement_timeout 为 2 分钟,超过 2 分钟的自动终止

-- 下载已经准备好的22条sql文件(参数配置已经在文件中设置完毕)
wget https://ganos-hz.oss-cn-hangzhou.aliyuncs.com/csi/tpch_pg.sql

-- 通过psql命令,执行tpch_pg.sql文件,并将结果写入到tpch_pg.result文件中
-- {}是需要根据实际连接信息来替换
-- 该步骤耗时约20分钟以上,请耐心等待
/u01/polardb_pg_tools/bin/psql -h {host} -p {port} -U{user} tpchdb -f tpch_pg.sql > tpch_pg.result

-- 查看tpch_pg.result文件内容
cat tpch_pg.result


  • :22 条 SQL 语句,如果不使用上面的 sql 文件,也可以自行手动复制这些 sql 语句来执行
--如果要使用向量化引擎,则需要设置如下参数
set polar_csi.enable_query to on;
set polar_csi.exec_parallel to 4;
set polar_csi.cost_threshold = 0;
set polar_csi.memory_limit = 2048;
--如果要使用PG原生行存引擎,则需要设置如下参数
set polar_csi.enable_query to off;
set max_parallel_workers_per_gather=4;
set max_parallel_workers=4;
--以下为22条执行语句
--Q1
select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where l_shipdate <= date '1998-12-01' - '60 day'::interval
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;
--Q2
select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
from
    part,
    supplier,
    partsupp,
    nation,
    region
where p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 43
    and p_type like '%NICKEL'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'MIDDLE EAST'
    and ps_supplycost = (
    select
    min(ps_supplycost)
    from
    partsupp,
    supplier,
    nation,
    region
    where p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'MIDDLE EAST'
    )
order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey limit 100;
--Q3
select l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where c_mktsegment = 'FURNITURE'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-05'
    and l_shipdate > date '1995-03-05'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate limit 100;
--Q4
select
    o_orderpriority,
    count(*) as order_count
from
    orders
where o_orderdate >= date '1993-05-01'
    and o_orderdate < date '1993-05-01' + interval '3 month'::interval
    and exists (
    select
    *
    from
    lineitem
    where l_orderkey = o_orderkey
    and l_commitdate < l_receiptdate
    )
group by
    o_orderpriority
order by
    o_orderpriority;
--Q5
select
        n_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue
from
        customer,
        orders,
        lineitem,
        supplier,
        nation,
        region
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and l_suppkey = s_suppkey
        and c_nationkey = s_nationkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'MIDDLE EAST'
        and o_orderdate >= date '1995-01-01'
        and o_orderdate < date '1995-01-01' + '1 year'::interval
group by
        n_name
order by
        revenue desc;
--Q6
select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1993-01-01'
        and l_shipdate < date '1993-01-01' + '1 year'::interval
        and l_discount between 0.03 - 0.01 and 0.03 + 0.01
        and l_quantity < 24;
--Q7
  SELECT
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) AS revenue
FROM (
    SELECT
        n1.n_name AS supp_nation,
        n2.n_name AS cust_nation,
        extract(year FROM l_shipdate) AS l_year,
        l_extendedprice * (1 - l_discount) AS volume
    FROM
        supplier,
        lineitem,
        orders,
        customer,
        nation n1,
        nation n2
    WHERE
        s_suppkey = l_suppkey
        AND o_orderkey = l_orderkey
        AND c_custkey = o_custkey
        AND s_nationkey = n1.n_nationkey
        AND c_nationkey = n2.n_nationkey
        AND ((n1.n_name = 'FRANCE'
                AND n2.n_name = 'GERMANY')
            OR (n1.n_name = 'GERMANY'
                AND n2.n_name = 'FRANCE'))
        AND l_shipdate BETWEEN CAST('1995-01-01' AS date)
        AND CAST('1996-12-31' AS date)) AS shipping
GROUP BY
    supp_nation,
    cust_nation,
    l_year
ORDER BY
    supp_nation,
    cust_nation,
    l_year;
--Q8
select
        o_year,
        sum(case
                when nation = 'INDONESIA' then volume
                else 0
        end) / sum(volume) as mkt_share
from
        (
                select
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) as volume,
                        n2.n_name as nation
                from
                        part,
                        supplier,
                        lineitem,
                        orders,
                        customer,
                        nation n1,
                        nation n2,
                        region
                where
                        p_partkey = l_partkey
                        and s_suppkey = l_suppkey
                        and l_orderkey = o_orderkey
                        and o_custkey = c_custkey
                        and c_nationkey = n1.n_nationkey
                        and n1.n_regionkey = r_regionkey
                        and r_name = 'ASIA'
                        and s_nationkey = n2.n_nationkey
                        and o_orderdate between '1995-01-01'::date and '1996-12-31'::date
                        and p_type = 'PROMO POLISHED NICKEL'
        ) as all_nations
group by
        o_year
order by
        o_year;
--Q9
select
        nation,
        o_year,
        sum(amount) as sum_profit
from
        (
                select
                        n_name as nation,
                        extract(year from o_orderdate) as o_year,
                        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                from
                        part,
                        supplier,
                        lineitem,
                        partsupp,
                        orders,
                        nation
                where
                        s_suppkey = l_suppkey
                        and ps_suppkey = l_suppkey
                        and ps_partkey = l_partkey
                        and p_partkey = l_partkey
                        and o_orderkey = l_orderkey
                        and s_nationkey = n_nationkey
                        and p_name like '%navajo%'
        ) as profit
group by
        nation,
        o_year
order by
        nation,
        o_year desc limit 100;
--Q10
select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        customer,
        orders,
        lineitem,
        nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1993-08-01'
        and o_orderdate < date '1993-08-01' + '3 month'::interval
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
        LIMIT 20;
--Q11
select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
from
        partsupp,
        supplier,
        nation
where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'ALGERIA'
group by
        ps_partkey having
                sum(ps_supplycost * ps_availqty) > (
                        select
                                sum(ps_supplycost * ps_availqty) * 0.0001000000
                        from
                                partsupp,
                                supplier,
                                nation
                        where
                                ps_suppkey = s_suppkey
                                and s_nationkey = n_nationkey
                                and n_name = 'ALGERIA'
                )
order by
        value desc;
--Q12
select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        orders,
        lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('AIR', 'FOB')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1996-01-01'
        and l_receiptdate < date '1996-01-01' + '1 year'::interval
group by
        l_shipmode
order by
        l_shipmode;
--Q13
select
        c_count,
        count(*) as custdist
from
        (
                select
                        c_custkey,
                        count(o_orderkey)
                from
                        customer left outer join orders on
                                c_custkey = o_custkey
                                and o_comment not like '%pending%requests%'
                group by
                        c_custkey
        ) as c_orders (c_custkey, c_count)
group by
        c_count
order by
        custdist desc,
        c_count desc;
--Q14
select
        100.00 * sum(case
                when p_type like 'PROMO%'
                        then l_extendedprice * (1 - l_discount)
                else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
        lineitem,
        part
where
        l_partkey = p_partkey
        and l_shipdate >= date '1995-02-01'
        and l_shipdate < date '1995-02-01' + interval '1 month'::interval;
--Q15
with revenue0 as (
    select
    l_suppkey as supplier_no,
    sum(l_extendedprice * (1 - l_discount)) as total_revenue
    from
    lineitem
    where l_shipdate >= date '1995-08-01'
    and l_shipdate < date '1995-08-01' + '3 month'::interval
    group by
    l_suppkey)
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue0
where
    s_suppkey = supplier_no
    and total_revenue = (
    select
    max(total_revenue)
    from
    revenue0
    )
order by
    s_suppkey;
--Q16
select
        p_brand,
        p_type,
        p_size,
        count(distinct ps_suppkey) as supplier_cnt
from
        partsupp,
        part
where
        p_partkey = ps_partkey
        and p_brand <> 'Brand#13'
        and p_type not like 'ECONOMY BRUSHED%'
        and p_size in (11, 8, 10, 31, 21, 13, 32, 28)
        and ps_suppkey not in (
                select
                        s_suppkey
                from
                        supplier
                where
                        s_comment like '%Customer%Complaints%'
        )
group by
        p_brand,
        p_type,
        p_size
order by
        supplier_cnt desc,
        p_brand,
        p_type,
        p_size limit 100;
--Q17
select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part
where
        p_partkey = l_partkey
        and p_brand = 'Brand#44'
        and p_container = 'MED PKG'
        and l_quantity < (
                select
                        0.2 * avg(l_quantity)
                from
                        lineitem
                where
                        l_partkey = p_partkey
        );
--Q18
select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
from
        customer,
        orders,
        lineitem
where
        o_orderkey in (
                select
                        l_orderkey
                from
                        lineitem
                group by
                        l_orderkey having
                                sum(l_quantity) > 313
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
order by
        o_totalprice desc,
        o_orderdate limit 100;
--Q19
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#15'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 10 and l_quantity <= 10 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#45'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 18 and l_quantity <= 18 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#21'
                and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                and l_quantity >= 30 and l_quantity <= 30 + 10
                and p_size between 1 and 15
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );
--Q20
select
        s_name,
        s_address
from
        supplier,
        nation
where
        s_suppkey in (
                select
                        ps_suppkey
                from
                        partsupp
                where
                        ps_partkey in (
                                select
                                        p_partkey
                                from
                                        part
                                where
                                        p_name like 'lemon%'
                        )
                        and ps_availqty > (
                                select
                                        0.5 * sum(l_quantity)
                                from
                                        lineitem
                                where
                                        l_partkey = ps_partkey
                                        and l_suppkey = ps_suppkey
                                        and l_shipdate >= date '1997-01-01'
                                        and l_shipdate < date '1997-01-01' + '1 year'::interval
                        )
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDONESIA'
order by
        s_name limit 100;
--Q21
select
        s_name,
        count(*) as numwait
from
        supplier,
        lineitem l1,
        orders,
        nation
where
        s_suppkey = l1.l_suppkey
        and o_orderkey = l1.l_orderkey
        and o_orderstatus = 'F'
        and l1.l_receiptdate > l1.l_commitdate
        and exists (
                select
                        *
                from
                        lineitem l2
                where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
        )
        and not exists (
                select
                        *
                from
                        lineitem l3
                where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDIA'
group by
        s_name
order by
        numwait desc,
        s_name limit 100;
--Q22
select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substring(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substring(c_phone from 1 for 2) in
                                ('16', '17', '24', '21', '19', '22', '15')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substring(c_phone from 1 for 2) in
                                                ('16', '17', '24', '21', '19', '22', '15')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode;


结果

列存索引占用空间

对比向量化引擎列存索引和行存引擎 Heap 表的空间占用情况。

测试结论:

  • 行存引擎 Heap 表占用空间为 12.7GB。
  • 列存索引占用空间为 5.5GB


表名称

表中包含的数据行数

PostgreSQL 行存

向量化引擎的列存索引

(包含主键)

LINEITEM

599,860,52

8792 MB

3678 MB

ORDERS

150,000,00

2044 MB

914 MB

PARTSUPP

80,000,00

1372 MB

787 MB

PART

20,000,00

324 MB

66 MB

CUSTOMER

15,000,00

284 MB

118 MB

SUPPLIER

1,000,00

20 MB

8 MB

NATION

25

8 KB

528 KB

REGION

5

8 KB

528 KB

合计

12.7GB

5.5GB


查询性能

对比 PostgreSQL 行存引擎与向量化引擎查询效率。

测试结果:向量化引擎性能是 PG 行存引擎性能的 60 倍以上。


注:请将下方Q1-Q22任务全部完成,填写表后,并截图

查询语句

PolarDB PG 向量化引擎耗时

(单位:毫秒)

PostgreSQL 行存引擎耗时

(单位:毫秒)

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

Q11

Q12

Q13

Q14

Q15

Q16

Q17

Q18

Q19

Q20

Q21

Q22

总耗时:


相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
打赏
0
9
8
0
203
分享
相关文章
【PolarDB开源】PolarDB与云原生数据库比较:特点、优势与选型建议
【5月更文挑战第26天】PolarDB是阿里云的云原生数据库,以其计算存储分离、一写多读架构和数据一致性保障脱颖而出。与Amazon Aurora和Google Cloud Spanner相比,PolarDB在中国市场更具优势,适合读多写少的场景和需要严格数据一致性的应用。企业在选型时应考虑业务需求、地域、读写比例和兼容性。PolarDB作为优秀解决方案,将在云原生数据库领域持续发挥关键作用。
584 1
PolarDB有哪些优势?
【5月更文挑战第12天】PolarDB有哪些优势?
342 3
【PolarDB 开源】PolarDB 在大数据分析中的应用:海量数据处理方案
【5月更文挑战第25天】PolarDB是解决大数据挑战的关键技术,以其高性能和可扩展性处理大规模数据。通过与数据采集和分析工具集成,构建高效数据生态系统。示例代码显示了PolarDB如何用于查询海量数据。优化策略包括数据分区、索引、压缩和分布式部署,广泛应用于电商、金融等领域,助力企业进行精准分析和决策。随着大数据技术进步,PolarDB将继续发挥关键作用,创造更多价值。
322 0
PolarDB架构
PolarDB分布式版产品架构
495 2
体验高可用云原生PolarDB MySQL引擎
本实验主要介绍如何在一台CentOS 7操作系统的ECS实例上,通过SysBench测试工具,体验云原生PolarDB MySQL引擎的高可用特性。

相关产品

  • 云原生数据库 PolarDB