Compiling, Installing, and Using the Greenplum ORCA Optimizer

Background

ORCA is the next-generation query optimizer for Greenplum (a PostgreSQL-derived MPP database). For query optimization it is a major step forward compared with the built-in legacy planner.

https://github.com/greenplum-db/gporca

Installing ORCA

cmake

wget https://cmake.org/files/v3.5/cmake-3.5.2.tar.gz
tar -zxvf cmake-3.5.2.tar.gz
cd cmake-3.5.2
./configure --prefix=/home/digoal/cmake
make
make install
export PATH=/home/digoal/cmake/bin:$PATH
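A quick sanity check that the new binary is the one picked up on PATH (the version string below assumes the 3.5.2 tarball above):

cmake --version
# expect: cmake version 3.5.2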

GPOS

cd ~
git clone https://github.com/greenplum-db/gpos
cd gpos
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/home/digoal/gpos_home ../
make install
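Before moving on, it is worth confirming that the GPOS headers and shared library landed where the gporca cmake flags below expect them (paths as chosen above):

ls /home/digoal/gpos_home/include/gpos
ls /home/digoal/gpos_home/lib/libgpos.so*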

gp-xerces

cd ~
git clone https://github.com/greenplum-db/gp-xerces
cd gp-xerces
mkdir build
cd build
../configure --prefix=/home/digoal/gp-xerces_home
make -j 32
make install
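gp-xerces builds with autotools rather than cmake, but the same kind of check applies (path as chosen above):

ls /home/digoal/gp-xerces_home/lib/libxerces-c.so*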

gporca

cd ~
git clone https://github.com/greenplum-db/gporca.git
cd gporca
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/home/digoal/gporca_home \
-D GPOS_INCLUDE_DIR=/home/digoal/gpos_home/include \
-D GPOS_LIBRARY=/home/digoal/gpos_home/lib/libgpos.so \
-D XERCES_INCLUDE_DIR=/home/digoal/gp-xerces_home/include \
-D XERCES_LIBRARY=/home/digoal/gp-xerces_home/lib/libxerces-c.so ../
make -j 32
make install
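At this point all three dependency trees should be in place. gporca typically installs libgpopt.so plus supporting libraries (e.g. libnaucrates.so and libgpdbcost.so; the exact set may vary by version):

ls /home/digoal/gpos_home/lib /home/digoal/gp-xerces_home/lib /home/digoal/gporca_home/lib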

greenplum

For building Greenplum from source and initializing the cluster, see
https://yq.aliyun.com/articles/180

Change the source-build step as follows:

$ git clone https://github.com/greenplum-db/gpdb.git
$ cd gpdb

$ cp -r ~/gporca_home/include/gpopt ./   # copy the ORCA headers into the source tree

$ ./configure --prefix=/home/digoal/gp --enable-orca --with-perl \
--with-python --with-libxml \
--with-includes=/home/digoal/gporca_home/include:/home/digoal/gpos_home/include:/home/digoal/gp-xerces_home/include \
--with-libraries=/home/digoal/gporca_home/lib:/home/digoal/gpos_home/lib:/home/digoal/gp-xerces_home/lib 
$ make -j 32
$ make install
$ ln -s /home/digoal/gp /home/digoal/greenplum-db

$ vi ~/env_gp.sh
GPHOME=/home/digoal/greenplum-db

# Replace with symlink path if it is present and correct
if [ -h ${GPHOME}/../greenplum-db ]; then
    GPHOME_BY_SYMLINK=`(cd ${GPHOME}/../greenplum-db/ && pwd -P)`
    if [ x"${GPHOME_BY_SYMLINK}" = x"${GPHOME}" ]; then
        GPHOME=`(cd ${GPHOME}/../greenplum-db/ && pwd -L)`/.
    fi
    unset GPHOME_BY_SYMLINK
fi

PATH=$GPHOME/bin:$GPHOME/ext/python/bin:$PATH
LD_LIBRARY_PATH=$GPHOME/lib:$GPHOME/ext/python/lib:$LD_LIBRARY_PATH
PYTHONPATH=$GPHOME/lib/python
#PYTHONHOME=$GPHOME/ext/python
OPENSSL_CONF=$GPHOME/etc/openssl.cnf

export GPHOME
export PATH
export LD_LIBRARY_PATH
export PYTHONPATH
#export PYTHONHOME
export OPENSSL_CONF
export MASTER_DATA_DIRECTORY=/data01/digoal/gpdata/gpseg-1
export PGHOST=127.0.0.1
export PGPORT=1921
export PGUSER=digoal
export PGDATABASE=postgres

export LD_LIBRARY_PATH=/home/digoal/gpos_home/lib:/home/digoal/gp-xerces_home/lib:/home/digoal/gporca_home/lib:$LD_LIBRARY_PATH
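After writing the script, source it and confirm that the paths resolve (a quick sketch):

$ source ~/env_gp.sh
$ which psql     # should be /home/digoal/greenplum-db/bin/psql
$ echo $LD_LIBRARY_PATH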

# vi /etc/ld.so.conf
include ld.so.conf.d/*.conf
/home/digoal/gpos_home/lib
/home/digoal/gp-xerces_home/lib
/home/digoal/gporca_home/lib
/home/digoal/greenplum-db/lib

# ldconfig
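With the cache rebuilt, both the loader and the server binary should resolve the optimizer libraries (the grep patterns are my assumption about the library names installed above):

# ldconfig -p | grep -E 'gpos|gpopt|naucrates|xerces'
# ldd /home/digoal/greenplum-db/bin/postgres | grep -E 'gpos|gpopt|xerces'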

Alternatively, you can install ORCA and its dependencies under the same prefix as Greenplum itself; then the shared libraries do not need to be added to /etc/ld.so.conf.
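A minimal sketch of that approach, reusing the Greenplum prefix chosen above (all other flags as in the earlier build steps):

# in gpos/build and gporca/build, point the install prefix at the GP home:
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/home/digoal/gp ../
# in gp-xerces/build:
../configure --prefix=/home/digoal/gp

The GPOS_*/XERCES_* cmake flags of the gporca build then point into /home/digoal/gp as well.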

Testing ORCA

Restart the database

$ gpstop -M fast -a
$ gpstart -a
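A quick way to confirm that ORCA is actually available in this build (my assumption: on a build without --enable-orca, turning the GUC on fails):

$ psql -c "set optimizer=on; show optimizer;"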

Enable ORCA and run a test

$ psql

set client_min_messages='log';                            -- surface server LOG output in psql
set optimizer=on;                                         -- use ORCA instead of the legacy planner
set optimizer_enumerate_plans=on;                         -- log how many plan alternatives ORCA considered
set optimizer_minidump=always;                            -- dump the optimizer input/output for every query
set optimizer_enable_constant_expression_evaluation=off;  -- keep constant expressions unfolded

create table test(id int, info text, crt_time timestamp);
insert into test select generate_series(1,10000000), md5(random()::text),clock_timestamp();
\timing
insert into test select generate_series(1,10000000), md5(random()::text),clock_timestamp();
create index idx1 on test(info,crt_time);
create index idx2 on test(crt_time,info);

postgres=# explain analyze select * from test where info='abc' and crt_time between '2016-07-16 13:41:06.555882' and '2016-07-16 13:41:06.555882' order by crt_time limit 10;
LOG:  statement: explain analyze select * from test where info='abc' and crt_time between '2016-07-16 13:41:06.555882' and '2016-07-16 13:41:06.555882' order by crt_time limit 10;
LOG:  2016-07-16 17:33:17:175315 CST,THD000,TRACE,"[OPT]: Number of plan alternatives: 8
",
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.38 rows=1 width=45)
   Rows out:  0 rows with 1.141 ms to end, start offset by 0.409 ms.
   ->  Gather Motion 16:1  (slice1; segments: 16)  (cost=0.00..0.38 rows=1 width=45)
         Merge Key: crt_time
         Rows out:  0 rows at destination with 1.139 ms to end, start offset by 0.410 ms.
         ->  Sort  (cost=0.00..0.38 rows=1 width=45)
               Sort Key: crt_time
               Rows out:  0 rows (seg0) with 0.065 ms to end, start offset by 1.099 ms.
               Executor memory:  33K bytes avg, 33K bytes max (seg0).
               Work_mem used:  33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling)
               ->  Index Scan using idx1 on test  (cost=0.00..0.38 rows=1 width=45)
                     Index Cond: info = 'abc'::text
                     Filter: crt_time >= '2016-07-16 13:41:06.555882'::timestamp without time zone AND crt_time <= '2016-07-16 13:41:06.555882'::timestamp without time zone
                     Rows out:  0 rows (seg0) with 0.045 ms to end, start offset by 1.116 ms.
 Slice statistics:
   (slice0)    Executor memory: 318K bytes.
   (slice1)    Executor memory: 209K bytes avg x 16 workers, 209K bytes max (seg0).  Work_mem: 33K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  optimizer=on
 Optimizer status: PQO version 1.646
 Total runtime: 5.807 ms
(22 rows)
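Because optimizer_minidump=always is set, ORCA should also have written a minidump (an XML capture of the optimization request) for this query. On a build like this they normally land under the master data directory (the directory name below is my assumption; check your version):

$ ls $MASTER_DATA_DIRECTORY/minidumps/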

postgres=# set enable_sort=off;
LOG:  statement: set enable_sort=off;
SET
postgres=# explain analyze select * from test where info='abc' and crt_time between '2016-07-16 13:41:06.555882' and '2016-07-16 13:41:06.555882' order by crt_time limit 10;
LOG:  statement: explain analyze select * from test where info='abc' and crt_time between '2016-07-16 13:41:06.555882' and '2016-07-16 13:41:06.555882' order by crt_time limit 10;
LOG:  2016-07-16 17:33:23:623812 CST,THD000,TRACE,"[OPT]: Number of plan alternatives: 8
",
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.38 rows=1 width=45)
   Rows out:  0 rows with 1.019 ms to end, start offset by 0.420 ms.
   ->  Gather Motion 16:1  (slice1; segments: 16)  (cost=0.00..0.38 rows=1 width=45)
         Merge Key: crt_time
         Rows out:  0 rows at destination with 1.017 ms to end, start offset by 0.421 ms.
         ->  Sort  (cost=0.00..0.38 rows=1 width=45)
               Sort Key: crt_time
               Rows out:  0 rows (seg0) with 0.081 ms to end, start offset by 1.047 ms.
               Executor memory:  33K bytes avg, 33K bytes max (seg0).
               Work_mem used:  33K bytes avg, 33K bytes max (seg0). Workfile: (0 spilling)
               ->  Index Scan using idx1 on test  (cost=0.00..0.38 rows=1 width=45)
                     Index Cond: info = 'abc'::text
                     Filter: crt_time >= '2016-07-16 13:41:06.555882'::timestamp without time zone AND crt_time <= '2016-07-16 13:41:06.555882'::timestamp without time zone
                     Rows out:  0 rows (seg0) with 0.060 ms to end, start offset by 1.064 ms.
 Slice statistics:
   (slice0)    Executor memory: 318K bytes.
   (slice1)    Executor memory: 209K bytes avg x 16 workers, 209K bytes max (seg0).  Work_mem: 33K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Settings:  enable_sort=off; optimizer=on
 Optimizer status: PQO version 1.646
 Total runtime: 7.194 ms
(22 rows)
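Note that the plan did not change: ORCA does not honor the legacy planner's enable_* GUCs, so set enable_sort=off has no effect here. To get the legacy planner's plan for comparison, turn ORCA off for the session (a sketch):

$ psql <<'SQL'
set optimizer=off;
explain analyze select * from test where info='abc'
  and crt_time between '2016-07-16 13:41:06.555882' and '2016-07-16 13:41:06.555882'
  order by crt_time limit 10;
SQL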

ORCA has been integrated into the ApsaraDB Greenplum product on Alibaba Cloud; you are welcome to give it a try.
Have fun, and feel free to drop by Alibaba Cloud anytime for a heart-to-heart chat about your business needs.
Keep it up, Alibaba Cloud colleagues; let's build the most down-to-earth cloud database.
