Greenplum TPC-H测试-阿里云开发者社区

开发者社区> 数据库> 正文
登录阅读全文

Greenplum TPC-H测试

简介: 请参考如下 http://www.tpc.org/information/current_specifications.asp https://github.com/digoal/pg_tpch 下载tpch TPC-H V2.17.1 pdf Download D

请参考如下
http://www.tpc.org/information/current_specifications.asp
https://github.com/digoal/pg_tpch

下载tpch
git clone https://github.com/digoal/pg_tpch.git

安装tpch

$ cd gp_tpch

需要在Makefile中设置几个参数:

################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC      = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
#                                  SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are:  ATT, DOS, HP, IBM, ICL, MVS, 
#                                  SGI, SUN, U2200, VMS, LINUX, WIN32 
# Current values for WORKLOAD are:  TPCH
DATABASE=ORACLE
MACHINE = LINUX
WORKLOAD = TPCH

安装

$make

输出dbgen用法:

$./dbgen -h
TPC-H Population Generator (Version 2.17.0 build 0)
Copyright Transaction Processing Performance Council 1994 - 2010
USAGE:
dbgen [-{vf}][-T {pcsoPSOL}]
        [-s <scale>][-C <procs>][-S <step>]
dbgen [-v] [-O m] [-s <scale>] [-U <updates>]

Basic Options
===========================
-C <n> -- separate data set into <n> chunks (requires -S, default: 1)
-f     -- force. Overwrite existing files
-h     -- display this message
-q     -- enable QUIET mode
-s <n> -- set Scale Factor (SF) to  <n> (default: 1) 
-S <n> -- build the <n>th step of the data/update set (used with -C or -U)
-U <n> -- generate <n> update sets
-v     -- enable VERBOSE mode

Advanced Options
===========================
-b <s> -- load distributions for <s> (default: dists.dss)
-d <n> -- split deletes between <n> files (requires -U)
-i <n> -- split inserts between <n> files (requires -U)
-T c   -- generate cutomers ONLY
-T l   -- generate nation/region ONLY
-T L   -- generate lineitem ONLY
-T n   -- generate nation ONLY
-T o   -- generate orders/lineitem ONLY
-T O   -- generate orders ONLY
-T p   -- generate parts/partsupp ONLY
-T P   -- generate parts ONLY
-T r   -- generate region ONLY
-T s   -- generate suppliers ONLY
-T S   -- generate partsupp ONLY

To generate the SF=1 (1GB), validation database population, use:
        dbgen -vf -s 1

To generate updates for a SF=1 (1GB), use:
        dbgen -v -U 1 -s 1

使用dbgen产生一些测试数据, -s 表示scale(单位为GB),根据需要测试的数据量指定:

$./dbgen -s 100 -f
$ll -h *.tbl
-rw-r--r-- 1 digoal users  12G Nov 19 13:35 customer.tbl
-rw-r--r-- 1 digoal users 6.6G Nov 19 13:39 lineitem.tbl
-rw-r--r-- 1 digoal users 2.2K Nov 19 12:48 nation.tbl
-rw-r--r-- 1 digoal users 1.5G Nov 19 13:39 orders.tbl
-rw-r--r-- 1 digoal users 1.2G Nov 19 12:48 partsupp.tbl
-rw-r--r-- 1 digoal users 233M Nov 19 12:48 part.tbl
-rw-r--r-- 1 digoal users  389 Nov 19 12:48 region.tbl
-rw-r--r-- 1 digoal users 686M Nov 19 13:30 supplier.tbl

生成的测试数据格式如下:

$head -n 10 customer.tbl
1|Customer#000000001|IVhzIApeRb ot,c,E|15|25-989-741-2988|711.56|BUILDING|to the even, regular platelets. regular, ironic epitaphs nag e|
2|Customer#000000002|XSTf4,NCwDVaWNe6tEgvwfmRchLXak|13|23-768-687-3665|121.65|AUTOMOBILE|l accounts. blithely ironic theodolites integrate boldly: caref|
3|Customer#000000003|MG9kdTD2WBHm|1|11-719-748-3364|7498.12|AUTOMOBILE| deposits eat slyly ironic, even instructions. express foxes detect slyly. blithely even accounts abov|
4|Customer#000000004|XxVSJsLAGtn|4|14-128-190-5944|2866.83|MACHINERY| requests. final, regular ideas sleep final accou|
5|Customer#000000005|KvpyuHCplrB84WgAiGV6sYpZq7Tj|3|13-750-942-6364|794.47|HOUSEHOLD|n accounts will have to unwind. foxes cajole accor|
6|Customer#000000006|sKZz0CsnMD7mp4Xd0YrBvx,LREYKUWAh yVn|20|30-114-968-4951|7638.57|AUTOMOBILE|tions. even deposits boost according to the slyly bold packages. final accounts cajole requests. furious|
7|Customer#000000007|TcGe5gaZNgVePxU5kRrvXBfkasDTea|18|28-190-982-9759|9561.95|AUTOMOBILE|ainst the ironic, express theodolites. express, even pinto beans among the exp|
8|Customer#000000008|I0B10bB0AymmC, 0PrRYBCP1yGJ8xcBPmWhl5|17|27-147-574-9335|6819.74|BUILDING|among the slyly regular theodolites kindle blithely courts. carefully even theodolites haggle slyly along the ide|
9|Customer#000000009|xKiAFTjUsCuxfeleNqefumTrjS|8|18-338-906-3675|8324.07|FURNITURE|r theodolites according to the requests wake thinly excuses: pending requests haggle furiousl|
10|Customer#000000010|6LrEaV6KR6PLVcgl2ArL Q3rqzLzcT1 v2|5|15-741-346-9870|2753.54|HOUSEHOLD|es regular deposits haggle. fur|

将测试数据转换为postgresql识别的格式,删除末尾的分隔符|。

$for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; done

$ll -rth *.csv
-rw-r--r-- 1 digoal users  12G Dec  1 13:32 customer.csv
-rw-r--r-- 1 digoal users 6.5G Dec  1 13:33 lineitem.csv
-rw-r--r-- 1 digoal users 2.2K Dec  1 13:33 nation.csv
-rw-r--r-- 1 digoal users 1.5G Dec  1 13:34 orders.csv
-rw-r--r-- 1 digoal users 1.2G Dec  1 13:34 partsupp.csv
-rw-r--r-- 1 digoal users 231M Dec  1 13:34 part.csv
-rw-r--r-- 1 digoal users  384 Dec  1 13:34 region.csv
-rw-r--r-- 1 digoal users 681M Dec  1 13:34 supplier.csv

把包含csv文件的目录,软链接到/tmp/dss-data。

$ ln -s `pwd` /tmp/dss-data

生成测试SQL , SF= dbgen指定的scale值

SF=?
mkdir dss/queries
for q in `seq 1 22`
do
    DSS_QUERY=dss/templates ./qgen -s $SF $q > dss/queries/$q.sql
    sed 's/^select/explain select/' dss/queries/$q.sql > dss/queries/$q.explain.sql
done

在greenplum数据库中创建数据库和用户(也可以不创建,只要有就可以)

psql
postgres=# create role digoal login;
CREATE ROLE
postgres=# create database postgres template template0 owner digoal;
CREATE DATABASE

配置pg_hba.conf

$ vi $MASTER_DATA_DIRECTORY/pg_hba.conf
host all all 127.0.0.1/32 trust
$ gpstop -u

设置几个参数:

gpconfig -c enable_nestloop -v off
gpconfig -c work_mem -v 256MB
gpstop -u

测试,使用digoal用户连接到postgres数据库,结果输出到./results目录:
自动创建表,加载数据。详见tpch.sh脚本

$ ./tpch.sh ./results ip port tpch-db tpch-user password {row|column}

如果要测试redshift可以这样

Redshift (copy by ssh):
// add manifest file to s3 first
// manifest file must in $S3/${table}.manifest
$ ./tpch.sh ./results ip port tpch-db tpch-user password redshift S3 EC2_ID EC2_KEY

可以在results中查看一些信息,例如每个SQL的explain,时间。
结束后,可以使用以下方法生成CSV报告。

$ php process.php ./results output.csv

测试结果

2015-12-08 10:40:23 [1449542423] : running TPC-H benchmark
2015-12-08 10:40:23 [1449542423] : running queries defined in TPC-H benchmark
2015-12-08 10:40:23 [1449542423] :   running query 1
2015-12-08 10:40:43 [1449542443] :     query 1 finished OK (10 seconds)
2015-12-08 10:40:43 [1449542443] :   running query 2
2015-12-08 10:40:47 [1449542447] :     query 2 finished OK (2 seconds)
2015-12-08 10:40:47 [1449542447] :   running query 3
2015-12-08 10:40:54 [1449542454] :     query 3 finished OK (4 seconds)
2015-12-08 10:40:54 [1449542454] :   running query 4
2015-12-08 10:40:56 [1449542456] :     query 4 finished OK (1 seconds)
2015-12-08 10:40:56 [1449542456] :   running query 5
2015-12-08 10:40:58 [1449542458] :     query 5 finished OK (1 seconds)
2015-12-08 10:40:58 [1449542458] :   running query 6
2015-12-08 10:40:59 [1449542459] :     query 6 finished OK (1 seconds)
2015-12-08 10:40:59 [1449542459] :   running query 7
2015-12-08 10:41:10 [1449542470] :     query 7 finished OK (6 seconds)
2015-12-08 10:41:10 [1449542470] :   running query 8
2015-12-08 10:41:18 [1449542478] :     query 8 finished OK (4 seconds)
2015-12-08 10:41:18 [1449542478] :   running query 9
2015-12-08 10:41:54 [1449542514] :     query 9 finished OK (18 seconds)
2015-12-08 10:41:54 [1449542514] :   running query 10
2015-12-08 10:41:55 [1449542515] :     query 10 finished OK (1 seconds)
2015-12-08 10:41:55 [1449542515] :   running query 11
2015-12-08 10:41:57 [1449542517] :     query 11 finished OK (1 seconds)
2015-12-08 10:41:57 [1449542517] :   running query 12
2015-12-08 10:41:59 [1449542519] :     query 12 finished OK (1 seconds)
2015-12-08 10:41:59 [1449542519] :   running query 13
2015-12-08 10:42:04 [1449542524] :     query 13 finished OK (3 seconds)
2015-12-08 10:42:04 [1449542524] :   running query 14
2015-12-08 10:42:05 [1449542525] :     query 14 finished OK (1 seconds)
2015-12-08 10:42:05 [1449542525] :   running query 15
2015-12-08 10:42:07 [1449542527] :     query 15 finished OK (1 seconds)
2015-12-08 10:42:07 [1449542527] :   running query 16
2015-12-08 10:42:10 [1449542530] :     query 16 finished OK (2 seconds)
2015-12-08 10:42:10 [1449542530] :   running query 17
2015-12-08 10:42:26 [1449542546] :     query 17 finished OK (8 seconds)
2015-12-08 10:42:26 [1449542546] :   running query 18
2015-12-08 10:42:46 [1449542566] :     query 18 finished OK (10 seconds)
2015-12-08 10:42:46 [1449542566] :   running query 19
2015-12-08 10:42:49 [1449542569] :     query 19 finished OK (2 seconds)
2015-12-08 10:42:49 [1449542569] :   running query 20
2015-12-08 10:42:51 [1449542571] :     query 20 finished OK (1 seconds)
2015-12-08 10:42:51 [1449542571] :   running query 21
2015-12-08 10:43:06 [1449542586] :     query 21 finished OK (8 seconds)
2015-12-08 10:43:06 [1449542586] :   running query 22
2015-12-08 10:43:10 [1449542590] :     query 22 finished OK (2 seconds)
2015-12-08 10:43:10 [1449542590] : finished TPC-H benchmark

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

分享:
数据库
使用钉钉扫一扫加入圈子
+ 订阅

分享数据库前沿,解构实战干货,推动数据库技术变革

其他文章
最新文章
相关文章