正文
这是Star Schema Benchmark 标准测试集优化的第三篇,前一篇我们分析了下表数据,这一篇是最后一篇了。
一、分析算法路径
更新到前一篇的时候,其实专利技术已经集成到数据库中了,这个算法路径,主要是验证一下:在测试环境中的算法路径,是否和开发环境中一致。实际结果如下,13 条SQL语句的算法路径和开发环境中的算法路径,经过验证是完全一致的。
2022-10-20 01:39:53.344 - SQL2: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25; ; Status: PASS, Elapsed: 3.923, Affected: 1 Info: Job[67:RESTRICT, tasks:193, time:3,591, size: 3,576,178, [0|193|3,591] ] Job[69:DIMENSION_JOIN, tasks:193, time:16, size: 3,576,178, [0|193|15] ] Job[71:OUTPUT_HASH_GROUP_BY, tasks:6, time:166, size: 6, [0|6|167] ] Job[73:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:3, size: 1, [0|1|3] ] Job[75:PROJECT, tasks:1, time:2, size: 1, [0|1|1] ] total time: 3,788, record count: 1 restrict的时间/整体时间: 3591/3788 result: REVENUE | - - - - - - - 6568512155417 | 2022-10-20 01:39:59.707 - SQL4: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; ; Status: PASS, Elapsed: 1.767, Affected: 1 Info: Job[99:RESTRICT, tasks:193, time:1,727, size: 126,610, [0|193|1,728] ] Job[101:DIMENSION_JOIN, tasks:193, time:8, size: 126,610, [0|193|8] ] Job[103:OUTPUT_HASH_GROUP_BY, tasks:2, time:22, size: 2, [0|2|22] ] Job[105:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ] Job[107:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ] total time: 1,764, record count: 1 restrict的时间/整体时间: 1727/1764 result: REVENUE | - - - - - - - 550150245374 | 2022-10-20 01:40:04.552 - SQL6: select sum(lo_revenue) as revenue from lineorder,dates where lo_orderdate = d_datekey and d_weeknuminyear = 6 and d_year = 1994 and lo_discount between 5 and 7 and lo_quantity between 26 and 35; ; Status: PASS, Elapsed: 2.138, Affected: 1 Info: Job[131:RESTRICT, tasks:193, time:2,108, size: 28,441, [0|193|2,109] ] Job[133:DIMENSION_JOIN, tasks:193, time:7, size: 28,441, [0|193|7] ] Job[135:OUTPUT_HASH_GROUP_BY, tasks:1, time:15, size: 1, [0|1|15] ] Job[137:MERGE_HASH_GROUPBY_PARTITION, tasks:1, time:1, size: 1, [0|1|1] ] Job[139:PROJECT, tasks:1, time:1, size: 1, [0|1|1] ] total time: 2,136, record count: 1 restrict的时间/整体时间: 2108/2136 result: REVENUE | - - - - - - - 122223605792 | 2022-10-20 01:40:31.923 - SQL8: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder ,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand; ; Status: PASS, Elapsed: 4.714, Affected: 280 Info: Job[193:RESTRICT, tasks:193, time:4,342, size: 1,442,564, [0|193|4,342] ] Job[195:DIMENSION_JOIN, tasks:193, time:9, size: 1,442,564, [0|193|10] ] Job[197:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ] Job[199:DIMENSION_JOIN, tasks:193, time:5, size: 1,442,564, [0|193|5] ] Job[202:OUTPUT_HASH_GROUP_BY, tasks:6, time:332, size: 1,680, [0|6|332] ] Job[204:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:4, size: 280, [0|7|4] ] Job[206:PROJECT, tasks:7, time:3, size: 280, [0|7|3] ] Job[208:RANGE_SORT, tasks:7, time:2, size: 280, [0|7|2] ] Job[212:PARTITION_TABLET, tasks:7, time:0, size: 280, [0|7|0] ] Job[214:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 70, [0|1|1] ] total time: 4,711, record count: 280 restrict的时间/整体时间: 4342/4711 result: LO_REVENUE |D_YEAR |P_BRAND | - - - - - - - - - - - - - - - - - - 18712903257 |1992 |MFGR#121 | 20576919851 |1992 |MFGR#1210 | 20452654696 |1992 |MFGR#1211 | 2022-10-20 01:40:37.664 - SQL10: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA' group by d_year, p_brand order by d_year, p_brand; ; Status: PASS, Elapsed: 1.24, Affected: 56 Info: Job[292:RESTRICT, tasks:193, time:1,017, size: 287,885, [0|193|1,017] ] Job[294:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ] Job[296:DIMENSION_JOIN, tasks:193, time:5, size: 287,885, [0|193|5] ] Job[298:DIMENSION_JOIN, tasks:193, time:4, size: 287,885, [0|193|4] ] Job[301:OUTPUT_HASH_GROUP_BY, tasks:3, time:193, size: 168, [0|3|193] ] Job[303:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 56, [0|7|2] ] Job[305:PROJECT, tasks:7, time:1, size: 56, [0|7|1] ] Job[307:RANGE_SORT, tasks:7, time:1, size: 56, [0|7|1] ] Job[311:PARTITION_TABLET, tasks:7, time:0, size: 56, [0|7|0] ] Job[313:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 12, [0|1|0] ] total time: 1,235, record count: 56 restrict的时间/整体时间: 1017/1235 result: LO_REVENUE |D_YEAR |P_BRAND | - - - - - - - - - - - - - - - - - - 19803695538 |1992 |MFGR#2221 | 19639734537 |1992 |MFGR#2222 | 19945070508 |1992 |MFGR#2223 | 2022-10-20 01:40:39.888 - SQL12: select sum(lo_revenue) as lo_revenue, d_year, p_brand from lineorder,dates,part,supplier where lo_orderdate = d_datekey and lo_partkey = p_partkey and lo_suppkey = s_suppkey and p_brand = 'MFGR#2239' and s_region = 'EUROPE' group by d_year, p_brand order by d_year, p_brand; ; Status: PASS, Elapsed: 0.862, Affected: 7 Info: Job[397:RESTRICT, tasks:193, time:785, size: 35,599, [0|193|785] ] Job[399:DIMENSION_JOIN, tasks:193, time:4, size: 35,599, [0|193|4] ] Job[401:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ] Job[403:DIMENSION_JOIN, tasks:193, time:3, size: 35,599, [0|193|3] ] Job[406:OUTPUT_HASH_GROUP_BY, tasks:1, time:49, size: 7, [0|1|49] ] Job[408:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:8, size: 7, [0|7|8] ] Job[410:PROJECT, tasks:7, time:0, size: 7, [0|7|0] ] Job[412:RANGE_SORT, tasks:5, time:1, size: 7, [0|5|1] ] Job[416:PARTITION_TABLET, tasks:5, time:0, size: 7, [0|5|0] ] Job[418:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 2, [0|1|0] ] total time: 859, record count: 7 restrict的时间/整体时间: 785/859 result: LO_REVENUE |D_YEAR |P_BRAND | - - - - - - - - - - - - - - - - - - 19700225276 |1992 |MFGR#2239 | 19306484466 |1993 |MFGR#2239 | 19398411013 |1994 |MFGR#2239 | 2022-10-20 01:41:02.935 - SQL14: select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, lo_revenue desc; ; Status: PASS, Elapsed: 5.015, Affected: 150 Info: Job[502:RESTRICT, tasks:193, time:3,856, size: 6,570,093, [0|193|3,857] ] Job[504:DIMENSION_JOIN, tasks:193, time:5, size: 6,570,093, [0|193|4] ] Job[506:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ] Job[508:DIMENSION_JOIN, tasks:193, time:3, size: 6,570,093, [0|193|3] ] Job[511:OUTPUT_HASH_GROUP_BY, tasks:6, time:1,132, size: 900, [0|6|1,132] ] Job[513:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:3, size: 150, [0|7|3] ] Job[515:PROJECT, tasks:7, time:1, size: 150, [0|7|1] ] Job[517:RANGE_SORT, tasks:7, time:1, size: 150, [0|7|1] ] Job[521:PARTITION_TABLET, tasks:7, time:0, size: 150, [0|7|0] ] Job[523:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 22, [0|1|1] ] total time: 5,012, record count: 150 restrict的时间/整体时间: 3856/5012 result: C_NATION |S_NATION |D_YEAR |LO_REVENUE | - - - - - - - - - - - - - - - - - - - - - - - - - JAPAN |INDIA |1992 |163691240866 | CHINA |INDONESIA |1992 |163434081261 | CHINA |INDIA |1992 |163430796231 | 2022-10-20 01:41:11.469 - SQL16: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES' and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ; Status: PASS, Elapsed: 3.478, Affected: 600 Info: Job[589:RESTRICT, tasks:193, time:3,262, size: 264,531, [0|193|3,262] ] Job[591:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ] Job[593:DIMENSION_JOIN, tasks:193, time:4, size: 264,531, [0|193|4] ] Job[595:DIMENSION_JOIN, tasks:193, time:3, size: 264,531, [0|193|3] ] Job[598:OUTPUT_HASH_GROUP_BY, tasks:3, time:189, size: 1,800, [0|3|189] ] Job[600:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:5, size: 600, [0|7|5] ] Job[602:PROJECT, tasks:7, time:1, size: 600, [0|7|1] ] Job[604:RANGE_SORT, tasks:7, time:1, size: 600, [0|7|1] ] Job[608:PARTITION_TABLET, tasks:7, time:0, size: 600, [0|7|0] ] Job[610:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 100, [0|1|0] ] total time: 3,475, record count: 600 restrict的时间/整体时间: 3262/3475 result: C_CITY |S_CITY |D_YEAR |LO_REVENUE | - - - - - - - - - - - - - - - - - - - - - - - - UNITED ST3 |UNITED ST4 |1992 |1915435842 | UNITED ST8 |UNITED ST0 |1992 |1910327375 | UNITED ST5 |UNITED ST0 |1992 |1893024189 | 2022-10-20 01:41:16.505 - SQL18: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_year >= 1992 and d_year <= 1997 group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ; Status: PASS, Elapsed: 1.588, Affected: 24 Info: Job[679:RESTRICT, tasks:193, time:1,525, size: 10,616, [0|193|1,525] ] Job[681:DIMENSION_JOIN, tasks:193, time:6, size: 10,616, [0|193|6] ] Job[683:DIMENSION_JOIN, tasks:193, time:4, size: 10,616, [0|193|4] ] Job[685:DIMENSION_JOIN, tasks:193, time:3, size: 10,616, [0|193|3] ] Job[688:OUTPUT_HASH_GROUP_BY, tasks:1, time:35, size: 24, [0|1|35] ] Job[690:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 24, [0|7|2] ] Job[692:PROJECT, tasks:7, time:0, size: 24, [0|7|0] ] Job[694:RANGE_SORT, tasks:7, time:1, size: 24, [0|7|1] ] Job[698:PARTITION_TABLET, tasks:7, time:0, size: 24, [0|7|0] ] Job[700:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 3, [0|1|1] ] total time: 1,585, record count: 24 restrict的时间/整体时间: 1525/1585 result: C_CITY |S_CITY |D_YEAR |LO_REVENUE | - - - - - - - - - - - - - - - - - - - - - - - - UNITED KI1 |UNITED KI1 |1992 |1786080690 | UNITED KI5 |UNITED KI1 |1992 |1705128984 | UNITED KI1 |UNITED KI5 |1992 |1620054330 | 2022-10-20 01:41:19.166 - SQL20: select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue from lineorder,dates,customer,supplier where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth = 'Dec1997' group by c_city, s_city, d_year order by d_year asc, lo_revenue desc; ; Status: PASS, Elapsed: 1.046, Affected: 4 Info: Job[775:RESTRICT, tasks:193, time:1,012, size: 151, [0|193|1,012] ] Job[777:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ] Job[779:DIMENSION_JOIN, tasks:109, time:4, size: 151, [0|109|4] ] Job[781:DIMENSION_JOIN, tasks:109, time:3, size: 151, [0|109|3] ] Job[784:OUTPUT_HASH_GROUP_BY, tasks:1, time:13, size: 4, [0|1|13] ] Job[786:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 4, [0|7|1] ] Job[788:PROJECT, tasks:7, time:1, size: 4, [0|7|1] ] Job[790:RANGE_SORT, tasks:2, time:0, size: 4, [0|2|0] ] Job[794:PARTITION_TABLET, tasks:2, time:0, size: 4, [0|2|0] ] Job[796:MERGE_ORDERBY_RANGES, tasks:1, time:1, size: 2, [0|1|1] ] total time: 1,044, record count: 4 restrict的时间/整体时间: 1012/1044 result: C_CITY |S_CITY |D_YEAR |LO_REVENUE | - - - - - - - - - - - - - - - - - - - - - - - - UNITED KI5 |UNITED KI1 |1997 |168840628 | UNITED KI1 |UNITED KI5 |1997 |140264663 | UNITED KI1 |UNITED KI1 |1997 |135684305 | 2022-10-20 01:41:29.956 - SQL22: select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation; ; Status: PASS, Elapsed: 2.833, Affected: 35 Info: Job[878:RESTRICT, tasks:193, time:2,137, size: 2,882,137, [0|193|2,138] ] Job[880:DIMENSION_JOIN, tasks:193, time:8, size: 2,882,137, [0|193|8] ] Job[882:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|3] ] Job[884:DIMENSION_JOIN, tasks:193, time:2, size: 2,882,137, [0|193|2] ] Job[886:DIMENSION_JOIN, tasks:193, time:3, size: 2,882,137, [0|193|2] ] Job[889:OUTPUT_HASH_GROUP_BY, tasks:6, time:663, size: 210, [0|6|662] ] Job[891:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:1, size: 35, [0|7|1] ] Job[893:PROJECT, tasks:7, time:1, size: 35, [0|7|1] ] Job[895:PROJECT, tasks:7, time:0, size: 35, [0|7|0] ] Job[897:RANGE_SORT, tasks:7, time:1, size: 35, [0|7|1] ] Job[901:PARTITION_TABLET, tasks:7, time:0, size: 35, [0|7|0] ] Job[903:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 1, [0|1|0] ] total time: 2,830, record count: 35 restrict的时间/整体时间: 2137/2830 result: D_YEAR |C_NATION |PROFIT | - - - - - - - - - - - - - - - - - - - - 1992 |ARGENTINA |312585625436 | 1992 |BRAZIL |312719709853 | 1992 |CANADA |307040911677 | 2022-10-20 01:41:37.244 - SQL24: select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit from lineorder,dates,customer,supplier,part where lo_orderdate = d_datekey and lo_custkey = c_custkey and lo_suppkey = s_suppkey and lo_partkey = p_partkey and c_region = 'AMERICA'and s_region = 'AMERICA' and (d_year = 1997 or d_year = 1998) and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, s_nation, p_category order by d_year, s_nation, p_category; ; Status: PASS, Elapsed: 4.462, Affected: 100 Info: Job[1005:RESTRICT, tasks:193, time:4,155, size: 694,402, [0|193|4,155] ] Job[1007:DIMENSION_JOIN, tasks:193, time:4, size: 694,402, [0|193|4] ] Job[1009:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ] Job[1011:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|3] ] Job[1013:DIMENSION_JOIN, tasks:193, time:3, size: 694,402, [0|193|2] ] Job[1016:OUTPUT_HASH_GROUP_BY, tasks:6, time:272, size: 600, [0|6|271] ] Job[1018:MERGE_HASH_GROUPBY_PARTITION, tasks:7, time:2, size: 100, [0|7|2] ] Job[1020:PROJECT, tasks:7, time:0, size: 100, [0|7|0] ] Job[1022:PROJECT, tasks:7, time:1, size: 100, [0|7|0] ] Job[1024:RANGE_SORT, tasks:7, time:2, size: 100, [0|7|1] ] Job[1028:PARTITION_TABLET, tasks:7, time:0, size: 100, [0|7|0] ] Job[1030:MERGE_ORDERBY_RANGES, tasks:1, time:0, size: 10, [0|1|0] ] total time: 4,460, record count: 100 restrict的时间/整体时间: 4155/4460 result: D_YEAR |S_NATION |P_CATEGORY |PROFIT | - - - - - - - - - - - - - - - - - - - - - - - - - - 1997 |ARGENTINA |MFGR#11 |31668757023 | 1997 |ARGENTINA |MFGR#12 |31315629143 | 1997 |ARGENTINA |MFGR#13 |31899989093 |
我这边主要做了一下内容:
- 将数据库执行结果通过Linux脚本下载下来
cat ../conf/ssb_test.sql | ./cplus.sh > ssb30_record_result.txt
- 将数据库日志中的 job 信息通过Linux脚本下载下来
./cplus.sh <<EOF desc history; EOF
- 编写python脚本提取 RESTRICT 时间,并组合前两步骤的结果
a = """ {具体的 job 信息} """ b = """ {具体的 查询结果 信息} """ if __name__ == '__main__': result_a = [] # print(a.split("\n\n")) for item in a.split("\n\n"): result_item = '\nrestrict的时间/整体时间: ' flag = False for item_item in item.split("\n"): if "RESTRICT" in item_item: flag = True result_item += item_item.split("time:")[1].split(", ")[0] + "/" elif "total time" in item_item: result_item += item_item.split("total time: ")[1].split(", ")[0] + "\n" if not flag: continue result_item = result_item.replace(",", "") result_a.append(item + result_item) result_a.reverse() # for item in result_a: # print(item) b_1 = ["\n".join(item.split("\n")[1:6]) for item in b.split("\nSQL")] result_b = [] for item in b_1: if "|" in item: if "\nSelects" in item: result_b.append("result: \n" + "\n".join(item.split("\n")[0:-2]) + "\n\n") else: result_b.append("result: \n" + item + "\n") if len(result_a) != 26 or len(result_b) != 26: raise Exception("result'len should =26 and result_a'len should =26") result = ["\n".join(item) for item in list(zip(result_a, result_b))] result.insert(0, "d") result = result[0:-1:2][1:] print("\n".join(result))
二、解决线程 bug
我们知道,在 windows 中,通过任务管理器可以看CPU信息,比如下面是我在windows上的CPU信息截图:
现在问题来了,我的数据库系统到底是启用 8 线程好呢?还是 16线程好呢?
答案是 16 线程。以逻辑处理器为准!
数据库原先启用的是8线程,这就是问题所在。这块优化完毕之后,“Star Schema Benchmark 标准测试集优化”基本已终结。
三、测试结果
左边是咱们数据库,右边是 Starrocks 数据库