带你读《2022技术人的百宝黑皮书》——ODPS SQL优化总结(8) https://developer.aliyun.com/article/1247141?groupCode=taobaotech
解决步骤
参数调优:增加map、reduce个数,执行效率没有明显的提升。
分桶:测试使用hash clustering解决group/join缓慢的问题。
1、创建测试表 create table tmp_zhangtao_test_hash_range like dws_tm_brd_pwr_deep_brd_usr_cat_1d LIFECY CLE 2; 2、查看测试表结构 desc mkt.tmp_zhangtao_test_hash_range; 3、修改测试表支持桶;测试时发现user_id倾斜情况不太严重 ALTER TABLE tmp_zhangtao_test_hash_range CLUSTERED BY (user_id) SORTED by ( user_id) INTO 1024 BUCKETS; 4、插入数据,这里发现多了一个1024个任务的reduce。 insert OVERWRITE table mkt.tmp_zhangtao_test_hash_range partition(ds,cate_flag) SELECT brand_id, cate_id, user_id, shop_type, deep_score, brd_ord_amt, discovery_score, engagement_score, enthusiasm_score, itm_sty_tme, itm_vst_cnt, liv_sty_tme, liv_vst_cnt, vdo_sty_tme, vdo_vst_cnt, img_txt_sty_tme, img_txt_vst_cnt, col_cnt_ufm, crt_cnt_ufm, sch_cnt_ufm, mkt_iat_cnt, fan_flw_cnt, fst_itm_sty_tme, fst_itm_vst_cnt, col_cnt_fm, crt_cnt_fm, sch_cnt_fm, shr_cnt, cmt_cnt, pvt_iat_cnt, ds, cate_flag FROM dws_tm_brd_pwr_deep_brd_usr_cat_1d WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bizdate}', 'yyyymmdd'), -89, 'dd'), 'yyyymmdd'); 5、查询数据性能比对 SELECT cate_id ,shop_type ,user_id ,SUM(deep_score) AS deep_score ,SUM(brd_ord_amt) AS brd_ord_amt ,SUM(discovery_score) AS discovery_score ,SUM(engagement_score) AS engagement_score ,SUM(enthusiasm_score) AS enthusiasm_score ,SUM(itm_sty_tme) AS itm_sty_tme ,SUM(itm_vst_cnt) AS itm_vst_cnt ,SUM(liv_sty_tme) AS liv_sty_tme ,SUM(liv_vst_cnt) AS liv_vst_cnt ,SUM(vdo_sty_tme) AS vdo_sty_tme ,SUM(vdo_vst_cnt) AS vdo_vst_cnt ,SUM(img_txt_sty_tme) AS img_txt_sty_tme ,SUM(img_txt_vst_cnt) AS img_txt_vst_cnt ,SUM(col_cnt_ufm) AS col_cnt_ufm ,SUM(crt_cnt_ufm) AS crt_cnt_ufm ,SUM(sch_cnt_ufm) AS sch_cnt_ufm ,SUM(mkt_iat_cnt) AS mkt_iat_cnt ,SUM(fan_flw_cnt) AS fan_flw_cnt ,SUM(fst_itm_sty_tme) AS fst_itm_sty_tme ,SUM(fst_itm_vst_cnt) AS fst_itm_vst_cnt ,SUM(col_cnt_fm) AS col_cnt_fm ,SUM(crt_cnt_fm) AS crt_cnt_fm ,SUM(sch_cnt_fm) AS sch_cnt_fm ,SUM(shr_cnt) AS shr_cnt ,SUM(cmt_cnt) AS cmt_cnt ,SUM(pvt_iat_cnt) AS pvt_iat_cnt FROM dws_tm_brd_pwr_deep_usr_cat_1d/tmp_zhangtao_test_hash_range WHERE ds = TO_CHAR(DATEADD(TO_DATE('${bizdate}', 'yyyymmdd'), -89, 'dd'), 'yyyymmdd') AND cate_flag = '1' GROUP BY cate_id ,shop_type ,user_id
查询结果:
使用hash clustering ,map数和桶个数相同。
Summary: resource cost: cpu 0.34 Core * Min, memory 0.61 GB * Min
不使用hash clustering:
resource cost: cpu 175.85 Core * Min, memory 324.24 GB * Min
带你读《2022技术人的百宝黑皮书》——ODPS SQL优化总结(10) https://developer.aliyun.com/article/1247139?groupCode=taobaotech