The hive.exec.parallel parameter

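Introduction:

The hive.exec.parallel parameter controls whether different jobs within the same SQL statement may run at the same time. It defaults to false.
The test below shows the parameter's effect: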

Test SQL:
select r1.a
from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1 join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2 on (r1.a=r2.b);

1,
set hive.exec.parallel=false;
With the parameter set to false, the three jobs run one after another.

hive> set hive.exec.parallel=false;
hive> select r1.a
    > from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1 join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2 on (r1.a=r2.b);
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 397778060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2001905, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001905
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001905
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 1
2012-09-07 17:55:40,854 Stage-1 map = 0%,  reduce = 0%
2012-09-07 17:55:55,663 Stage-1 map = 14%,  reduce = 0%
2012-09-07 17:56:00,506 Stage-1 map = 56%,  reduce = 0%
2012-09-07 17:56:10,254 Stage-1 map = 100%,  reduce = 0%
2012-09-07 17:56:19,871 Stage-1 map = 100%,  reduce = 29%
2012-09-07 17:56:30,000 Stage-1 map = 100%,  reduce = 75%
2012-09-07 17:56:34,799 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2001905
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 3578060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2002054, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2002054
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2002054
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 1
2012-09-07 17:56:43,343 Stage-4 map = 0%,  reduce = 0%
2012-09-07 17:56:48,124 Stage-4 map = 50%,  reduce = 0%
2012-09-07 17:56:55,816 Stage-4 map = 100%,  reduce = 0%
Ended Job = job_201208241319_2002054
Launching Job 3 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 596) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2002120, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2002120
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2002120
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2012-09-07 17:57:12,641 Stage-2 map = 0%,  reduce = 0%
2012-09-07 17:57:19,571 Stage-2 map = 50%,  reduce = 0%
2012-09-07 17:57:25,199 Stage-2 map = 100%,  reduce = 0%
2012-09-07 17:57:29,210 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2002120
OK
abcdefghijk_0
abcdefghijk_1
abcdefghijk_2
abcdefghijk_3
abcdefghijk_4
abcdefghijk_5
abcdefghijk_6
abcdefghijk_7
abcdefghijk_8
abcdefghijk_9
Time taken: 135.944 seconds

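2,
However, the two subqueries do not depend on each other, so their jobs could run in parallel. With hive.exec.parallel=true, Job 1 and Job 2 are launched together: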

hive> set hive.exec.parallel=true;
hive> select r1.a
    > from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1 join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2 on (r1.a=r2.b);
Total MapReduce jobs = 3
Launching Job 1 out of 3
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 397778060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 3578060) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2001452, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001452
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001452
Starting Job = job_201208241319_2001453, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001453
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001453
Hadoop job information for Stage-4: number of mappers: 2; number of reducers: 1
Hadoop job information for Stage-1: number of mappers: 7; number of reducers: 1
2012-09-07 17:52:10,558 Stage-4 map = 0%,  reduce = 0%
2012-09-07 17:52:10,588 Stage-1 map = 0%,  reduce = 0%
2012-09-07 17:52:22,827 Stage-1 map = 14%,  reduce = 0%
2012-09-07 17:52:22,880 Stage-4 map = 100%,  reduce = 0%
2012-09-07 17:52:27,678 Stage-1 map = 22%,  reduce = 0%
2012-09-07 17:52:28,701 Stage-1 map = 36%,  reduce = 0%
2012-09-07 17:52:31,137 Stage-1 map = 93%,  reduce = 0%
2012-09-07 17:52:33,551 Stage-1 map = 100%,  reduce = 0%
2012-09-07 17:52:36,427 Stage-4 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2001453
2012-09-07 17:52:42,883 Stage-1 map = 100%,  reduce = 33%
2012-09-07 17:52:45,431 Stage-1 map = 100%,  reduce = 70%
2012-09-07 17:52:47,526 Stage-1 map = 100%,  reduce = 76%
2012-09-07 17:52:51,829 Stage-1 map = 100%,  reduce = 84%
Ended Job = job_201208241319_2001452
Launching Job 3 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 596) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Starting Job = job_201208241319_2001621, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2001621
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job  -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2001621
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2012-09-07 17:53:07,081 Stage-2 map = 0%,  reduce = 0%
2012-09-07 17:53:10,351 Stage-2 map = 50%,  reduce = 0%
2012-09-07 17:53:11,380 Stage-2 map = 100%,  reduce = 0%
2012-09-07 17:53:18,132 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_201208241319_2001621
OK
abcdefghijk_0
abcdefghijk_1
abcdefghijk_2
abcdefghijk_3
abcdefghijk_4
abcdefghijk_5
abcdefghijk_6
abcdefghijk_7
abcdefghijk_8
abcdefghijk_9
Time taken: 108.301 seconds
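
In the logs above, Stage-1 and Stage-4 are the two stages that start together when the parameter is true, and Stage-2 starts only after both have ended. One way to check this kind of relationship before running a query is to ask Hive for the plan with EXPLAIN, whose output includes a STAGE DEPENDENCIES section. The sketch below simply prefixes the test SQL with EXPLAIN; the stage numbers Hive prints may differ between versions, so treat the numbering as an assumption rather than something guaranteed to match the logs here.

```sql
-- Print the query plan instead of running the query.
-- The STAGE DEPENDENCIES section lists, for each stage, which stages it waits on;
-- stages that do not wait on each other are candidates for parallel execution.
explain
select r1.a
from (select t.a from sunwg_10 t join sunwg_10000000 s on t.a=s.b) r1
join (select s.b from sunwg_100000 t join sunwg_10 s on t.a=s.b) r2
on (r1.a=r2.b);
```

If no two stages are independent of each other, turning on hive.exec.parallel is unlikely to change the runtime of that statement.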

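Summary:
When resources are sufficient, hive.exec.parallel makes SQL statements that contain independent jobs run faster (108.301 seconds versus 135.944 seconds in this test), but they also consume more resources at the same time.
It is worth evaluating whether hive.exec.parallel would help our refresh jobs.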
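
For such an evaluation, a session-level setup might look like the sketch below. Besides hive.exec.parallel, Hive has a companion setting, hive.exec.parallel.thread.number, that caps how many jobs may run at once. The original test does not mention it, and the value shown is only Hive's default, not a recommendation.

```sql
-- Allow independent jobs of one statement to run at the same time (session scope).
set hive.exec.parallel=true;
-- Cap on the number of jobs executed in parallel; 8 is Hive's default.
-- Lower it if the cluster is short on map/reduce slots.
set hive.exec.parallel.thread.number=8;
```

Because both are set per session, the refresh job can be run once with and once without them and the "Time taken" lines compared, as in the test above.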

Reposted from: http://www.oratea.net/?p=1377



This article was reposted from the 茄子_2008 blog on cnblogs (博客园); original link: http://www.cnblogs.com/xd502djj/archive/2013/05/08/3067699.html. To repost it, please contact the original author.

