Fixing "Out of memory due to hash maps used in map-side aggregation"


Running a GROUP BY query threw the following error:

Task with the most failures(4): 

-----
Task ID:
  task_201411191723_723592_m_000004


URL:
  http://DDS0204.dratio:50030/taskdetails.jsp?jobid=job_201411191723_723592&tipid=task_201411191723_723592_m_000004


Possible error:
  Out of memory due to hash maps used in map-side aggregation.


Solution:
  Currently hive.map.aggr.hash.percentmemory is set to 0.25. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.125;'
-----
Diagnostic Messages for this Task:


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 12  Reduce: 1   Cumulative CPU: 164.04 sec   HDFS Read: 0 HDFS Write: 0 FAIL

Total MapReduce CPU Time Spent: 2 minutes 44 seconds 40 msec


The cause: the aggregation was performed on the map side, and the in-memory hash map grew beyond the memory available to it.
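To see why map-side aggregation can run out of memory, here is a minimal Python sketch. It is a simplified model, not Hive's actual implementation, and the function name and sample data are hypothetical. Each mapper holds one hash-map entry per distinct group key, so the table grows with key cardinality rather than with the number of rows.

```python
def map_side_aggregate(rows):
    """Pre-aggregate (key, value) rows inside a single mapper.

    Simplified model: sums values per key in an in-memory dict and
    keeps one entry per distinct key until the mapper finishes.
    """
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return table


# Few distinct keys: the table stays tiny however many rows arrive.
low_cardinality = [(i % 3, 1) for i in range(10_000)]
# Nearly unique keys: the table ends up as large as the input itself.
high_cardinality = [(i, 1) for i in range(10_000)]

print(len(map_side_aggregate(low_cardinality)))
print(len(map_side_aggregate(high_cardinality)))
```

With nearly unique keys the mapper gains nothing from pre-aggregating and only pays the memory cost, which is the situation the error message describes.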

The ultimate fix: set hive.map.aggr=false, or rewrite the query using subqueries, or try tuning the parameters below.


Notes:

Tuning parameters related to map join and map-side aggregation:

①. hive.map.aggr: whether map-side aggregation is enabled. set hive.map.aggr=false turns map-side aggregation off.

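②. hive.map.aggr.hash.min.reduction: if the ratio of hash-map entries to input rows exceeds this value, meaning the aggregation is shrinking the data too little to justify its memory, map-side hash aggregation is turned off. For example: set hive.map.aggr.hash.min.reduction=0.5;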

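③. hive.map.aggr.hash.percentmemory: when the in-memory hash map grows to 25% of the memory configured for the map task's JVM (with set hive.map.aggr.hash.percentmemory = 0.25; the default is 50%), its contents are flushed to the reducers to free the hash map's memory.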

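④. hive.groupby.skewindata: balances the load when the data is skewed. With hive.groupby.skewindata=true, the generated query plan contains two MR jobs. In the first job, each map's output is distributed randomly across the reducers, which perform a partial aggregation. The second job then distributes those partial results to the reducers by GROUP BY key to complete the final aggregation.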
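The behaviour behind parameters ②, ③ and ④ can be sketched in Python. This is a toy model under stated assumptions, not Hive's code: max_entries is a hypothetical entry-count budget standing in for the memory fraction, check_interval is a hypothetical row count at which the reduction ratio is tested, and the sketch empties the whole table on each flush, which may differ from Hive's real flush policy. All function names are illustrative.

```python
import random
from collections import Counter


def mapper_count(keys, max_entries, min_reduction=0.5, check_interval=1000):
    """Count rows per key in one mapper; return the records sent downstream.

    max_entries: stand-in for the hash map's memory budget (entries, not bytes).
    min_reduction: after check_interval rows, stop hash aggregation if
    (new hash entries / rows seen) exceeds this ratio.
    """
    table, emitted, hashing, new_entries = {}, [], True, 0
    for rows_seen, key in enumerate(keys, start=1):
        if not hashing:
            emitted.append((key, 1))      # pass rows through unaggregated
            continue
        if key not in table:
            new_entries += 1
        table[key] = table.get(key, 0) + 1
        if rows_seen == check_interval and new_entries / rows_seen > min_reduction:
            hashing = False               # too little reduction: stop hashing
        if not hashing or len(table) >= max_entries:
            emitted.extend(table.items())  # flush partial counts to reducers
            table.clear()
    emitted.extend(table.items())
    return emitted


def reduce_merge(partials):
    """Reducer side: merge partial counts that share a key."""
    totals = Counter()
    for key, count in partials:
        totals[key] += count
    return dict(totals)


def two_stage_count(keys, num_reducers, seed=0):
    """Model of the two-job skew plan: random spread, then merge by key."""
    rng = random.Random(seed)
    stage1 = [Counter() for _ in range(num_reducers)]
    for key in keys:
        stage1[rng.randrange(num_reducers)][key] += 1   # job 1: random reducer
    loads = [sum(counter.values()) for counter in stage1]
    partials = [item for counter in stage1 for item in counter.items()]
    return reduce_merge(partials), loads                # job 2: group by key
```

A smaller budget makes the mapper emit more partial records, but the merged totals stay the same, which is why lowering the memory fraction trades shuffle volume for heap safety. In the two-stage model, a single hot key is spread over all first-stage reducers instead of landing on one.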

Reference:

http://dev.bizo.com/2013/02/map-side-aggregations-in-apache-hive.html



