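OOM when running Hive queries
Preface
- Hive execution engine: Hive on MR
Error message: Error: Java heap space
- Cause:
Memory allocation: the task JVM heap is too small for the data the task has to process
- Approach:
Give map and reduce tasks a reasonable amount of memory, and have each map and reduce task process a reasonable amount of data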
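- Current cluster
map task
Allocated memory:
With the default value of mapred.child.java.opts, each task gets a 200 MB heap (-Xmx200m)
Each node in the cluster has 8 cores / 32 GB, so here it is set to:
mapred.child.java.opts = 3G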
<property> <name>mapred.child.java.opts</name> <value>-Xmx3072m</value> </property>
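On Hadoop 2 and later (YARN), the per-task properties mapreduce.map.java.opts and mapreduce.reduce.java.opts take precedence over mapred.child.java.opts when they are set, and the JVM heap has to fit inside the YARN container. A minimal sketch of equivalent session-level settings, assuming a YARN cluster; the 4096 MB container size is an illustrative assumption, not a value taken from the cluster above:

```sql
-- Assumption: Hive on MR running on YARN (Hadoop 2+); all values are illustrative.
-- Container size requested from YARN for each map / reduce task, in MB.
set mapreduce.map.memory.mb=4096;
set mapreduce.reduce.memory.mb=4096;
-- JVM heap inside the container; keep it below the container size
-- so there is room left for non-heap memory.
set mapreduce.map.java.opts=-Xmx3072m;
set mapreduce.reduce.java.opts=-Xmx3072m;
```

With 32 GB per node, containers of this size allow only a handful of concurrent tasks per node, so the values are worth checking against the node's YARN memory limit.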
- Adjust the number of map tasks:
mapred.max.split.size=256000000
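As a rough worked example: with a maximum split size of 256,000,000 bytes, a 10,000,000,000-byte input yields about 10,000,000,000 / 256,000,000 ≈ 39.06, so around 40 map tasks, ignoring file and block boundaries. A minimal sketch of the related session settings, assuming the input format is CombineHiveInputFormat; the per-node and per-rack minimums are illustrative values, not from the original setup:

```sql
-- Assumption: CombineHiveInputFormat is in use, so small files are
-- combined into splits of up to mapred.max.split.size bytes.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound on a split, in bytes; lower it to get more map tasks.
set mapred.max.split.size=256000000;
-- Illustrative values: minimum bytes to combine per node / per rack.
set mapred.min.split.size.per.node=128000000;
set mapred.min.split.size.per.rack=128000000;
```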
- Adjust the number of reduce tasks:
hive.exec.reducers.bytes.per.reducer
hive.exec.reducers.max
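Hive estimates the reducer count as roughly the input size divided by hive.exec.reducers.bytes.per.reducer, capped at hive.exec.reducers.max. For a 10,000,000,000-byte input and 256,000,000 bytes per reducer, that is about 40 reducers. A minimal sketch with illustrative values (neither number comes from the cluster above):

```sql
-- Illustrative values, not taken from the cluster above.
-- Bytes of input handled by each reducer; lower it to get more reducers.
set hive.exec.reducers.bytes.per.reducer=256000000;
-- Hard cap on the number of reducers Hive will estimate.
set hive.exec.reducers.max=200;
-- Alternatively, force a fixed reducer count, which bypasses the estimate:
-- set mapreduce.job.reduces=40;
```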
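The following is translated from StackOverflow
Scenario
When running a Hive query from the Hive shell with the Tez execution engine, I get java.lang.OutOfMemoryError: Java heap space in the logs, but the query eventually completes.
Log output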
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 3, vertexId=vertex_1622153507491_0145_1_02, diagnostics=[Task failed, taskId=task_1622153507491_0145_1_02_000006, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: Map operator initialization failed
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:361)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async Initialization failed. abortRequested=false
    at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:465)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:342)
    ... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
    at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMapStore.addMore(VectorMapJoinFastBytesHashMapStore.java:539)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.add(VectorMapJoinFastBytesHashMap.java:101)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringCommon.adaptPutRow(VectorMapJoinFastStringCommon.java:59)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.putRow(VectorMapJoinFastStringHashMap.java:37)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.putRow(VectorMapJoinFastTableContainer.java:183)
    at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:130)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator$$Lambda$27/55723736.call(Unknown Source)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
    at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
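The StackOverflow answer
The OOM exception occurs in the MapJoin operator while it loads the hash table. Perhaps an alternative path without the map join succeeded, which would explain why the query eventually completed.
You can try the following: increase mapper parallelism first, and if having more mappers does not resolve the error, increase mapper memory. Check your current settings and change them accordingly.
Increase mapper parallelism
(If the real cause is that the table Map Join loads into memory is too large, this may not help.)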
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set tez.grouping.max-size=32000000; -- decreasing max-size increases parallelism
set tez.grouping.min-size=32000; -- if you have files smaller than min-size, a mapper will process other files as well
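Increase the mapper container size
(Check your current settings and increase them accordingly.) This is only an example:
set hive.tez.container.size=2048; -- container size in MB
set hive.tez.java.opts=-Xmx1700m; -- set to roughly 80% of hive.tez.container.size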
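To check the current settings before changing them, one option is to issue `set` with a property name and no value, which prints the value in effect for the session. A minimal sketch using the properties mentioned above:

```sql
-- "set <property>;" without a value prints the current setting
-- (or reports the property as undefined if it was never set).
set hive.tez.container.size;
set hive.tez.java.opts;
set tez.grouping.max-size;
set tez.grouping.min-size;
```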
Try disabling map-side aggregation
Map-side aggregation can cause OOM:
set hive.map.aggr=false;
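If disabling map-side aggregation entirely costs too much performance, a possible middle ground is to keep it enabled and shrink the share of memory its hash table may use. This is a sketch under that assumption, not part of the original answer, and the value 0.25 is purely illustrative:

```sql
-- Assumption: keep map-side aggregation but limit its in-memory hash table.
set hive.map.aggr=true;
-- Fraction of task memory the aggregation hash table may use (default 0.5).
set hive.map.aggr.hash.percentmemory=0.25;
```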
Check the mapjoin settings
The small-table size threshold may be set too high; compare it with the container size you set earlier:
[Hive Map-Join configuration mystery]
https://stackoverflow.com/questions/54726128/hive-map-join-configuration-mystery
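A minimal sketch of how the map-join thresholds could be inspected and tightened, assuming automatic map-join conversion is enabled; the 10000000-byte value is illustrative and should be chosen relative to hive.tez.container.size:

```sql
-- Print the current map-join related settings first.
set hive.auto.convert.join;
set hive.auto.convert.join.noconditionaltask.size;
set hive.mapjoin.smalltable.filesize;
-- Illustrative value: lower the combined small-table threshold (in bytes)
-- so that larger tables are no longer converted to map joins.
set hive.auto.convert.join.noconditionaltask.size=10000000;
-- Or, to confirm the diagnosis, disable automatic map-join conversion:
-- set hive.auto.convert.join=false;
```

If the query runs without the heap error once conversion is disabled, that points to the hash table of the "small" table being too large for the container.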