Big Data Production Issue Troubleshooting Series - Hive Pitfalls - hive.metastore.dml.events

Preface

Hello everyone, this is Ming Ge!

This post is part of the "Big Data Production Issue Troubleshooting" series. It walks through a problem that one of our company's products ran into at a securities firm a while back, along with its solution. Behind it is a Hive bug that was only fixed in Hive 3.0. Here is the write-up.

Problem Symptoms

On CDH 6.2.1, with Kerberos and Sentry enabled for Hive, using the DML statement insert overwrite to write data into a new partition of a partitioned table fails with an error (a minimal repro is sketched after the list below). At the same time we noticed that:

  • The corresponding directory and data files are present in the underlying HDFS;
  • show partitions xxx does not list the new partition;
  • A select statement cannot read the data in that new partition either;
  • The error occurs whenever data is inserted into a new partition, regardless of whether static or dynamic partitioning is used.
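
For context, here is a minimal sketch of the failing scenario. The table name test0317 and partition column ptdate follow what appears in the logs later in this post; the source table src_table and the columns id and name are hypothetical placeholders.

-- Hypothetical partitioned table mirroring the one in the logs below
CREATE TABLE IF NOT EXISTS test0317 (id STRING, name STRING)
PARTITIONED BY (ptdate STRING)
STORED AS TEXTFILE;

-- Static partition: inserting into a brand-new partition fails on the
-- MoveTask with "Internal error processing fire_listener_event"
INSERT OVERWRITE TABLE test0317 PARTITION (ptdate='1')
SELECT id, name FROM src_table;

-- Dynamic partition mode into a new partition fails the same way
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE test0317 PARTITION (ptdate)
SELECT id, name, ptdate FROM src_table;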

Problem Analysis

To narrow down the problem, I first tried a few different scenarios and observed the following:

  • insert overwrite into an existing partition of a partitioned table does not fail (whether or not that partition already contains data);
  • insert into a new partition of a partitioned table does not fail;
  • For non-partitioned tables, neither insert overwrite nor insert into fails;
  • After insert overwrite into a new partition of a partitioned table has failed, running msck repair table test0317 repairs the relevant metadata in the Hive Metastore; afterwards select, show partitions, and similar statements can see both the partition and its underlying data, with no further issues (the repair-and-verify sequence is sketched below).
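
For the last point, this is roughly the repair-and-verify sequence that was used; the partition value '1' matches the one in the logs below.

-- Right after the failed INSERT OVERWRITE, the data sits on HDFS but the
-- partition is missing from the metastore
SHOW PARTITIONS test0317;            -- the new partition is not listed

-- Rebuild the partition metadata from the directories found on HDFS
MSCK REPAIR TABLE test0317;

-- Both the partition and its data are now visible
SHOW PARTITIONS test0317;
SELECT * FROM test0317 WHERE ptdate = '1';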

To dig further, I went through the relevant logs: the client (beeline) log, the HiveServer2 log, and the Hive Metastore log, shown below.

Client log (beeline)

INFO  : Loading data to table apollo_ods_jzfix.test0317 partition (ptdate=1) from hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1/.hive-staging_hive_2021-03-17_15-09-13_232_1543365768355672834-7333/-ext-10000
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
INFO  : MapReduce Jobs Launched: 
INFO  : Stage-Stage-1: Map: 1   Cumulative CPU: 2.56 sec   HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 2 seconds 560 msec
INFO  : Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event (state=08S01,code=1)

Hive Metastore log

2021-03-17 15:09:30,039 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_table : db=apollo_ods_jzfix tbl=test0317 
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-9-thread-122]: 121: get_partition : db=apollo_ods_jzfix tbl=test0317[1]
2021-03-17 15:09:30,044 INFO  org.apache.hadoop.hive.metastore.HiveMetaStore.audit: [pool-9-thread-122]: ugi=hive/dev-dw-nn01@GYDW.COM ip=10.2.91.100 cmd=get_partition : db=apollo_ods_jzfix tbl=test0317[1] 
2021-03-17 15:09:30,053 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-9-thread-122]: NoSuchObjectException(message:partition values=[1])
 at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003)
 at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101)
 at com.sun.proxy.$Proxy26.getPartition(Unknown Source)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553)
 at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737)
 at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
 at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208)
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589)
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2021-03-17 15:09:30,054 ERROR org.apache.thrift.ProcessFunction: [pool-9-thread-122]: Internal error processing fire_listener_event
org.apache.hadoop.hive.metastore.api.NoSuchObjectException: partition values=[1]
 at org.apache.hadoop.hive.metastore.ObjectStore.getPartition(ObjectStore.java:2003) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at sun.reflect.GeneratedMethodAccessor97.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:101) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy26.getPartition(Unknown Source) ~[?:?]
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition(HiveMetaStore.java:3553) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.events.InsertEvent.<init>(InsertEvent.java:62) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.fire_listener_event(HiveMetaStore.java:6737) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy28.fire_listener_event(Unknown Source) ~[?:?]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14208) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$fire_listener_event.getResult(ThriftHiveMetastore.java:14193) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:594) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
 at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
 at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:589) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

HiveServer2 log

2021-03-17 15:09:30,016 INFO  org.apache.hadoop.hive.ql.exec.MoveTask: [HiveServer2-Background-Pool: Thread-183022]: Partition is: {ptdate=1}
2021-03-17 15:09:30,033 INFO  org.apache.hadoop.hive.common.FileUtils: [HiveServer2-Background-Pool: Thread-183022]: Creating directory if it doesn't exist: hdfs://dev-dw-nn01:8020/user/hive/warehouse/apollo_ods_jzfix.db/test0317/ptdate=1
2021-03-17 15:09:30,057 WARN  org.apache.hadoop.hive.metastore.RetryingMetaStoreClient: [HiveServer2-Background-Pool: Thread-183022]: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
 at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
 at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) [?:?]
 at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
 at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2021-03-17 15:09:31,063 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Closed a connection to metastore, current connections: 9231
2021-03-17 15:09:31,063 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Trying to connect to metastore with URI thrift://dev-dw-dn01:9083
2021-03-17 15:09:31,065 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Opened a connection to metastore, current connections: 9232
2021-03-17 15:09:31,065 INFO  hive.metastore: [HiveServer2-Background-Pool: Thread-183022]: Connected to metastore.
2021-03-17 15:09:31,196 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
2021-03-17 15:09:31,196 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: MapReduce Jobs Launched: 
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Stage-Stage-1: Map: 1   Cumulative CPU: 2.56 sec   HDFS Read: 5344 HDFS Write: 609 HDFS EC Read: 0 SUCCESS
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Total MapReduce CPU Time Spent: 2 seconds 560 msec
2021-03-17 15:09:31,197 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-183022]: Completed executing command(queryId=hive_20210317150913_9d734c54-f0cf-4dc7-9117-bc7f59c2cb61); Time taken: 17.758 seconds
2021-03-17 15:09:31,206 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-183022]: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
 at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:329) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:258) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
 at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_181]
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
 at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2433) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 ... 11 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing fire_listener_event
 at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4836) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4823) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:2531) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
 at sun.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[?:?]
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
 at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at com.sun.proxy.$Proxy34.fireListenerEvent(Unknown Source) ~[?:?]
 at org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:2431) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1629) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1525) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1489) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:501) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) ~[hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]

Temporary Workarounds

From the experiments above and the logs, it was fairly clear that this had to be a Hive bug triggered by some configuration setting, but at the time I could not find a related JIRA. To keep the project moving, I proposed the following workarounds:

  1. Business-side workaround 1: after the insert overwrite (which fails), add a script step that runs msck repair table test0317 to repair the relevant metadata in the Hive Metastore. The drawback is that this is awkward to embed into a scheduling workflow: once the insert overwrite fails, the scheduler stops and does not run the downstream steps, so the business code cannot be left untouched.
  2. Business-side workaround 2: before the normal business job, add a preceding step that creates the new partitions the job will write to, for example: alter table test0317 add partition (ptdate=10); (the general syntax is ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec;). A sketch of this pre-step is shown after this section.

In the end, to keep the project on schedule, we adopted workaround 2 as the temporary fix.
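
Here is a sketch of how workaround 2 was wired in: a scheduler pre-step creates the target partition first, so the subsequent insert overwrite writes into an existing partition and never hits the failing code path. The partition value '10' mirrors the example above; in practice it would be a date variable supplied by the scheduler, and src_table with its columns is a hypothetical placeholder.

-- Pre-step run by the scheduler before the business job:
-- create the partition up front if it does not exist yet
ALTER TABLE test0317 ADD IF NOT EXISTS PARTITION (ptdate='10');

-- Normal business job: the target partition already exists, so the
-- metastore can resolve it when the insert event fires and no error occurs
INSERT OVERWRITE TABLE test0317 PARTITION (ptdate='10')
SELECT id, name FROM src_table;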

Solution

With the temporary workaround in place and the project on track, I also posted on the Cloudera community forum to get to the bottom of the technical details. The community's response was not exactly prompt; the reply only arrived a few days ago. Screenshots:

[Screenshots of the Cloudera community reply]

Based on Cloudera's official reply, the JIRA linked in the screenshots above, and a look at the source code, this is confirmed to be a Hive bug that was only fully fixed in Hive 3.0. In short:

When hive.metastore.dml.events is set to true, inserting data into a partition that does not yet exist (for example insert overwrite into a new partition, or writing to new partitions in dynamic partition mode) runs into flawed handling in older Hive versions: after the data is loaded, an insert event is fired to the Metastore, which tries to look up the partition before it has been registered. The lookup fails with NoSuchObjectException (as seen in the Metastore stack trace above), fire_listener_event returns an internal error, and the client connection is dropped.

So, besides business-side workaround 2 above, either of the following fixes can be applied:

  1. Set hive.metastore.dml.events to false (a configuration sketch follows below); or
  2. Upgrade to Hive 3.0.
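
A sketch of option 1, assuming the parameter is not on your HiveServer2 restricted-parameter list and can therefore be overridden per session; on CDH it would more typically be set cluster-wide through the hive-site.xml safety valve in Cloudera Manager.

-- Session-level override (assumes the parameter is not restricted)
SET hive.metastore.dml.events=false;

-- With DML events disabled, inserting into a new partition no longer
-- fires the failing fire_listener_event call against the metastore
INSERT OVERWRITE TABLE test0317 PARTITION (ptdate='2')
SELECT id, name FROM src_table;

Keep in mind that turning the setting off means the Metastore no longer receives insert notifications at all, so anything that consumes those events (replication tooling, for instance) would be affected; weigh that trade-off before applying it cluster-wide.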