Error after saving checkpoints to HDFS for a while

2021-12-07 14:00:09 · 302 views · 1 answer

After saving checkpoints to HDFS for a while, the job fails with the error below; it restarts automatically, runs normally for a while, and then hits the same error again.

Caused by: java.io.IOException: Could not flush and close the file system output stream to hdfs://master:9000/flink/checkpoints/PaycoreContextHopJob/cbb3a580d0323fbace80e71a25c966d0/chk-11352/fc4b8b08-2c32-467c-a1f4-f384eba246ff in order to obtain the stream state handle
    at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:326)
    at org.apache.flink.runtime.state.CheckpointStreamWithResultProvider$PrimaryStreamOnly.closeAndFinalizeCheckpointStreamResult(CheckpointStreamWithResultProvider.java:77)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.callInternal(HeapKeyedStateBackend.java:765)
    at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend$HeapSnapshotStrategy$1.callInternal(HeapKeyedStateBackend.java:724)
    at org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:76)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
    ... 7 more
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flink/checkpoints/PaycoreContextHopJob/cbb3a580d0323fbace80e71a25c966d0/chk-11352/fc4b8b08-2c32-467c-a1f4-f384eba246ff could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1726)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2567)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)

    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
    at org.apache.hadoop.ipc.Client.call(Client.java:1435)
    at org.apache.hadoop.ipc.Client.call(Client.java:1345)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:444)
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1838)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1638)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:704)

*From the volunteer-curated Flink mailing list archive
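
For context, this is roughly what a Flink 1.x setup that produces the trace above looks like: HeapKeyedStateBackend snapshotting through FsCheckpointStreamFactory matches the FsStateBackend. The checkpoint path and NameNode address are taken from the trace; the checkpoint interval and class name are assumptions, not from the original post.

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Snapshot every 60s (assumed interval; the post does not state one).
            env.enableCheckpointing(60_000);

            // Heap state is written to HDFS on each snapshot, matching the
            // HeapKeyedStateBackend / FsCheckpointStreamFactory frames above.
            env.setStateBackend(new FsStateBackend("hdfs://master:9000/flink/checkpoints"));

            // ... define sources, operators, sinks, then env.execute(...) ...
        }
    }

Nothing in such a setup is wrong per se; the exception is raised by the HDFS client (DataStreamer.addBlock in the trace) when the NameNode cannot allocate a datanode for a new block, which is what the answer below points at.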

All answers (1)
  • 雪哥哥
    2021-12-07 15:26:21

    The root cause is an HDFS problem:

    Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /flink/checkpoints/PaycoreContextHopJob/cbb3a580d0323fbace80e71a25c966d0/chk-11352/fc4b8b08-2c32-467c-a1f4-f384eba246ff could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.

    When the problem occurs, check the state of the HDFS cluster and the relevant logs (a diagnostic sketch follows after the reference). This Stack Overflow answer [1] may also help.

    [1] https://stackoverflow.com/questions/36015864/hadoop-be-replicated-to-0-nodes-instead-of-minreplication-1-there-are-1/36310025

    *From the volunteer-curated Flink mailing list archive
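
    As a starting point for "check the state of the HDFS cluster", here is a minimal diagnostic sketch using the plain Hadoop client API (not from the original thread; the NameNode URI comes from the stack trace, everything else is an assumption). If every datanode reports (nearly) zero remaining space, or the transfer-thread (xceiver) counts are saturated, the NameNode will refuse to place new blocks even though all three datanodes are "running", which is exactly the "replicated to 0 nodes instead of minReplication (=1)" symptom.

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.hdfs.DistributedFileSystem;
        import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

        public class HdfsHealthCheck {
            public static void main(String[] args) throws Exception {
                // Connect to the same NameNode the checkpoints are written to.
                Configuration conf = new Configuration();
                try (FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), conf)) {
                    DistributedFileSystem dfs = (DistributedFileSystem) fs;
                    // One line per live datanode: near-zero remaining space on all of
                    // them means the NameNode cannot place a new block.
                    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                        System.out.printf("%s state=%s remaining=%d bytes xceivers=%d%n",
                                dn.getHostName(), dn.getAdminState(), dn.getRemaining(),
                                dn.getXceiverCount());
                    }
                }
            }
        }

    Run it while the job is failing and compare the output with the datanode logs from the same window; this error commonly comes down to disk space (including dfs.datanode.du.reserved headroom) or connectivity between the writer and the datanodes.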
