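Hi all, I set up HA for Blink with the following nodes: JM (node1, node2) and TM (node3, node4, node5). After startup, however, the process on node1 dies and the process on node2 fails to start. The errors are as follows: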
JobManager log on node1:
ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
    at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:196)
    at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:646)
    ................
JobManager log on node2:
ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job a5ffe00b0bc5688d9a7de5c62b8150e6
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$onAddedJobGraph$31(Dispatcher.java:878)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    ................
TaskManager log:
ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.Exception: Reconnect to RM failed
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$closeResourceManagerConnection$3(TaskExecutor.java:1179)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
    ................
flink-conf.yaml configuration:
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 4096
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 2
parallelism.default: 6
taskmanager.managed.memory.size: 256
yarn.application-attempts: 10
env.java.home: /opt/jdk1.8.0_171/
fs.hdfs.hadoopconf: /app/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/etc/hadoop/
taskmanager.network.numberOfBuffers: 1024
high-availability: zookeeper
high-availability.storageDir: hdfs://ip:8020/blink/ha/zookeeper/storageDir/
high-availability.zookeeper.quorum: ip:2181
high-availability.filesystem.path.jobgraphs: /app/blinkTmp/TaskTmp/jobgraphs/
state.backend: filesystem
state.checkpoints.dir: hdfs://ip:8020/blink/flink-checkpoints
state.backend.incremental: true
rest.port: 8081
masters configuration:
node1:8081
node2:8081
slaves configuration:
node3
node4
node5
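Since the suspicion is a configuration problem, one rough way to narrow it down is to check the HA section mechanically. The Python script below is a minimal, hypothetical sketch: the helper names are made up, it assumes flink-conf.yaml uses flat `key: value` lines, and it assumes (only from the key's name) that `high-availability.filesystem.path.jobgraphs` from the Blink config above is a filesystem path that both JobManagers must be able to read. It flags missing ZooKeeper HA keys and HA paths without a URI scheme, which may resolve to a directory local to a single node.

```python
"""Hypothetical sketch: sanity-check ZooKeeper HA settings in flink-conf.yaml.

Assumptions: flat "key: value" lines; every key in SHARED_PATH_KEYS must point
at storage all JobManagers can read (e.g. an hdfs:// URI, not a local path).
"""
from typing import Dict, List

# Keys that Flink's ZooKeeper HA mode needs.
REQUIRED_HA_KEYS = (
    "high-availability",
    "high-availability.zookeeper.quorum",
    "high-availability.storageDir",
)

# Assumed shared paths: storageDir is Flink's HA storage directory; the
# jobgraphs key is copied from the Blink config and assumed to be a path too.
SHARED_PATH_KEYS = (
    "high-availability.storageDir",
    "high-availability.filesystem.path.jobgraphs",
)


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, skipping blank lines and '#' comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split once so 'hdfs://ip:8020' stays intact
        conf[key.strip()] = value.strip()
    return conf


def check_ha(conf: Dict[str, str]) -> List[str]:
    """Return warnings; an empty list means none of these checks fired."""
    problems: List[str] = []
    for key in REQUIRED_HA_KEYS:
        if not conf.get(key):
            problems.append(f"missing {key}")
    mode = conf.get("high-availability")
    if mode and mode.lower() != "zookeeper":
        problems.append(f"high-availability is '{mode}', not 'zookeeper'")
    for key in SHARED_PATH_KEYS:
        value = conf.get(key)
        if value and "://" not in value:
            problems.append(f"{key} = {value} has no scheme; it may be local to one node")
    return problems


if __name__ == "__main__":
    sample = """
high-availability: zookeeper
high-availability.storageDir: hdfs://ip:8020/blink/ha/zookeeper/storageDir/
high-availability.zookeeper.quorum: ip:2181
high-availability.filesystem.path.jobgraphs: /app/blinkTmp/TaskTmp/jobgraphs/
"""
    for problem in check_ha(parse_flink_conf(sample)):
        print(problem)
```

A warning here is only a hint to verify the path by hand on both JobManager nodes, not proof that it caused the failure.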
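I've only just started with Blink, and I suspect my configuration is wrong. Has anyone here installed and deployed Blink? Could you send me your configuration, and how should I fix the problems in my environment?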
Thanks. *From the volunteer-compiled Flink mailing list archive
Note this line:
ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
Is your ZK working properly, and is Blink connecting to it correctly?
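A quick first check of that question is to send ZooKeeper's `ruok` four-letter command to each quorum member; a healthy server replies `imok`. The Python function below is a minimal sketch with a hypothetical name. Two caveats: ZooKeeper 3.5+ only answers four-letter commands listed in `4lw.commands.whitelist`, so `False` can also mean the command is disabled, and this check says nothing about the SASL authentication that the "Authentication failed" line refers to.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if the server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")   # ZooKeeper four-letter liveness command
            reply = sock.recv(16)   # a healthy server sends 'imok' and closes
    except OSError:                 # refused, unreachable, or timed out
        return False
    return reply == b"imok"
```

Run it against each host listed in `high-availability.zookeeper.quorum`, e.g. `zk_ruok("ip", 2181)` with the real address in place of the redacted `ip`, from both node1 and node2.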