
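Blink HA: processes crash right after startup

Hi all, I set up Blink HA with these nodes: JM (node1, node2), TM (node3, node4, node5). After startup, the process on node1 crashes and the process on node2 fails to start. The errors are as follows: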

JobManager log on node1:

ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed

ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
    at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:196)
    at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:646)
    ................

JobManager log on node2:

ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Could not start the added job a5ffe00b0bc5688d9a7de5c62b8150e6
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$onAddedJobGraph$31(Dispatcher.java:878)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    ................

TaskManager log:

ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Fatal error occurred while executing the TaskManager. Shutting it down...
java.lang.Exception: Reconnect to RM failed
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$closeResourceManagerConnection$3(TaskExecutor.java:1179)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
    ................

flink-conf.yaml configuration:

jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 4096
taskmanager.heap.mb: 4096
taskmanager.numberOfTaskSlots: 2
parallelism.default: 6
taskmanager.managed.memory.size: 256
yarn.application-attempts: 10
env.java.home: /opt/jdk1.8.0_171/
fs.hdfs.hadoopconf: /app/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/etc/hadoop/
taskmanager.network.numberOfBuffers: 1024
high-availability: zookeeper
high-availability.storageDir: hdfs://ip:8020/blink/ha/zookeeper/storageDir/
high-availability.zookeeper.quorum: ip:2181
high-availability.filesystem.path.jobgraphs: /app/blinkTmp/TaskTmp/jobgraphs/
state.backend: filesystem
state.checkpoints.dir: hdfs://ip:8020/blink/flink-checkpoints
state.backend.incremental: true
rest.port: 8081

masters configuration:

node1:8081
node2:8081

slaves configuration:

node3
node4
node5

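I have only just started with Blink, and I suspect my configuration is wrong. Has anyone here tried installing and deploying Blink? Could you share your configuration, and how should I fix the problem in my environment?

Thanks. *From the Flink mailing list archive, compiled by volunteers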

毛毛虫雨 2021-12-07 12:47:35
1 answer
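  • Note this line:

    ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed

    Is your ZK working properly, and has Blink actually connected to it correctly? *From the Flink mailing list archive, compiled by volunteers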

    2021-12-07 15:28:39
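
One quick way to check the first half of that question (is ZK up and reachable from the JobManager hosts) is ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. The sketch below is an illustration and not part of the original thread. The helper names `parse_quorum`, `zk_ruok`, and `check_quorum` are made up for this example. It assumes the value of `high-availability.zookeeper.quorum` is a plain comma-separated `host:port` list, and that `ruok` is enabled on the server (on ZooKeeper 3.5 and later it has to be allowed through `4lw.commands.whitelist`). It only tests liveness and reachability, so it says nothing about the authentication that the Curator error mentions.

```python
import socket


def parse_quorum(quorum, default_port=2181):
    """Split a 'host1:2181,host2:2181' string into (host, port) pairs.

    Assumes plain host[:port] entries (no IPv6 literals, no chroot suffix).
    """
    servers = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            servers.append((host, int(port)))
        else:
            # No port given: fall back to ZooKeeper's usual client port.
            servers.append((entry, default_port))
    return servers


def zk_ruok(host, port, timeout=3.0):
    """Return True if the server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
        return reply == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_quorum(quorum, timeout=3.0):
    """Map each 'host:port' in the quorum string to its ruok result."""
    return {
        f"{host}:{port}": zk_ruok(host, port, timeout)
        for host, port in parse_quorum(quorum)
    }
```

Run from node1 and node2, calling `check_quorum("ip:2181")` with the quorum string from flink-conf.yaml returns a dict such as `{"ip:2181": True}` when the server responds. A `False` entry points at a ZK or network problem rather than a Blink one.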