
Flink 1.11 submit job timed out

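Version: Flink 1.11, deployed as a Kubernetes session cluster, with 30 TaskManagers and 4 slots per TaskManager. The job's parallelism is 120. When the job is submitted, the logs fill up with "No hostname could be resolved for the IP address" warnings, the JobManager times out, and the submission fails. The web UI also hangs and stops responding.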

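Even WordCount submitted with parallelism 1 prints a few of these "no hostname" warnings before the job is submitted normally. Once the parallelism goes up, the submission times out.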

Part of the log:

2020-07-15 16:58:46,460 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.32.160.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,460 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.44.224.7, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.
2020-07-15 16:58:46,461 WARN org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No hostname could be resolved for the IP address 10.40.32.9, using IP address as host name. Local input split assignment (such as for HDFS files) may be impacted.

2020-07-15 16:59:10,236 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - The heartbeat of JobManager with id 69a0d460de468888a9f41c770d963c0a timed out.
2020-07-15 16:59:10,236 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager 00000000000000000000000000000000@akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_2 for job e1554c737e37ed79688a15c746b6e9ef from the resource manager.

How should I deal with this?

*Compiled by volunteers from the Flink mailing-list archive

Asked by 说了是一只鲳鱼 · 2021-12-07 11:11:48 · 1028 views
1 answer
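  • I have run into a similar hostname-resolution problem before. You could troubleshoot it from the angle of how the Kubernetes pods' and nodes' network addresses are mapped. Hope this helps. *Compiled by volunteers from the Flink mailing-list archive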

    2021-12-07 11:27:57
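One way to follow the answer's suggestion is to time reverse DNS lookups of the TaskManager pod IPs from inside the JobManager pod. The sketch below is a hypothetical diagnostic: the class `ReverseLookupProbe` and its `Result` type are illustrative names, not part of Flink. It assumes, based on a reading of Flink 1.11's `TaskManagerLocation`, that the warning is logged when the JDK's `InetAddress.getCanonicalHostName()` falls back to the textual IP. If each of those lookups is slow (for example, because no reverse record exists and the resolver waits for a timeout), the delay could plausibly add up while many TaskManagers register, but the thread does not confirm this.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical diagnostic, not part of Flink: times the JDK reverse lookup
 * for each IP address passed on the command line.
 */
public class ReverseLookupProbe {

    /** Outcome of a single reverse lookup. */
    static final class Result {
        final String ip;
        final String hostName;
        final long elapsedMillis;
        final boolean resolved;

        Result(String ip, String hostName, long elapsedMillis) {
            this.ip = ip;
            this.hostName = hostName;
            this.elapsedMillis = elapsedMillis;
            // The JDK returns the textual IP when no host name can be found,
            // so "resolved" means a real name came back.
            this.resolved = !hostName.equals(ip);
        }
    }

    /** Expects a literal IP address and times its reverse lookup. */
    static Result probe(String ip) {
        InetAddress address;
        try {
            // For a literal IP string, getByName only parses it (no DNS query).
            address = InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot parse or resolve: " + ip, e);
        }
        long start = System.nanoTime();
        // Reverse lookup: this is the call that may block on DNS.
        String canonical = address.getCanonicalHostName();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(address.getHostAddress(), canonical, elapsedMillis);
    }

    public static void main(String[] args) {
        for (String ip : args) {
            Result r = probe(ip);
            System.out.printf("%s -> %s (%s, %d ms)%n",
                    r.ip, r.hostName,
                    r.resolved ? "resolved" : "NOT resolved",
                    r.elapsedMillis);
        }
    }
}
```

For example, inside the JobManager pod run `javac ReverseLookupProbe.java && java ReverseLookupProbe 10.32.160.7 10.44.224.7` with IPs taken from the log. Lookups reported as NOT resolved that take seconds rather than milliseconds would suggest the pod DNS setup, rather than Flink itself, is the bottleneck.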