
Flink on YARN: job fails to start

The deployment is Flink on YARN with HA. Flink is the community release 1.7.2 and Hadoop is the community release 2.8.5.

The Flink cluster has 5 servers in total, each with a 4-core CPU and 8 GB of memory.

The basic Flink configuration is:

jobmanager.heap.size: 2048m
taskmanager.heap.size: 2048m
taskmanager.numberOfTaskSlots: 4

Jobs are started using the "run a job on Flink" approach, currently with a parallelism of 1 per job. The command looks like: flink run -d -m yarn-cluster ...

After two jobs have been submitted successfully, the third job cannot start. Part of the startup log follows:

360 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1554100483755_0013
2019-04-04 16:24:23,389 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1554100483755_0013
2019-04-04 16:24:23,389 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2019-04-04 16:24:23,390 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2019-04-04 16:25:23,625 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:23,876 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:24,127 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:24,378 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster

No other trace information could be found. The YARN console shows that the container cannot be allocated. The page displays the following:

YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.

Diagnostics: [Thu Apr 04 16:33:49 +0800 2019] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:4096, vCores:1>; User AM Resource Limit of the queue = <memory:4096, vCores:1>; Queue AM Resource Usage = <memory:4096, vCores:2>;

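1. Given the machine sizes and the parallelism described above, the node view in the YARN console shows plenty of memory and CPU still unused. Why does this happen? Is some additional configuration needed?

2. Are there any points to watch for when setting jobmanager.heap.size, taskmanager.heap.size, and taskmanager.numberOfTaskSlots? How are they usually set? I have noticed that in this launch mode every job gets its own JobManager and its own TaskManager.

*From the volunteer-compiled Flink mailing list archive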
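To make the diagnostics easier to read, the resource figures can be pulled out of the message and compared directly. The following is a minimal sketch, not part of the original post: the helper name parse_am_diagnostics and the regular expression are illustrative assumptions, and only the memory dimension is compared, since that is the dimension the numbers in the message show to be exhausted.

```python
import re

# Diagnostics text copied from the YARN console output in the question.
DIAGNOSTICS = (
    "Application is added to the scheduler and is not yet activated. "
    "Queue's AM resource limit exceeded. Details : "
    "AM Partition = <DEFAULT_PARTITION>; "
    "AM Resource Request = <memory:2048, vCores:1>; "
    "Queue Resource Limit for AM = <memory:4096, vCores:1>; "
    "User AM Resource Limit of the queue = <memory:4096, vCores:1>; "
    "Queue AM Resource Usage = <memory:4096, vCores:2>;"
)

# Matches fields of the form "Name = <memory:N, vCores:M>".
FIELD = re.compile(r"([A-Za-z ]+?) = <memory:(\d+), vCores:(\d+)>")


def parse_am_diagnostics(text):
    """Hypothetical helper: map each resource field name to its memory/vCores."""
    return {
        name.strip(): {"memory_mb": int(mem), "vcores": int(vcores)}
        for name, mem, vcores in FIELD.findall(text)
    }


parsed = parse_am_diagnostics(DIAGNOSTICS)
request_mb = parsed["AM Resource Request"]["memory_mb"]
limit_mb = parsed["Queue Resource Limit for AM"]["memory_mb"]
used_mb = parsed["Queue AM Resource Usage"]["memory_mb"]

# Memory the queue's AMs would occupy if this application were activated.
needed_mb = used_mb + request_mb
print(f"needed {needed_mb} MB for AMs, queue limit is {limit_mb} MB")
print("exceeds limit:", needed_mb > limit_mb)
```

Read this way, the message says that the queue's AMs already occupy the full 4096 MB allowed for AMs, so a further 2048 MB AM request cannot be activated regardless of how much memory the nodes still have free.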

毛毛虫雨 2021-12-07 14:08:58
1 answer
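  • Hi, "Queue's AM resource limit exceeded" -> This looks like YARN limiting the resources that AMs are allowed to use, with an upper limit of 4096 MB of memory? You are presumably running in job mode, where each job starts its own AM and each AM takes 2048 MB of memory? If that is the arithmetic, there is indeed only enough room to start two. *From the volunteer-compiled Flink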

    2021-12-07 15:23:33
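The arithmetic in the answer above can be sketched as follows. This is a memory-only simplification of the queue's AM admission check, not YARN's actual implementation, and the function names are illustrative. The last part rests on loudly labeled assumptions that the thread does not confirm: that each NodeManager advertises the default 8192 MB (yarn.nodemanager.resource.memory-mb) and that the Capacity Scheduler's yarn.scheduler.capacity.maximum-am-resource-percent is at its default of 0.1. Under those assumptions the 4096 MB limit falls out of the 5-node cluster size.

```python
# Values taken from the diagnostics message in the question.
AM_LIMIT_MB = 4096     # Queue Resource Limit for AM
AM_REQUEST_MB = 2048   # AM Resource Request (one per job in per-job mode)


def can_activate(am_used_mb, am_request_mb, am_limit_mb):
    """Memory-only simplification: an AM fits if usage stays within the limit."""
    return am_used_mb + am_request_mb <= am_limit_mb


def simulate(num_jobs, am_request_mb=AM_REQUEST_MB, am_limit_mb=AM_LIMIT_MB):
    """Submit jobs one by one; return (indices of activated jobs, AM memory used)."""
    used_mb = 0
    activated = []
    for job in range(1, num_jobs + 1):
        if can_activate(used_mb, am_request_mb, am_limit_mb):
            used_mb += am_request_mb
            activated.append(job)
    return activated, used_mb


activated, used_mb = simulate(3)
print("activated jobs:", activated)
print("AM memory in use (MB):", used_mb)

# ASSUMPTION (not stated in the thread): default NodeManager memory and
# default maximum-am-resource-percent, expressed here as an integer percent.
NODES = 5
ASSUMED_NM_MEMORY_MB = 8192
ASSUMED_MAX_AM_PERCENT = 10
derived_limit_mb = NODES * ASSUMED_NM_MEMORY_MB * ASSUMED_MAX_AM_PERCENT // 100
print("AM limit derived under these assumptions (MB):", derived_limit_mb)
```

If those assumptions hold, two directions would plausibly let a third job start: raising the AM share of the queue in the Capacity Scheduler configuration, or requesting a smaller JobManager container (the 2048 MB AM request matches the jobmanager.heap.size shown in the question). Both should be checked against the actual cluster configuration before changing anything.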