开发者社区> 问答> 正文

Flink jobmanager报错: ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxxx:42895] with UID [-205381938] irrecoverably failed. Quarantining address

我们的flink程序运行一段时间之后jobmanager就会有这种错误信息,但是taskmanager并没有报错信息,网上也查了这个报错信息,但是得不到准确的答案和解决办法,请赐教有没有好的解决办法,谢谢!
2018-11-18 23:24:09,594 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:46671] with UID [-435340187] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)

at akka.remote.ReliableDeliverySupervisor

$$ anonfun$idle$1.applyOrElse(Endpoint.scala:375) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2018-11-18 23:24:09,596 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:44873] with UID [-1963170243] irrecoverably failed. Quarantining address. java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours) at akka.remote.ReliableDeliverySupervisor $$

anonfun$idle$1.applyOrElse(Endpoint.scala:375)

at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

2018-11-18 23:24:09,609 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:57996] with UID [-1875995170] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)

at akka.remote.ReliableDeliverySupervisor

$$ anonfun$idle$1.applyOrElse(Endpoint.scala:375) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2018-11-18 23:24:09,612 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:36114] with UID [-1372891511] irrecoverably failed. Quarantining address. java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours) at akka.remote.ReliableDeliverySupervisor $$

anonfun$idle$1.applyOrElse(Endpoint.scala:375)

at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

2018-11-18 23:24:09,615 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:42895] with UID [-205381938] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)

at akka.remote.ReliableDeliverySupervisor

$$ anonfun$idle$1.applyOrElse(Endpoint.scala:375) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2018-11-18 23:24:09,616 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:46158] with UID [-1338714265] irrecoverably failed. Quarantining address. java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours) at akka.remote.ReliableDeliverySupervisor $$

anonfun$idle$1.applyOrElse(Endpoint.scala:375)

at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

2018-11-18 23:24:09,665 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:44210] with UID [-1538199166] irrecoverably failed. Quarantining address.
java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)

at akka.remote.ReliableDeliverySupervisor

$$ anonfun$idle$1.applyOrElse(Endpoint.scala:375) at akka.actor.Actor$class.aroundReceive(Actor.scala:502) at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) at akka.actor.ActorCell.invoke(ActorCell.scala:495) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) at akka.dispatch.Mailbox.run(Mailbox.scala:224) at akka.dispatch.Mailbox.exec(Mailbox.scala:234) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 2018-11-18 23:24:09,744 ERROR akka.remote.Remoting - Association to [akka.tcp://flink@xxxx:46841] with UID [-440769522] irrecoverably failed. Quarantining address. java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours) at akka.remote.ReliableDeliverySupervisor $$

anonfun$idle$1.applyOrElse(Endpoint.scala:375)

at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.ReliableDeliverySupervisor.aroundReceive(Endpoint.scala:203)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

展开
收起
莱昂007 2018-11-19 14:53:12 21312 0
4 条回答
写回答
取消 提交回答
  • 我的日志中报错还有段不一样

    java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)
    ...
    akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [akka-actor_2.11-2.5.21.jar:2.5.21]
    2021-05-23 04:24:24,166 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@bigdata007:39947] has failed, address is now gated for [50] ms. Reason: [Disassociated] 
    ...
    2021-05-23 04:24:28,679 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerDetailsHandler [] - Unhandled exception.
    org.apache.flink.runtime.resourcemanager.exceptions.UnknownTaskExecutorException: No TaskExecutor registered under container_e48_1600092789038_34305_01_000006.
    	
    

    不知道有兄弟查出来这个问题的原因是什么没

    2021-05-24 10:03:33
    赞同 展开评论 打赏
  • 你好,我也遇到了相同的问题,请问你有找到什么原因吗?

    2019-07-17 23:15:24
    赞同 展开评论 打赏
  • 这个问题不是经常出现 任务跑几天突然出现一次,taskmanager不报错 只有jobmanager报错,但是程序正常跑,jar包冲突 您指的是什么啊?

    2019-07-17 23:15:24
    赞同 展开评论 打赏
  • 日志贴的格式太乱了,“java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)” 这行显示 Akka 链接断掉是因为 timeout,你要检查下断掉的 TaskManager 为什么会超时

    如果日志中没错误,那么有没有可能是长时间 Full GC。或者机器之前网络有什么不稳定因素,网卡打满之类的。

    从你提供的日志中无法获得更多信息。


    没有遇到过类似问题,这个和 Akka 更相关
    从日志中的错误描述来看,"java.util.concurrent.TimeoutException: Remote system has been silent for too long. (more than 48.0 hours)",Akka 链接由于 timeout 断开,得查一下为什么超时,TaskManager 没有报错这个就很奇怪了,请详细检查下 log,stdout,stderr (如果有的话)。最好也检查下网络和 GC 情况,还有 jar 冲突等问题
    另,48小时这个日志也很奇怪,timeout 有这么久?

    2019-07-17 23:15:24
    赞同 展开评论 打赏
问答排行榜
最热
最新

相关电子书

更多
Flink CDC Meetup PPT - 龚中强 立即下载
Flink CDC Meetup PPT - 王赫 立即下载
Flink CDC Meetup PPT - 覃立辉 立即下载