我触发Flink 保存点总是失败,报错如下,一直说是超时,但是没有进一步的信息可以查看,我查资料说可以设置checkpoint超时时间,我设置了2min,但是触发
保存点时在2min之前就会报错,另外我的 状态 并不大
The program finished with the following exception:
org.apache.flink.util.FlinkException: Triggering a savepoint for the job 00000000000000000000000000000000 failed.
at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:777)
at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:754)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:751)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1072)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.TimeoutException
at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1255)
at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$15(FutureUtils.java:582)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)*来自志愿者整理的flink邮件归档
这个报错是 client 提起触发 checkpoint 的请求后,job manager 没有及时反馈 checkpoint 的结果。没有及时反馈的原因可能有很多,比如 checkpoint 超时,比如网络通信问题等等。可以打开 flink web ui 看一下是否有更多信息,或者打开 job manager 和 task manager 的 log 看一下。*来自志愿者整理的FLINK邮件归档
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。