Background:
The official documentation describes how to trigger a savepoint:
https://nightlies.apache.org/flink/flink-docs-release-1.12/ops/state/savepoints.html
Trigger a Savepoint with YARN
Submit the job:
flink run -m yarn-cluster -yd -yjm 1024m -ytm 1024m -ynm flink-order-realtime -ys 2 /opt/flink_app/flink-order-realtime-1.0-SNAPSHOT.jar
Trigger the savepoint (with -yid, as the documentation shows):
flink savepoint 8fffb22c9de48c698de385698acbbc5d hdfs://hadoop202:8020/flink/savepoints -yid application_1646373783800_0022
Triggering the savepoint failed:
Attempted fix (no effect):
Modify flink-conf.yaml
akka.client.timeout: 300000
Final fix (worked): run the same command without -yid
flink savepoint 8fffb22c9de48c698de385698acbbc5d hdfs://hadoop202:8020/flink/savepoints
Triggering savepoint for job 8fffb22c9de48c698de385698acbbc5d.
Waiting for response...
Savepoint completed. Path: hdfs://hadoop202:8020/flink/savepoints/savepoint-8fffb2-fda527b1531b
You can resume your program from this savepoint with the run command.
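The last line of the output points to restoring with the run command. A minimal dry-run sketch of what that could look like follows. It assumes the same jar and submit options as the original job, and the variable names SAVEPOINT_PATH, JAR and RESTORE_CMD are hypothetical. It only prints the command and does not invoke flink.

```shell
# Dry-run sketch: build and print a restore command; flink itself is not called.

# Savepoint path copied from the "Savepoint completed" output above.
SAVEPOINT_PATH="hdfs://hadoop202:8020/flink/savepoints/savepoint-8fffb2-fda527b1531b"
# Jar reused from the original submit command (assumption: the job is unchanged).
JAR="/opt/flink_app/flink-order-realtime-1.0-SNAPSHOT.jar"

# -s (--fromSavepoint) tells "flink run" which savepoint to restore from.
RESTORE_CMD="flink run -s ${SAVEPOINT_PATH} -m yarn-cluster -yd -yjm 1024m -ytm 1024m -ynm flink-order-realtime -ys 2 ${JAR}"

echo "${RESTORE_CMD}"
```

To actually restore, paste the printed command into a shell on a host that has the Flink client installed.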
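Cause analysis:
-m yarn-cluster and -t yarn-per-job are both, in essence, a yarn-session
In a yarn-session, multiple jobs share one cluster, so jobs map to a -yid many-to-one. My guess was that -yid should not be specified when triggering a savepoint there, and the savepoint did trigger normally.
I then guessed the same would hold for yarn-cluster. Omitting -yid, the savepoint triggered normally as well.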
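Additional analysis:
Why say that -m yarn-cluster and -t yarn-per-job are both, in essence, a yarn-session?
Submit a job with flink run -t yarn-per-job and check the log
Submit a job with flink run -m yarn-cluster -yd and check the log
Both logs report a Flink YARN session cluster, which can be understood as a special yarn-session that runs only one job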
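Summary:
I have seen screenshots online of flink savepoint succeeding with -yid specified. The logs in them were dated October 2020, so I suspected the cause was a version difference combined with official documentation that had not been revised.
Checking the official site, Flink 1.12.0 was released in December 2020, and the official documentation has specified -yid in the "Trigger a Savepoint with YARN" section since Flink 1.5.0 (May 2018).
Presumably earlier versions allowed triggering with -yid while later versions (at least Flink 1.12) do not, and the official documentation has not been updated to reflect this detail.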