Flink deployment issues: how to resolve an error when deploying a job with a savepoint

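Summary: Apache Flink is an open-source stream processing framework developed by the Apache Software Foundation; its core is a distributed streaming dataflow engine written in Java and Scala. This collection provides resources on Apache Flink technology, usage tips, and best practices.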

Question 1: Flink CLI deployment problem

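Hi all, I ran into a problem during deployment. I stopped a job through the REST API and saved its savepoint (steps: /jobs/overview ---> /jobs/{jobid}/savepoints ---> /jobs/{jobid}/savepoints/{triggerid}). When I deploy the job with that savepoint using the flink command, it fails, whereas uploading the jar through the web UI with the same savepoint works. The error stack is as follows: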

2020-07-17 09:51:48,925 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Request slot with profile ResourceProfile{UNKNOWN} for job 7639673873b707aa86c4387aa7b4aac3 with allocation id e8865cdbfe4c3c33099c7112bc2e3231.
2020-07-17 09:51:48,952 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Filter (1/1) (1177659bff014e8dbc3f0508055d4307) switched from SCHEDULED to DEPLOYING.
2020-07-17 09:51:48,952 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: Custom Source -> Filter (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)
2020-07-17 09:51:48,953 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source (1/1) (141f0dc22b624b39e21127f637ba63c2) switched from SCHEDULED to DEPLOYING.
2020-07-17 09:51:48,953 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: Custom Source (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)
2020-07-17 09:51:48,954 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source (1/1) (274b3df03e1fab627059c1a78e4a26da) switched from SCHEDULED to DEPLOYING.
2020-07-17 09:51:48,954 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Source: Custom Source (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)
2020-07-17 09:51:48,954 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from SCHEDULED to DEPLOYING.
2020-07-17 09:51:48,954 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Co-Process (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)
2020-07-17 09:51:48,955 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (618b75fcf5ea05fb5c6487bec6426e31) switched from SCHEDULED to DEPLOYING.
2020-07-17 09:51:48,955 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Deploying Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)
2020-07-17 09:51:49,346 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (618b75fcf5ea05fb5c6487bec6426e31) switched from DEPLOYING to RUNNING.
2020-07-17 09:51:49,370 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source (1/1) (274b3df03e1fab627059c1a78e4a26da) switched from DEPLOYING to RUNNING.
2020-07-17 09:51:49,370 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source (1/1) (141f0dc22b624b39e21127f637ba63c2) switched from DEPLOYING to RUNNING.
2020-07-17 09:51:49,377 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from DEPLOYING to RUNNING.
2020-07-17 09:51:49,377 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Custom Source -> Filter (1/1) (1177659bff014e8dbc3f0508055d4307) switched from DEPLOYING to RUNNING.
2020-07-17 09:51:49,493 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from RUNNING to FAILED.
java.lang.Exception: Exception while creating StreamOperatorStateContext.
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:255)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1006)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:449)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for LegacyKeyedCoProcessOperator_65e7116c7aa972ad18a796ae22bd6327_(1/1) from any of the 1 provided restore options.
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:304)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:131)
    ... 9 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:336)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:548)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:288)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    ... 11 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.deserialize(BytePrimitiveArraySerializer.java:85)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKVStateData(RocksDBFullRestoreOperation.java:221)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKeyGroupsInStateHandle(RocksDBFullRestoreOperation.java:168)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restore(RocksDBFullRestoreOperation.java:151)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:279)
    ... 15 more

*From the Flink mailing list archive compiled by volunteers



Reference answer:

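Which version of Flink are you using? Could you share the TaskManager (TM) log for Co-Process (1/1) (d0309f26a545e74643382ed3f758269b)? Judging from the log above, it should be on the machine 083f69d029de.

*From the Flink mailing list archive compiled by volunteers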
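
For readers reproducing the scenario, the REST sequence from the question can be sketched as follows. This is only an illustration under stated assumptions, not a diagnosis of the EOFException: the class name `SavepointRestCalls`, the base URL, and the savepoint and jar paths are hypothetical, and `"cancel-job": true` is assumed because the question says the job was stopped while taking the savepoint. The requests are built with the Java 11 `java.net.http` API but deliberately not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) the REST requests for the savepoint flow in the question. */
public final class SavepointRestCalls {
    private final URI base; // JobManager REST address, e.g. http://localhost:8081 (assumed)

    public SavepointRestCalls(String baseUrl) {
        this.base = URI.create(baseUrl);
    }

    /** Step 1: GET /jobs/overview to look up the job id. */
    public HttpRequest jobsOverview() {
        return HttpRequest.newBuilder(base.resolve("/jobs/overview")).GET().build();
    }

    /** Step 2: POST /jobs/{jobid}/savepoints; the response carries a "request-id" (the trigger id). */
    public HttpRequest triggerSavepoint(String jobId, String targetDirectory, boolean cancelJob) {
        // Naive JSON assembly: assumes targetDirectory contains no characters that need escaping.
        String body = "{\"target-directory\": \"" + targetDirectory + "\", \"cancel-job\": " + cancelJob + "}";
        return HttpRequest.newBuilder(base.resolve("/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Step 3: GET /jobs/{jobid}/savepoints/{triggerid} to poll the savepoint status. */
    public HttpRequest savepointStatus(String jobId, String triggerId) {
        return HttpRequest.newBuilder(base.resolve("/jobs/" + jobId + "/savepoints/" + triggerId)).GET().build();
    }

    /** CLI form of "deploy with a savepoint": flink run -s <savepointPath> <jar>. */
    public static String cliRunCommand(String savepointPath, String jarPath) {
        return "flink run -s " + savepointPath + " " + jarPath;
    }

    public static void main(String[] args) {
        SavepointRestCalls calls = new SavepointRestCalls("http://localhost:8081");
        String jobId = "7639673873b707aa86c4387aa7b4aac3"; // job id taken from the log above
        System.out.println(calls.jobsOverview().uri());
        System.out.println(calls.triggerSavepoint(jobId, "file:///tmp/savepoints", true).uri());
        System.out.println(calls.savepointStatus(jobId, "<triggerid>").uri());
        System.out.println(cliRunCommand("<savepointPath>", "<job.jar>"));
    }
}
```

Once the status request reports COMPLETED, the savepoint location it returns is the path passed to `-s`. Comparing that exact path with the one entered in the web UI is a cheap first check before collecting the TaskManager log requested above.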



More answers to this question are available at:

https://developer.aliyun.com/ask/370235?spm=a2c6h.12873639.article-detail.58.6f9243783Lv0fl



Question 2: flink1.11 run

Hi, I wrote a Kafka-to-Hive program, but it fails to run. What could be the cause?

Exception: The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No operators defined in streaming topology. Cannot generate StreamGraph.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:699)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:232)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.lang.IllegalStateException: No operators defined in streaming topology. Cannot generate StreamGraph.
    at org.apache.flink.table.planner.utils.ExecutorUtils.generateStreamGraph(ExecutorUtils.java:47)
    at org.apache.flink.table.planner.delegation.StreamExecutor.createPipeline(StreamExecutor.java:47)
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.execute(TableEnvironmentImpl.java:1197)
    at com.akulaku.data.flink.StreamingWriteToHive.main(StreamingWriteToHive.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)
    ... 11 more

Code:

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().useBlinkPlanner().build();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(environment, settings);

environment.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
environment.setStateBackend(new MemoryStateBackend());
environment.getCheckpointConfig().setCheckpointInterval(5000);

String name = "myhive";
String defaultDatabase = "tmp";
String hiveConfDir = "/etc/alternatives/hive-conf/";
String version = "1.1.0";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog("myhive", hive);
tableEnv.useCatalog("myhive");

tableEnv.executeSql("CREATE TABLE tmp.user_behavior (\n"
    + " user_id BIGINT,\n"
    + " item_id STRING,\n"
    + " behavior STRING,\n"
    + " ts AS PROCTIME()\n"
    + ") WITH (\n"
    + " 'connector' = 'kafka-0.11',\n"
    + " 'topic' = 'user_behavior',\n"
    + " 'properties.bootstrap.servers' = 'localhost:9092',\n"
    + " 'properties.group.id' = 'testGroup',\n"
    + " 'scan.startup.mode' = 'earliest-offset',\n"
    + " 'format' = 'json',\n"
    + " 'json.fail-on-missing-field' = 'false',\n"
    + " 'json.ignore-parse-errors' = 'true'\n"
    + ")");

// tableEnv.executeSql("CREATE TABLE print_table (\n" +
//     " user_id BIGINT,\n" +
//     " item_id STRING,\n" +
//     " behavior STRING,\n" +
//     " tsdata STRING\n" +
//     ") WITH (\n" +
//     " 'connector' = 'print'\n" +
//     ")");
tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tableEnv.executeSql("CREATE TABLE tmp.streamhivetest (\n"
    + " user_id BIGINT,\n"
    + " item_id STRING,\n"
    + " behavior STRING,\n"
    + " tsdata STRING\n"
    + ") STORED AS parquet TBLPROPERTIES (\n"
    + " 'sink.rolling-policy.file-size' = '12MB',\n"
    + " 'sink.rolling-policy.rollover-interval' = '1 min',\n"
    + " 'sink.rolling-policy.check-interval' = '1 min',\n"
    + " 'execution.checkpointing.interval' = 'true'\n"
    + ")");

tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
tableEnv.executeSql("insert into streamhivetest select user_id,item_id,behavior,DATE_FORMAT(ts, 'yyyy-MM-dd') as tsdata from user_behavior");

tableEnv.execute("stream-write-hive");

*From the Flink mailing list archive compiled by volunteers



Reference answer:

tableEnv.executeSql already submits the job, so there is no need to call execute afterwards.

*From the Flink mailing list archive compiled by volunteers
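
The point of the answer can be sketched in isolation as below, assuming Flink 1.11 with the Blink planner on the classpath. The class name `ExecuteSqlOnly` and the `demo_source` / `demo_sink` tables (backed by the built-in `datagen` and `print` connectors) are stand-ins invented for illustration so that the snippet needs neither Kafka nor Hive. In 1.11, `executeSql` submits an INSERT statement by itself, while `execute(...)` only runs operations that were buffered earlier (for example through `sqlUpdate`), which would explain the "No operators defined in streaming topology" error when nothing is buffered.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ExecuteSqlOnly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().inStreamingMode().useBlinkPlanner().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(environment, settings);

        // Hypothetical stand-in tables; the question uses a Kafka source and a Hive sink instead.
        tableEnv.executeSql(
                "CREATE TABLE demo_source (user_id BIGINT) "
                        + "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");
        tableEnv.executeSql(
                "CREATE TABLE demo_sink (user_id BIGINT) WITH ('connector' = 'print')");

        // executeSql submits the INSERT job on its own.
        TableResult result = tableEnv.executeSql("INSERT INTO demo_sink SELECT user_id FROM demo_source");
        result.getJobClient().ifPresent(client -> System.out.println("Submitted job " + client.getJobID()));

        // Deliberately no tableEnv.execute("...") here: nothing is left buffered to run.
    }
}
```

Applied to the program in the question, this amounts to deleting the final `tableEnv.execute("stream-write-hive");` line and leaving the rest unchanged.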



More answers to this question are available at:

https://developer.aliyun.com/ask/370234?spm=a2c6h.12873639.article-detail.59.6f9243783Lv0fl



Question 3: Re: pyflink1.11.0window

Does your source DDL declare time1 as a time attribute?

create table source1(
  id int,
  time1 timestamp,
  type string,
  WATERMARK FOR time1 as time1 - INTERVAL '2' SECOND
) with (...)

*From the Flink mailing list archive compiled by volunteers



Reference answer:

org.apache.flink.table.api.ValidationException: A tumble window expects a size value literal.

It looks like the code that follows, which defines the tumble window, is not quite correct.

*From the Flink mailing list archive compiled by volunteers



More answers to this question are available at:

https://developer.aliyun.com/ask/370233?spm=a2c6h.12873639.article-detail.60.6f9243783Lv0fl



Question 4: Flink 1.11.2 reading and writing Hive, and supported Hive versions

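I registered a HiveCatalog in Flink and want to stream Kafka data into a Hive table. However, when I create the Kafka table, its metadata is saved to Hive by default. Is there a way to avoid persisting the metadata of this Kafka table? If I do not register the HiveCatalog, I assume there is no way to write data to Hive.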

*From the Flink mailing list archive compiled by volunteers



Reference answer:

CREATE TEMPORARY TABLE kafka_table...

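There seems to be no documentation for this; I will create a JIRA to track it.

https://issues.apache.org/jira/browse/FLINK-18624

*From the Flink mailing list archive compiled by volunteers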
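
A hedged sketch of how the `CREATE TEMPORARY TABLE` suggestion might look from Java follows. The class name `TemporaryKafkaTable`, the column list, and the connector options are assumptions for illustration (the options mirror the Kafka DDL shown in Question 2, not anything stated in this thread). The idea is that a temporary table lives only in the current session, so it is not written to the Hive metastore even while a HiveCatalog is the current catalog.

```java
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.TableEnvironment;

public final class TemporaryKafkaTable {

    // Illustrative schema and options; adjust to the real topic and format.
    static final String DDL =
            "CREATE TEMPORARY TABLE kafka_table (\n"
                    + "  user_id BIGINT,\n"
                    + "  item_id STRING,\n"
                    + "  behavior STRING\n"
                    + ") WITH (\n"
                    + "  'connector' = 'kafka-0.11',\n"
                    + "  'topic' = 'user_behavior',\n"
                    + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                    + "  'properties.group.id' = 'testGroup',\n"
                    + "  'scan.startup.mode' = 'earliest-offset',\n"
                    + "  'format' = 'json'\n"
                    + ")";

    /** Registers the Kafka table for this session only; the catalog is not modified. */
    public static void register(TableEnvironment tableEnv) {
        // The Kafka DDL uses Flink's own syntax, so make sure the Hive dialect is not active.
        tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
        tableEnv.executeSql(DDL);
    }

    private TemporaryKafkaTable() {}
}
```

The HiveCatalog can stay registered for the sink table; only the Kafka source is kept out of the metastore.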



More answers to this question are available at:

https://developer.aliyun.com/ask/370232?spm=a2c6h.12873639.article-detail.61.6f9243783Lv0fl



Question 5: Flink on k8s, jar job avatica-core dependency and flink-table

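I am migrating jobs to k8s. The current version is Flink 1.6, and jobs on k8s run in standalone per-job mode.

I have run into a problem: a business team's Flink jar job uses the org.apache.calcite.avatica dependency, namely:

<dependency>
  <groupId>org.apache.calcite.avatica</groupId>
  <artifactId>avatica-core</artifactId>
  <version>${avatica.version}</version>
</dependency>

However, the flink-table module also contains this dependency:

[image: image.png]

In Flink on k8s standalone per-job mode, the job jar is placed in Flink's own lib directory, so when the job starts it reports:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.calcite.avatica.ConnectionPropertiesImpl

As I understand it, the job jar bundles avatica-core, and flink-table_2.11-1.6-RELEASE.jar in Flink's lib directory bundles it as well. With both in the lib directory, a class conflict occurs.

How can I solve this problem? I look forward to your reply.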

*From the Flink mailing list archive compiled by volunteers



Reference answer:

If you only want to resolve the jar conflict, the maven shade plugin [1] may be useful to you.

[1] https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html

*From the Flink mailing list archive compiled by volunteers
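
A minimal relocation setup for the job's pom.xml might look like the fragment below. It is a sketch under assumptions: the plugin version and the `myjob.shaded` prefix are placeholders chosen for illustration, and whether relocation alone is enough depends on how the job loads avatica. The intent is that the job's own copy of the avatica classes is moved to a private package at build time, so it no longer collides with the copy bundled in flink-table.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Version is an assumption; use whatever the build already pins. -->
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Package used by the job's avatica-core dependency. -->
            <pattern>org.apache.calcite.avatica</pattern>
            <!-- Hypothetical private prefix for the relocated classes. -->
            <shadedPattern>myjob.shaded.org.apache.calcite.avatica</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin rewrites references inside the job jar to the relocated package, so the job code itself does not need to change.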



More answers to this question are available at:

https://developer.aliyun.com/ask/370231?spm=a2c6h.12873639.article-detail.62.6f9243783Lv0fl
