Flink deployment issues: how to resolve errors when deploying a job with a savepoint - Alibaba Cloud Developer Community

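Question 1: Flink CLI deployment problem

Hi all, I ran into a problem during deployment. I stopped a job through the REST API and saved its savepoint (steps: /jobs/overview ---> /jobs/{jobid}/savepoints ---> /jobs/{jobid}/savepoints/{triggerid}). When I deploy the job with the flink command and pass that savepoint, it fails, but uploading the jar through the web UI and passing the same savepoint works. The log and stack trace are as follows: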

2020-07-17 09:51:48,925 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Request slot with profile ResourceProfile{UNKNOWN} for job 7639673873b707aa86c4387aa7b4aac3 with allocation id e8865cdbfe4c3c33099c7112bc2e3231.

2020-07-17 09:51:48,952 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Filter (1/1) (1177659bff014e8dbc3f0508055d4307) switched from SCHEDULED to DEPLOYING.

2020-07-17 09:51:48,952 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Custom Source -> Filter (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)

2020-07-17 09:51:48,953 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (141f0dc22b624b39e21127f637ba63c2) switched from SCHEDULED to DEPLOYING.

2020-07-17 09:51:48,953 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Custom Source (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)

2020-07-17 09:51:48,954 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (274b3df03e1fab627059c1a78e4a26da) switched from SCHEDULED to DEPLOYING.

2020-07-17 09:51:48,954 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Source: Custom Source (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)

2020-07-17 09:51:48,954 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from SCHEDULED to DEPLOYING.

2020-07-17 09:51:48,954 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Co-Process (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)

2020-07-17 09:51:48,955 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (618b75fcf5ea05fb5c6487bec6426e31) switched from SCHEDULED to DEPLOYING.

2020-07-17 09:51:48,955 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Deploying Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (attempt #0) to e63d829deafc144cd82efd73979dd056 @ 083f69d029de (dataPort=35758)

2020-07-17 09:51:49,346 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Co-Process -> (Sink: Unnamed, Sink: Unnamed) (1/1) (618b75fcf5ea05fb5c6487bec6426e31) switched from DEPLOYING to RUNNING.

2020-07-17 09:51:49,370 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (274b3df03e1fab627059c1a78e4a26da) switched from DEPLOYING to RUNNING.

2020-07-17 09:51:49,370 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source (1/1) (141f0dc22b624b39e21127f637ba63c2) switched from DEPLOYING to RUNNING.

2020-07-17 09:51:49,377 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from DEPLOYING to RUNNING.

2020-07-17 09:51:49,377 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Filter (1/1) (1177659bff014e8dbc3f0508055d4307) switched from DEPLOYING to RUNNING.

2020-07-17 09:51:49,493 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Co-Process (1/1) (d0309f26a545e74643382ed3f758269b) switched from RUNNING to FAILED.

java.lang.Exception: Exception while creating StreamOperatorStateContext.

at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)

at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:255)

at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1006)

at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)

at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)

at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:449)

at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)

at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)

at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)

at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for LegacyKeyedCoProcessOperator_65e7116c7aa972ad18a796ae22bd6327_(1/1) from any of the 1 provided restore options.

at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)

at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:304)

at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:131)

... 9 more

Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.

at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:336)

at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:548)

at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:288)

at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)

at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)

... 11 more

Caused by: java.io.EOFException

at java.io.DataInputStream.readFully(DataInputStream.java:197)

at java.io.DataInputStream.readFully(DataInputStream.java:169)

at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.deserialize(BytePrimitiveArraySerializer.java:85)

at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKVStateData(RocksDBFullRestoreOperation.java:221)

at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKeyGroupsInStateHandle(RocksDBFullRestoreOperation.java:168)

at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restore(RocksDBFullRestoreOperation.java:151)

at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:279)

... 15 more

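*From the Flink mailing list archive compiled by volunteers

Reference answer:

Which version of Flink are you using? Could you share the TaskManager log for Co-Process (1/1) (d0309f26a545e74643382ed3f758269b)? Judging from the log above, that task ran on the machine 083f69d029de.

*From the Flink mailing list archive compiled by volunteers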
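The three REST steps listed in the question, plus the CLI restore it mentions, can be sketched as below. This is only an illustration under stated assumptions: the host and port, the job id, the trigger id, and all file paths are hypothetical placeholders, and the class and method names are invented for this sketch. The code builds URIs, a request body, and a command line; it does not contact a cluster.

```java
import java.net.URI;
import java.util.List;

/** Builds the REST URIs and CLI arguments for a savepoint-and-restore cycle (illustrative names). */
final class SavepointRestPaths {
    private final String baseUrl;

    SavepointRestPaths(String baseUrl) {
        // Drop a trailing slash so every path can be appended the same way.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    /** Step 1: a GET on this URI lists the jobs, which is where the job id comes from. */
    URI jobsOverview() {
        return URI.create(baseUrl + "/jobs/overview");
    }

    /** Step 2: a POST on this URI triggers the savepoint and returns a trigger id. */
    URI triggerSavepoint(String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "/savepoints");
    }

    /** Step 3: a GET on this URI polls the status of the triggered savepoint. */
    URI savepointStatus(String jobId, String triggerId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "/savepoints/" + triggerId);
    }

    /** JSON body for step 2. No escaping is done, so the directory must be a plain path. */
    static String triggerBody(String targetDirectory, boolean cancelJob) {
        return "{\"target-directory\":\"" + targetDirectory + "\",\"cancel-job\":" + cancelJob + "}";
    }

    /** Arguments for resuming from a savepoint with the flink CLI (-s is short for --fromSavepoint). */
    static List<String> runFromSavepoint(String savepointPath, String jarPath) {
        return List.of("flink", "run", "-s", savepointPath, jarPath);
    }

    public static void main(String[] args) {
        // All values below are hypothetical placeholders, not taken from the original post.
        SavepointRestPaths paths = new SavepointRestPaths("http://localhost:8081");
        String jobId = "0123456789abcdef0123456789abcdef";
        String triggerId = "fedcba9876543210fedcba9876543210";

        System.out.println("GET  " + paths.jobsOverview());
        System.out.println("POST " + paths.triggerSavepoint(jobId));
        System.out.println("     " + triggerBody("/tmp/savepoints", true));
        System.out.println("GET  " + paths.savepointStatus(jobId, triggerId));
        System.out.println(String.join(" ",
                runFromSavepoint("/tmp/savepoints/savepoint-example", "/path/to/job.jar")));
    }
}
```

When comparing the CLI path with the web UI path, it may help to confirm that both point at exactly the same savepoint location and that the savepoint had finished before the job was restored.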

More answers to this question can be viewed here:

https://developer.aliyun.com/ask/370235?spm=a2c6h.12873639.article-detail.58.6f9243783Lv0fl

Question 2: flink1.11 run

Hi, I wrote a program that writes from Kafka to Hive, but it fails to run. What is the cause?

Exception: The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: No operators defined in streaming topology. Cannot generate StreamGraph.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:699)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:232)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:916)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.lang.IllegalStateException: No operators defined in streaming topology. Cannot generate StreamGraph.
at org.apache.flink.table.planner.utils.ExecutorUtils.generateStreamGraph(ExecutorUtils.java:47)
at org.apache.flink.table.planner.delegation.StreamExecutor.createPipeline(StreamExecutor.java:47)
at org.apache.flink.table.api.internal.TableEnvironmentImpl.execute(TableEnvironmentImpl.java:1197)
at com.akulaku.data.flink.StreamingWriteToHive.main(StreamingWriteToHive.java:80)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)
... 11 more

Code:

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().useBlinkPlanner().build();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(environment, settings);

environment.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
environment.setStateBackend(new MemoryStateBackend());
environment.getCheckpointConfig().setCheckpointInterval(5000);

String name = "myhive";
String defaultDatabase = "tmp";
String hiveConfDir = "/etc/alternatives/hive-conf/";
String version = "1.1.0";

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir, version);
tableEnv.registerCatalog("myhive", hive);
tableEnv.useCatalog("myhive");

tableEnv.executeSql("CREATE TABLE tmp.user_behavior (\n" +
        " user_id BIGINT,\n" +
        " item_id STRING,\n" +
        " behavior STRING,\n" +
        " ts AS PROCTIME()\n" +
        ") WITH (\n" +
        " 'connector' = 'kafka-0.11',\n" +
        " 'topic' = 'user_behavior',\n" +
        " 'properties.bootstrap.servers' = 'localhost:9092',\n" +
        " 'properties.group.id' = 'testGroup',\n" +
        " 'scan.startup.mode' = 'earliest-offset',\n" +
        " 'format' = 'json',\n" +
        " 'json.fail-on-missing-field' = 'false',\n" +
        " 'json.ignore-parse-errors' = 'true'\n" +
        ")");

// tableEnv.executeSql("CREATE TABLE print_table (\n" +
//         " user_id BIGINT,\n" +
//         " item_id STRING,\n" +
//         " behavior STRING,\n" +
//         " tsdata STRING\n" +
//         ") WITH (\n" +
//         " 'connector' = 'print'\n" +
//         ")");

tableEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
tableEnv.executeSql("CREATE TABLE tmp.streamhivetest (\n" +
        " user_id BIGINT,\n" +
        " item_id STRING,\n" +
        " behavior STRING,\n" +
        " tsdata STRING\n" +
        ") STORED AS parquet TBLPROPERTIES (\n" +
        " 'sink.rolling-policy.file-size' = '12MB',\n" +
        " 'sink.rolling-policy.rollover-interval' = '1 min',\n" +
        " 'sink.rolling-policy.check-interval' = '1 min',\n" +
        " 'execution.checkpointing.interval' = 'true'\n" +
        ")");

tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
tableEnv.executeSql("insert into streamhivetest select user_id,item_id,behavior,DATE_FORMAT(ts, 'yyyy-MM-dd') as tsdata from user_behavior");

tableEnv.execute("stream-write-hive");

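*From the Flink mailing list archive compiled by volunteers

Reference answer:

tableEnv.executeSql has already submitted the job, so there is no need to call execute afterwards.

*From the Flink mailing list archive compiled by volunteers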
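The point of the answer can be illustrated with a minimal sketch, assuming Flink 1.11 with the Blink planner on the classpath. The tables demo_source and demo_sink, and the datagen and print connectors standing in for Kafka and Hive, are assumptions made here so that the sketch needs no external systems; they are not part of the original program. The INSERT statement passed to executeSql is what submits the job, and the trailing tableEnv.execute(...) call is simply left out.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ExecuteSqlSubmitsJob {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

        // Hypothetical stand-in tables: a generated source and a sink that prints rows.
        tableEnv.executeSql(
                "CREATE TABLE demo_source (user_id BIGINT, behavior STRING) "
                        + "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");
        tableEnv.executeSql(
                "CREATE TABLE demo_sink (user_id BIGINT, behavior STRING) "
                        + "WITH ('connector' = 'print')");

        // executeSql on an INSERT statement submits the job on its own.
        TableResult result = tableEnv.executeSql(
                "INSERT INTO demo_sink SELECT user_id, behavior FROM demo_source");

        // The job client, when present, identifies the job that was just submitted.
        result.getJobClient()
                .ifPresent(client -> System.out.println("Submitted job " + client.getJobID()));

        // Deliberately no tableEnv.execute(...): nothing is left in the topology for it
        // to run, which is what the "No operators defined" error in the question reports.
    }
}
```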

More answers to this question can be viewed here:

https://developer.aliyun.com/ask/370234?spm=a2c6h.12873639.article-detail.59.6f9243783Lv0fl

Question 3: Re: pyflink1.11.0window

Does your source DDL declare time1 as a time attribute?

create table source1(
    id int,
    time1 timestamp,
    type string,
    WATERMARK FOR time1 as time1 - INTERVAL '2' SECOND
) with (...)

*From the Flink mailing list archive compiled by volunteers

Reference answer:

org.apache.flink.table.api.ValidationException: A tumble window expects a size value literal.

It looks like the code that defines the tumble window is not quite right.

*From the Flink mailing list archive compiled by volunteers

More answers to this question can be viewed here:

https://developer.aliyun.com/ask/370233?spm=a2c6h.12873639.article-detail.60.6f9243783Lv0fl

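Question 4: Flink 1.11.2 reading and writing Hive, and supported Hive versions

I register a HiveCatalog in Flink and want to stream Kafka data into a Hive table. However, when I create the Kafka table, its metadata is saved to Hive by default. Is there a way to avoid persisting the Kafka table's metadata? Without registering a HiveCatalog, I presumably cannot write data to Hive at all.

*From the Flink mailing list archive compiled by volunteers

Reference answer:

Use a temporary table, which exists only for the current session and is not persisted to the catalog:

CREATE TEMPORARY TABLE kafka_table...

This does not seem to be documented, so I will open a JIRA to track it:

https://issues.apache.org/jira/browse/FLINK-18624

*From the Flink mailing list archive compiled by volunteers

More answers to this question can be viewed here: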

https://developer.aliyun.com/ask/370232?spm=a2c6h.12873639.article-detail.61.6f9243783Lv0fl

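Question 5: Flink on k8s: the avatica-core dependency of a jar job and flink-table

I am migrating jobs to k8s. The current version is Flink 1.6, and jobs on k8s run in standalone per-job mode.

I have run into a problem: the business team's Flink jar job uses the org.apache.calcite.avatica dependency, that is, the following dependency: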

<dependency>
    <groupId>org.apache.calcite.avatica</groupId>
    <artifactId>avatica-core</artifactId>
    <version>${avatica.version}</version>
</dependency>

However, the flink-table module also contains this dependency:

[image: image.png]

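In Flink on k8s standalone per-job mode, the job jar is placed in Flink's own lib directory, so when the job starts it reports:

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.calcite.avatica.ConnectionPropertiesImpl

As I understand it, the job jar contains the avatica-core dependency, and flink-table_2.11-1.6-RELEASE.jar in Flink's lib directory contains it as well. Because both jars sit in the lib directory, the classes conflict.

How can I solve this problem? I look forward to your reply.

*From the Flink mailing list archive compiled by volunteers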

Reference answer:

If you only want to resolve the jar conflict, the maven shade plugin [1] may be useful to you.

[1] https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html

Best,

*From the Flink mailing list archive compiled by volunteers
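A class relocation of the kind the linked page describes might look like the fragment below, placed in the job's pom.xml. Treat it as a sketch under assumptions: the plugin version and the shaded.org.apache.calcite.avatica prefix are illustrative choices, not values from the original thread. Relocation rewrites the avatica classes bundled in the job jar (and the job's references to them) into a different package, so they no longer collide with the copies inside the flink-table jar.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- Assumed version; use whichever release your build already standardizes on. -->
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Package that clashes with the copy bundled in flink-table. -->
            <pattern>org.apache.calcite.avatica</pattern>
            <!-- Hypothetical target prefix; any package unique to the job works. -->
            <shadedPattern>shaded.org.apache.calcite.avatica</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After building, listing the contents of the shaded job jar should show the avatica classes under the relocated package rather than under org/apache/calcite/avatica.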

More answers to this question can be viewed here:

https://developer.aliyun.com/ask/370231?spm=a2c6h.12873639.article-detail.62.6f9243783Lv0fl
