Flink依赖问题之connector hive依赖冲突如何解决-阿里云开发者社区

问题一：Apache Flink常见问题汇总【精品问答】

我看1.11的java.sql.Timestamp 对应的是Flink的TIMESTAMP(9)，跟之前默认的TIMESTAMP(3)有区别，而且之前1.10的Timestamp(3)是带时区UTC的，现在这个类型不带时区了。想问下这个具体调整应该如何适配？

*来自志愿者整理的flink邮件归档

参考答案：我通过flink sql 定义了一个es sink，其中有个字段类型定义为了 eventTime TIMESTAMP(9) WITH LOCAL TIME ZONE。在尝试写入时，报了如下的异常。看来json parser无法解析这种类型。请问下大神们，我应该怎么写入一个UTC日期的时间类型？格式类似 2020-07-15T12:00:00.000Z

java.lang.UnsupportedOperationException: Not support to parse type: TIMESTAMP(9) WITH LOCAL TIME ZONE

at org.apache.flink.formats.json.JsonRowDataSerializationSchema.createNotNullConverter(JsonRowDataSerializationSchema.java:184)

*来自志愿者整理的flink邮件归档

关于本问题的更多回答可点击进行查看：

https://developer.aliyun.com/ask/370296?spm=a2c6h.12873639.article-detail.53.6f9243783Lv0fl

问题二：connector hive依赖冲突

大佬们，下面连接hive的依赖包的哪个传递依赖导致的jar包冲突，我从1.9到1.11每次在maven按照官方文档引包都会出现依赖冲突。。。。1.9刚发布的时候对下面的引包有做依赖排除，后来文档改了

// Flink's Hive connector.Contains flink-hadoop-compatibility and flink-orc jars

flink-connector-hive_2.11-1.11.0.jar

// Hive dependencies

hive-exec-2.3.4.jar

*来自志愿者整理的flink邮件归档

参考答案：

1.9和1.10时候排除一些传递依赖后在idea和打uber jar在集群环境都可以运行，不排除传递依赖的话在idea运行不了；

1.11现在只在本地测哪，不排除传递依赖idea运行不了，集群环境还没弄，但是我感觉在idea直接run这个功能好多人都需要，文档是不是可以改进一下*来自志愿者整理的flink邮件归档

关于本问题的更多回答可点击进行查看：

https://developer.aliyun.com/ask/370251?spm=a2c6h.12873639.article-detail.54.6f9243783Lv0fl

问题三：flink1.11 pyflink stream job 退出

python flink_cep_example.py 过几秒就退出了，应该一直运行不退出的啊。代码如下，使用了MATCH_RECOGNIZE：

s_env = StreamExecutionEnvironment.get_execution_environment() b_s_settings = EnvironmentSettings.new_instance().use_blink_planner().in_streaming_mode().build() st_env = StreamTableEnvironment.create(s_env, environment_settings=b_s_settings) configuration = st_env.get_config().get_configuration() configuration.set_string("taskmanager.memory.task.off-heap.size", "500m")

s_env.set_parallelism(1)

kafka_source = """CREATE TABLE source ( flow_name STRING, flow_id STRING, component STRING, filename STRING, event_time TIMESTAMP(3), WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND ) WITH ( 'connector' = 'kafka', 'topic' = 'cep', 'properties.bootstrap.servers' = 'localhost:9092', 'format' = 'json', 'scan.startup.mode' = 'latest-offset' )"""

postgres_sink = """ CREATE TABLE cep_result ( filename STRING, start_tstamp TIMESTAMP(3), end_tstamp TIMESTAMP(3) ) WITH ( 'connector.type' = 'jdbc', 'connector.url' = 'jdbc:postgresql://127.0.0.1:5432/postgres', 'connector.table' = 'cep_result', 'connector.driver' = 'org.postgresql.Driver', 'connector.username' = 'postgres', 'connector.password' = 'my_password', 'connector.write.flush.max-rows' = '1' ) """

st_env.sql_update(kafka_source) st_env.sql_update(postgres_sink)

postgres_sink_sql = ''' INSERT INTO cep_result SELECT * FROM source MATCH_RECOGNIZE ( PARTITION BY filename ORDER BY event_time MEASURES (A.event_time) AS start_tstamp, (D.event_time) AS end_tstamp ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B C D) DEFINE A AS component = 'XXX', B AS component = 'YYY', C AS component = 'ZZZ', D AS component = 'WWW' ) MR '''

sql_result = st_env.execute_sql(postgres_sink_sql)

*来自志愿者整理的flink邮件归档

参考答案：

execute_sql是一个异步非阻塞的方法，所以你需要在你的代码末尾加上 sql_result.get_job_client().get_job_execution_result().result() 对此我已经创建了JIRA[1]

[1] https://issues.apache.org/jira/browse/FLINK-18598*来自志愿者整理的flink邮件归档

关于本问题的更多回答可点击进行查看：

https://developer.aliyun.com/ask/370249?spm=a2c6h.12873639.article-detail.55.6f9243783Lv0fl

问题四：官方pyflink 例子的执行问题

官方例子: https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html 按照例子写了程序,也安装了pyflink | python -m pip install apache-flink | 代码: | from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment, DataTypes from pyflink.table.descriptors import Schema, OldCsv, FileSystem from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment() env.set_parallelism(1) t_env = StreamTableEnvironment.create(env)

add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())

t_env.register_function("add", add)

t_env.connect(FileSystem().path('C:/Users/xuyin/Desktop/docker_compose_test/src.txt'))

.with_format(OldCsv() .field('a', DataTypes.BIGINT()) .field('b', DataTypes.BIGINT()))

.with_schema(Schema() .field('a', DataTypes.BIGINT()) .field('b', DataTypes.BIGINT()))

.create_temporary_table('mySource')

t_env.connect(FileSystem().path('C:/Users/xuyin/Desktop/docker_compose_test/tar.txt'))

.with_format(OldCsv() .field('sum', DataTypes.BIGINT()))

.with_schema(Schema() .field('sum', DataTypes.BIGINT()))

.create_temporary_table('mySink')

t_env.from_path('mySource')

.select("add(a, b)")

.insert_into('mySink')

t_env.execute("tutorial_job") |

执行:

| python test_pyflink.py |

报错:

| Traceback (most recent call last): File "C:\Users\xuyin\AppData\Local\Programs\Python\Python37\lib\site-packages\pyflink\util\exceptions.py", line 147, in deco return f(*a, **kw) File "C:\Users\xuyin\AppData\Local\Programs\Python\Python37\lib\site-packages\py4j\protocol.py", line 328, in get_return_value format(target_id, ".", name), value) py4j.protocol.Py4JJavaError: An error occurred while calling o2.execute. : org.apache.flink.table.api.TableException: The configured Task Off-Heap Memory 0 bytes is less than the least required Python worker Memory 79 mb. The Task Off-Heap Memory can be configured using the configuration key 'taskmanager.memory.task.off-heap.size'. at org.apache.flink.table.planner.plan.nodes.common.CommonPythonBase$class.checkPythonWorkerMemory(CommonPythonBase.scala:158) at org.apache.flink.table.planner.plan.nodes.common.CommonPythonBase$class.getMergedConfiguration(CommonPythonBase.scala:119) at org.apache.flink.table.planner.plan.nodes.common.CommonPythonBase$class.getConfig(CommonPythonBase.scala:102) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecPythonCalc.getConfig(StreamExecPythonCalc.scala:35) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecPythonCalc.translateToPlanInternal(StreamExecPythonCalc.scala:61) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecPythonCalc.translateToPlanInternal(StreamExecPythonCalc.scala:35) at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecCalcBase.translateToPlan(StreamExecCalcBase.scala:38) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToTransformation(StreamExecLegacySink.scala:158) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:106) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:48) at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlan(StreamExecLegacySink.scala:48) at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:67) at org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.flink.table.planner.delegation.StreamPlanner.translateToPlan(StreamPlanner.scala:66) at org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:166) at org.apache.flink.table.api.internal.TableEnvironmentImpl.translate(TableEnvironmentImpl.java:1248) at org.apache.flink.table.api.internal.TableEnvironmentImpl.translateAndClearBuffer(TableEnvironmentImpl.java:1240) at org.apache.flink.table.api.internal.TableEnvironmentImpl.execute(TableEnvironmentImpl.java:1197) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:745)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "test_pyflink.py", line 34, in t_env.execute("tutorial_job") File "C:\Users\xuyin\AppData\Local\Programs\Python\Python37\lib\site-packages\pyflink\table\table_environment.py", line 1057, in execute return JobExecutionResult(self._j_tenv.execute(job_name)) File "C:\Users\xuyin\AppData\Local\Programs\Python\Python37\lib\site-packages\py4j\java_gateway.py", line 1286, in call answer, self.gateway_client, self.target_id, self.name) File "C:\Users\xuyin\AppData\Local\Programs\Python\Python37\lib\site-packages\pyflink\util\exceptions.py", line 154, in deco raise exception_mapping[exception](s.split(': ', 1)[1], stack_trace) pyflink.util.exceptions.TableException: "The configured Task Off-Heap Memory 0 bytes is less than the least required Python worker Memory 79 mb. The Task Off-Heap Memory can be configured using the configuration key 'taskmanager.memory.task.off-heap.size'." |

里面提到还要配置taskmanager.memory.task.off-heap.size这个属性吗 ,

我找到..\Python\Python37\Lib\site-packages\pyflink\conf下面的flink-conf.yaml

增加了taskmanager.memory.task.off-heap.size: 100m

但是还是报一样的错误

请问用python安装的flink去哪里配置属性

*来自志愿者整理的flink邮件归档

参考答案：

你需要添加配置，如果你没有使用RocksDB作为statebackend的话，你直接配置t_env.get_config().get_configuration().set_boolean("python.fn-execution.memory.managed", True)就行，如果你用了的话，就需要配置off-heap memory了，table_env.get_config().get_configuration().set_string("taskmanager.memory.task.off-heap.size", '80m')。你可以参考文档上的例子，以及对应的note说明[1]

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/python_udfs.html#scalar-functions*来自志愿者整理的flink邮件归档

关于本问题的更多回答可点击进行查看：

https://developer.aliyun.com/ask/370237?spm=a2c6h.12873639.article-detail.56.6f9243783Lv0fl

问题五：Flink整合hive之后，通过flink创建的表，hive beeline可见表，不可见字段？

参照文档https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/#connecting-to-hive 通过flink创建表：CREATE TABLE Orders (product STRING, amount INT) 在beeline端可见表，但是desc看不到字段，select * from orders也不可用

*来自志愿者整理的flink邮件归档

参考答案：

默认创建的是Flink表，Hive端不可见。

你想创建Hive表的话，用Hive dialect。*来自志愿者整理的flink邮件归档

关于本问题的更多回答可点击进行查看：

https://developer.aliyun.com/ask/370236?spm=a2c6h.12873639.article-detail.57.6f9243783Lv0fl

Flink依赖问题之connector hive依赖冲突如何解决

问题一：Apache Flink常见问题汇总【精品问答】

问题二：connector hive依赖冲突

问题三：flink1.11 pyflink stream job 退出

问题四：官方pyflink 例子的执行问题

问题五：Flink整合hive之后，通过flink创建的表，hive beeline可见表，不可见字段？

实时计算 Flink

热门文章

最新文章

相关产品

相关课程

相关电子书

相关实验场景