Question 1: 'Duplicate entry' error when writing with JdbcUpsertTableSink
As the title says: according to the official documentation, when the MySQL table defines a primary key, the UpsertTableSink is used and rows are written with INSERT ... ON DUPLICATE KEY UPDATE.
However, when using it I still got a 'Duplicate entry' error. For example:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '2036-feed_landing_box_news-2000-202011231405' for key 'uniq_ssmt'
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
    at com.mysql.jdbc.Util.getInstance(Util.java:386)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1041)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4190)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4122)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2818)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2157)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2460)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2377)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2361)
    at com.mysql.jdbc.PreparedStatement.executeBatchedInserts(PreparedStatement.java:1793)
(2) There is also a small oddity: for minute 202011231405, all other records of that minute were emitted at 14:06 (I set maxOutOfOrder to 1 minute), but this conflicting entry was only reported in the batch around 14:11. *From the volunteer-compiled Flink mailing list archive
Reference answer:
Is the primary key defined in the MySQL database the same as the PRIMARY KEY declared in the Flink table definition?
*From the volunteer-compiled Flink mailing list archive
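For illustration, a minimal sketch (a hypothetical Java Table API snippet; the table and column names are assumptions loosely inferred from the key value in the error message) of what a matching pair of definitions looks like. With the Flink 1.11 'jdbc' connector, a PRIMARY KEY declared in the DDL makes the sink write in upsert mode (INSERT ... ON DUPLICATE KEY UPDATE on MySQL); without one it falls back to append-only inserts, which is one common way to end up with this 'Duplicate entry' error. The reply above is asking whether the key declared on the Flink side covers exactly the columns MySQL enforces uniqueness on (uniq_ssmt here).

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertSinkSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // MySQL side (for comparison, hypothetical column names):
        //   UNIQUE KEY uniq_ssmt (site_id, box, biz_id, stat_minute)
        // The Flink DDL declares the same columns as PRIMARY KEY, so the JDBC sink
        // upserts on the key that MySQL actually enforces.
        tEnv.executeSql(
            "CREATE TABLE mysql_sink (" +
            "  site_id INT," +
            "  box STRING," +
            "  biz_id INT," +
            "  stat_minute STRING," +
            "  pv BIGINT," +
            "  PRIMARY KEY (site_id, box, biz_id, stat_minute) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:mysql://host:3306/db'," +   // placeholder connection options
            "  'table-name' = 'mysql_sink'" +
            ")");
    }
}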
More answers to this question are available here:
https://developer.aliyun.com/ask/364604?spm=a2c6h.13066369.question.98.33bf585fYCIlUK
Question 2: Error when calling a UDF that depends on a third-party library in PyFlink 1.11.1
Calling a UDF that depends on a third-party library in PyFlink 1.11.1 fails with an error.
Runtime environment: Windows 10, python==3.7.9, apache-flink==1.11.1, apache-beam==2.19.0
The UDF depends on the third-party library h3==3.7.0.
pytest passes.
At runtime the job fails with the following error:
2020-11-23 14:20:51,656 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: TableSourceScan(table=[[default_catalog, default_database, order_info]], fields=[source, order_id, user_id, car_id, driver_name, driver_id, time_dispatch_done, time_start, time_cancel, status, start_city, end_city, start_ad_code, end_ad_code, cancel_reason_id, realStartLatitude, realStartLongitude]) -> StreamExecPythonCalc -> Sink: Sink(table=[default_catalog.default_database.print_table], fields=[hash_h3]) (3/8) (056a0a0cdf3838794f4023e61d04a690) switched from RUNNING to FAILED.
java.lang.RuntimeException: Failed to create stage bundle factory!
    at org.apache.flink.python.AbstractPythonFunctionRunner.createStageBundleFactory(AbstractPythonFunctionRunner.java:197) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.python.AbstractPythonFunctionRunner.open(AbstractPythonFunctionRunner.java:164) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.runners.python.scalar.AbstractGeneralPythonScalarFunctionRunner.open(AbstractGeneralPythonScalarFunctionRunner.java:65) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator$ProjectUdfInputPythonScalarFunctionRunner.open(AbstractStatelessFunctionOperator.java:186) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.api.operators.python.AbstractPythonFunctionOperator.open(AbstractPythonFunctionOperator.java:143) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:131) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:88) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.operators.python.scalar.AbstractRowDataPythonScalarFunctionOperator.open(AbstractRowDataPythonScalarFunctionOperator.java:80) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.table.runtime.operators.python.scalar.RowDataPythonScalarFunctionOperator.open(RowDataPythonScalarFunctionOperator.java:64) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:291) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:473) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:469) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:522) ~[flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546) [flink-dist_2.11-1.11.1.jar:1.11.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Process died with exit code 0
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:331) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:320) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:250) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.python.AbstractPythonFunctionRunner.createStageBundleFactory(AbstractPythonFunctionRunner.java:195) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    ... 16 more
Caused by: java.lang.IllegalStateException: Process died with exit code 0
    at org.apache.beam.runners.fnexecution.environment.ProcessManager$RunningProcess.isAliveOrThrow(ProcessManager.java:72) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:137) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:200) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:184) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:331) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:320) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:250) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    at org.apache.flink.python.AbstractPythonFunctionRunner.createStageBundleFactory(AbstractPythonFunctionRunner.java:195) ~[flink-python_2.11-1.11.1.jar:1.11.1]
    ... 16 more
*From the volunteer-compiled Flink mailing list archive
Reference answer:
Could you post the taskmanager log? From this log we can only see that the Python process died while starting up; no other information is visible. *From the volunteer-compiled Flink mailing list archive
More answers to this question are available here:
https://developer.aliyun.com/ask/364603?spm=a2c6h.13066369.question.99.33bf585fVdiL84
Question 3: Help with windowed group aggregation in Flink SQL on CDC data
I use the CDC tool Debezium to send MySQL dimension-table data to Kafka in real time, and then consume the Kafka data with Flink SQL. The overall flow is as follows: [image: image.png] [image: image.png] The grouped window-aggregation SQL is: [image: image.png] When the job executes, it throws the following exception:

Exception in thread "main" org.apache.flink.table.api.TableException: GroupWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan(table=[[default_catalog, default_database, t_order, watermark=[-(TO_TIMESTAMP(FROM_UNIXTIME(/($1, 1000))), 3000:INTERVAL SECOND)]]], fields=[id, timestamps, orderInformationId, userId, categoryId, productId, price, productCount, priceSum, shipAddress, receiverAddress])
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.createNewNode(FlinkChangelogModeInferenceProgram.scala:380)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:298)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:337)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:326)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:325)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.immutable.Range.foreach(Range.scala:155)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChildren(FlinkChangelogModeInferenceProgram.scala:325)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visit(FlinkChangelogModeInferenceProgram.scala:275)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.visitChild(FlinkChangelogModeInferenceProgram.scala:337)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1(FlinkChangelogModeInferenceProgram.scala:326)
    at org.apache.flink.table.planner.plan.optimize.program.FlinkChangelogModeInferenceProgram$SatisfyModifyKindSetTraitVisitor.$anonfun$visitChildren$1$adapted(FlinkChangelogModeInferenceProgram.scala:325)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
My initial understanding is that Flink SQL does not support grouped window aggregation over a CDC stream that consists of inserts, updates, and deletes. Given that, what approach should I use to solve this? Any advice is appreciated, thanks! *From the volunteer-compiled Flink mailing list archive
Reference answer:
Flink SQL's window aggregation currently does not support inputs that contain update and delete messages. You can use a non-window (regular group) aggregation instead.
Btw, could you describe your use case? Why do you need a window operation on top of CDC data?
*From the volunteer-compiled Flink mailing list archive
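As an illustration of the suggested workaround, here is a minimal sketch (a hypothetical Java Table API job; the column names are taken from the field list in the exception, while the types, Kafka options, and the minute-level bucket are assumptions): grouping by a formatted time bucket instead of TUMBLE keeps the query a regular group aggregation, which can consume the update/delete changes of a CDC source, at the cost of emitting an updating result instead of a final per-window value.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcNonWindowAggSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical registration of the Debezium-backed t_order table from the question
        // (only a subset of its columns, with assumed types and Kafka options).
        tEnv.executeSql(
            "CREATE TABLE t_order (" +
            "  id BIGINT," +
            "  timestamps BIGINT," +
            "  categoryId INT," +
            "  priceSum DECIMAL(18, 2)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 't_order'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'debezium-json'" +   // CDC changelog: insert/update/delete rows
            ")");

        // Regular (non-window) group aggregation: the per-minute bucket is an ordinary
        // GROUP BY key, so updates and deletes from the changelog are retracted correctly.
        tEnv.executeSql(
            "SELECT" +
            "  DATE_FORMAT(TO_TIMESTAMP(FROM_UNIXTIME(timestamps / 1000)), 'yyyy-MM-dd HH:mm') AS minute_bucket," +
            "  categoryId," +
            "  COUNT(*) AS order_cnt," +
            "  SUM(priceSum) AS price_sum " +
            "FROM t_order " +
            "GROUP BY" +
            "  DATE_FORMAT(TO_TIMESTAMP(FROM_UNIXTIME(timestamps / 1000)), 'yyyy-MM-dd HH:mm')," +
            "  categoryId"
        ).print();
    }
}

The result of such a query is an updating stream, so a real sink in place of print() would need to support upserts or retractions, e.g. keyed on minute_bucket and categoryId.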
More answers to this question are available here:
https://developer.aliyun.com/ask/364601?spm=a2c6h.13066369.question.98.33bf585f2BmIZo
Question 4: The TUMBLE function does not support retract streams
Original SQL:
select 0 as id
     , HOUR(TUMBLE_START(proctime, interval '1' HOUR)) as ftime
     , count(case when write_time >= DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') then memberid else NULL end) as paynum_h
     , round(sum(case when write_time >= DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd') then real_product else 0 end)) as paymoney_h
from dwd_XXX
where write_time >= DATE_FORMAT(LOCALTIMESTAMP, 'yyyy-MM-dd')
group by TUMBLE(proctime, interval '1' HOUR);
Error: org.apache.flink.table.api.TableException: GroupWindowAggregate doesn't support consuming update and delete changes which is produced by node TableSourceScan. I found that the query works if the Kafka table DDL is changed to the 'json' format.
The data source does not come from flink-mysql-cdc.
The binlog is written to Kafka via Canal, and a Kafka table (the ODS table) was created on top of it with:
WITH (
  'connector' = 'kafka',
  'properties.group.id' = 'XX',
  'properties.bootstrap.servers' = 'XX',
  'topic' = 'ODS_XXX',
  'scan.startup.mode' = 'group-offsets',
  'format' = 'canal-json');
The dwd_XXX table queried above is populated from this ODS table by an INSERT INTO after a layer of data cleansing. Its Kafka table is created with the changelog-json format:
WITH (
  'connector' = 'kafka',
  'properties.group.id' = 'XX',
  'properties.bootstrap.servers' = 'XXX',
  'topic' = 'DWD_XXX',
  'scan.startup.mode' = 'group-offsets',
  'format' = 'changelog-json');
*From the volunteer-compiled Flink mailing list archive
Reference answer:
Check what kind of stream your dwd_XXX table is: an append-only stream or a retract stream.
If it is a retract stream, window aggregation on top of it will not work.
*From the volunteer-compiled Flink mailing list archive
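For reference, a minimal sketch of the variant the asker observed to work (a hypothetical Java Table API snippet; the column names come from the query above, the types are assumptions, and the connector options follow the DDL shown earlier): with 'format' = 'changelog-json' the dwd_XXX table carries update and delete changes, so it is a retract stream and TUMBLE is rejected; declaring it with the plain 'json' format makes it an append-only stream that the windowed query can consume. The trade-off is that updates and deletes from the binlog are then no longer applied, so if they matter a non-window aggregation (as in Question 3) is the safer route.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AppendOnlyDwdSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Append-only variant of the DWD table (assumed column types): with the plain
        // 'json' format every Kafka record is treated as an insert, so GroupWindowAggregate
        // (TUMBLE) accepts it; with 'changelog-json' the table also emits update/delete
        // changes and the original query fails with the TableException above.
        tEnv.executeSql(
            "CREATE TABLE dwd_XXX (" +
            "  memberid STRING," +
            "  real_product DOUBLE," +
            "  write_time STRING," +
            "  proctime AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'properties.group.id' = 'XX'," +
            "  'properties.bootstrap.servers' = 'XXX'," +
            "  'topic' = 'DWD_XXX'," +
            "  'scan.startup.mode' = 'group-offsets'," +
            "  'format' = 'json'" +
            ")");
    }
}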
More answers to this question are available here:
https://developer.aliyun.com/ask/364599?spm=a2c6h.13066369.question.101.33bf585fspxp8e
Question 5: Help: the apply method after a Flink DataStream window operator is never executed
streamSource
    .flatMap(new ComeIntoMaxFlatMapFunction())
    .assignTimestampsAndWatermarks(
        new CommonAssignerPeriodWatermarks<>(Time.seconds(1).toMilliseconds()))
    .connect(ruleConfigSource)
    .process(new MetricDataFilterProcessFunction())
    .keyBy((KeySelector<Metric, MetricDataKey>) metric -> {
        MetricDataKey metricDataKey = new MetricDataKey();
        metricDataKey.setDomain(metric.getDomain());
        metricDataKey.setStationAliasCode(metric.getStaId());
        metricDataKey.setEquipMK(metric.getEquipMK());
        metricDataKey.setEquipID(metric.getEquipID());
        metricDataKey.setMetric(metric.getMetric());
        return metricDataKey;
    })
    .window(SlidingEventTimeWindows.of(Time.seconds(2), Time.seconds(1)))
    .apply(new RichWindowFunction<Metric, MetricDataList, MetricDataKey, TimeWindow>() {
        @Override
        public void apply(MetricDataKey tuple, TimeWindow window,
                          Iterable<Metric> input, Collector<MetricDataList> out) throws Exception {
            input.forEach(x -> {
                System.out.println("--->>>" + x);
            });
        }
    });
In this topology the keyBy executes normally, but the System.out.println("--->>>" + x) inside apply is never executed. *From the volunteer-compiled Flink mailing list archive
Reference answer:
Most likely the window has not reached its trigger condition yet; check whether the watermark is advancing.
*From the volunteer-compiled Flink mailing list archive
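Since SlidingEventTimeWindows only fires when the watermark passes the end of a window, apply() is never called if the watermark does not advance (for example, an idle source or a timestamp assigner that never emits a growing watermark). Below is a minimal, generic sketch (not the asker's CommonAssignerPeriodWatermarks; it assumes Flink 1.11+ and a caller-supplied timestamp extractor) of a watermark assignment whose behaviour is easy to verify, including an idleness timeout so a silent input cannot hold the watermark back:

import java.time.Duration;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;

public class WatermarkSketch {
    // Attach a bounded-out-of-orderness watermark to any stream, given a function
    // that extracts the event time (epoch millis) from each record.
    public static <T> DataStream<T> withWatermarks(DataStream<T> stream,
                                                   SerializableTimestampAssigner<T> timestampAssigner) {
        return stream.assignTimestampsAndWatermarks(
            WatermarkStrategy
                .<T>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                .withTimestampAssigner(timestampAssigner)
                // Declare the stream idle after 30s of silence so an idle source or partition
                // does not hold back the watermark and keep event-time windows from firing.
                .withIdleness(Duration.ofSeconds(30)));
    }
}

It could be attached right after the flatMap, e.g. withWatermarks(stream, (metric, ts) -> metric.getEventTime()), where getEventTime() stands for whatever field of Metric carries the event time; the current watermark of each subtask can then be checked in the web UI under the task's Watermarks tab.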
More answers to this question are available here:
https://developer.aliyun.com/ask/364598?spm=a2c6h.13066369.question.102.33bf585fnTOedz