Flink windows: aggregating and emitting to a sink

We have a data stream in which each element has this type:

id: String
type: Type
amount: Integer
We want to aggregate this stream and output the sum of amount once a week.

Current solution

An example Flink pipeline would look like this:

stream.keyBy(type)
  .window(TumblingProcessingTimeWindows.of(Time.days(7)))
  .reduce(sumAmount())
  .addSink(someOutput())
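
To make the window boundaries concrete, here is a minimal plain-Java sketch (no Flink dependency) of how a tumbling window start can be derived from a timestamp. The class and method names are hypothetical, and the arithmetic is assumed to match what Flink does for non-negative timestamps. One consequence worth knowing: with no offset, 7-day windows are aligned to the Unix epoch, which fell on a Thursday, so a "week" here runs Thursday to Thursday in UTC. Flink's `TumblingProcessingTimeWindows.of(size, offset)` overload can shift that boundary.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneOffset;

// Plain-Java sketch of tumbling-window alignment; names are illustrative only.
final class TumblingWindowSketch {
    static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;

    // Start of the window containing `timestamp`, for windows of `size` ms
    // shifted by `offset` ms relative to the Unix epoch.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    // Day of week (UTC) on which the weekly window containing `timestamp` starts.
    static DayOfWeek startDay(long timestamp, long offset) {
        long start = windowStart(timestamp, offset, WEEK_MS);
        return Instant.ofEpochMilli(start).atZone(ZoneOffset.UTC).getDayOfWeek();
    }
}
```

With an offset of zero every weekly window starts on a Thursday; an offset of four days moves the start to Monday.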

For the input

| id | type | amount |
| 1 | CAT | 10 |
| 2 | DOG | 20 |
| 3 | CAT | 5 |
| 4 | DOG | 15 |
| 5 | DOG | 50 |
If the window ends between records 3 and 4, our output would be:

| TYPE | sumAmount |
| CAT | 15 | (id 1 and id 3 added together)
| DOG | 20 | (only id 2 has been 'summed')
Ids 4 and 5 are still inside the Flink pipeline and will be output next week.

So next week our total output will be:

| TYPE | sumAmount |
| CAT | 15 | (of last week)
| DOG | 20 | (of last week)
| DOG | 65 | (id 4 and id 5 added together)
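
The tables above can be reproduced with a small plain-Java model of the keyed weekly sum. This is only a sketch of the grouping logic, not Flink code: the `week` field is an assumption standing in for the processing-time window a record happened to arrive in, and the `"week/type"` string key is used purely for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of keyBy(type) + weekly window + sum; names are illustrative only.
final class WeeklySumSketch {

    // One input record; `week` is a stand-in for the window the record arrived in.
    static final class Rec {
        final int id;
        final String type;
        final int amount;
        final int week;

        Rec(int id, String type, int amount, int week) {
            this.id = id;
            this.type = type;
            this.amount = amount;
            this.week = week;
        }
    }

    // Sum of amount per (week, type), kept in first-seen order.
    static Map<String, Integer> sumPerWeekAndType(List<Rec> input) {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Rec r : input) {
            sums.merge(r.week + "/" + r.type, r.amount, Integer::sum);
        }
        return sums;
    }
}
```

Feeding it records 1 to 3 in week 1 and records 4 and 5 in week 2 gives one CAT sum and one DOG sum for week 1 and a single DOG sum for week 2, matching the tables.
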
New requirement:

We now also want to know in which week each record was processed. In other words, our new output should be:

| TYPE | sumAmount | weekNumber |
| CAT | 15 | 1 |
| DOG | 20 | 1 |
| DOG | 65 | 2 |
But we also want an additional output like this:

| id | weekNumber |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
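How can this be done?

Is there a way to achieve this with Flink? I imagine we would need an aggregate function that sums the amounts but also outputs each record together with the current week number, but I could not find a way to do this in the documentation.

(Note: we process about 100 million records per week, so ideally we would keep only the aggregates in Flink state during the week, not all the individual records.)

Edit:

I went with the solution Anton described: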

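// getSideOutput is available on SingleOutputStreamOperator, not on plain DataStream
SingleOutputStreamOperator<Element> elements =
    stream.keyBy(type)
        .process(myKeyedProcessFunction());

elements.addSink(outputElements());
elements.getSideOutput(outputTag)
    .addSink(outputAggregates());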

And the KeyedProcessFunction looks like:

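class MyKeyedProcessFunction extends KeyedProcessFunction<Type, Element, Element> {

    private static final long SEVEN_DAYS_MS = 7L * 24 * 60 * 60 * 1000;

    // Start of the current aggregation period for this key; null when none is open.
    // Both states still have to be obtained in open() (omitted here, as in the original).
    private ValueState<ZonedDateTime> state;
    // Running sum for the current period.
    private ValueState<Integer> sum;

    @Override
    public void processElement(Element e, Context c, Collector<Element> out) throws Exception {
        if (state.value() == null) {
            // First element of a new period for this key: open it and schedule its end.
            state.update(ZonedDateTime.now());
            sum.update(0);
            long nowPlus7Days = c.timerService().currentProcessingTime() + SEVEN_DAYS_MS;
            c.timerService().registerProcessingTimeTimer(nowPlus7Days);
        }
        e.addAggregationId(state.value());
        sum.update(sum.value() + e.getAmount());
        out.collect(e); // forward the tagged element on the main output
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext c, Collector<Element> out) throws Exception {
        c.output(outputTag, sum.value()); // emit the aggregate on the side output
        state.clear();
        sum.clear();
    }
}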
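
To see how the KeyedProcessFunction above behaves on the example input, here is a minimal plain-Java model of its per-key logic. It is a sketch under stated assumptions, not Flink code: the clock is passed in explicitly, `advanceTo` stands in for timers firing, and all names and the string output formats are hypothetical. The point it illustrates is that state holds only a period start and a running sum per key, while each record is tagged and forwarded immediately.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the per-key aggregation; names are illustrative only.
final class WeeklyAggregatorSketch {
    static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;

    // Per-key state: the period start and the running sum, never the records.
    private static final class KeyState {
        long periodStart;
        long sum;
    }

    private final Map<String, KeyState> stateByType = new LinkedHashMap<>();
    final List<String> taggedRecords = new ArrayList<>(); // main output: "id@periodStart"
    final List<String> aggregates = new ArrayList<>();    // side output: "type=sum@periodStart"

    // Counterpart of onTimer: emit and drop every aggregate whose period has ended.
    void advanceTo(long now) {
        Iterator<Map.Entry<String, KeyState>> it = stateByType.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, KeyState> entry = it.next();
            KeyState s = entry.getValue();
            if (now >= s.periodStart + WEEK_MS) {
                aggregates.add(entry.getKey() + "=" + s.sum + "@" + s.periodStart);
                it.remove();
            }
        }
    }

    // Counterpart of processElement: open a period if needed, add to the sum, forward the record.
    void onRecord(int id, String type, int amount, long now) {
        advanceTo(now);
        KeyState s = stateByType.computeIfAbsent(type, k -> {
            KeyState fresh = new KeyState();
            fresh.periodStart = now;
            return fresh;
        });
        s.sum += amount;
        taggedRecords.add(id + "@" + s.periodStart);
    }
}
```

One design consequence shows up in the model: each key's period starts at that key's first record, so periods are not aligned across keys the way tumbling windows are. If a shared week number is needed, the period start would have to be rounded to a common boundary.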

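// process() of a ProcessWindowFunction applied together with reduce(...)
public void process(Type key, Context context, Iterable<Event> reducedEvents,
                    Collector<Tuple3<Type, Long, Long>> out) {
    // After an incremental reduce the iterable holds exactly one pre-aggregated element.
    long sum = reducedEvents.iterator().next().getAmount();
    // Emit (type, window start in epoch milliseconds, summed amount).
    out.collect(new Tuple3<Type, Long, Long>(key, context.window().getStart(), sum));
}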
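
The process method above emits the window start as epoch milliseconds, whereas the desired output uses small week numbers. One possible conversion, sketched here in plain Java, counts windows from a chosen reference window. The `firstWindowStart` parameter and the class name are assumptions for illustration; the question does not say how week 1 is defined.

```java
// Plain-Java sketch: turn a window start into a 1-based week index; names are illustrative only.
final class WeekNumberSketch {
    static final long WEEK_MS = 7L * 24 * 60 * 60 * 1000;

    // Index of the weekly window starting at `windowStart`, where the window
    // starting at `firstWindowStart` is week 1. Both values are epoch milliseconds
    // and are assumed to be aligned window starts.
    static long weekNumber(long windowStart, long firstWindowStart) {
        return (windowStart - firstWindowStart) / WEEK_MS + 1;
    }
}
```

Because tumbling windows all have the same size, this division is exact for any two window starts produced by the same window assigner.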
