Flink-01 介绍Flink Java 3分钟上手 HelloWorld 和 Stream ExecutionEnvironment DataSet FlatMapFunction

简介: Flink-01 介绍Flink Java 3分钟上手 HelloWorld 和 Stream ExecutionEnvironment DataSet FlatMapFunction

代码仓库

会同步代码到 GitHub

https://github.com/turbo-duck/flink-demo

基本介绍

Flink介绍

Apache Flink 是一个开源的流处理框架,旨在处理批处理和实时数据处理,具有高吞吐量和低延迟的特点。由 Apache 软件基金会开发,Flink 以其强大的流分析、复杂事件处理和数据驱动应用而闻名。

Flink特点

流处理:Flink 将批处理视为流处理的一种特殊情况。这种方法允许实时数据处理,实现即时的洞察和行动。

有状态计算:Flink 提供强大的状态管理,使得在处理流的过程中可以保持状态。这一特性对于需要容错和一致性的应用至关重要。

事件时间处理:Flink 允许用户基于事件时间来处理数据,即使数据无序到达,也能提供准确及时的结果。

容错性:Flink 的状态管理和检查点机制确保系统在出现故障时能够恢复而不丢失状态,维护数据完整性和应用一致性。

高吞吐量和低延迟:Flink 的架构优化了高吞吐量和低延迟,适合高性能应用。

可扩展性:Flink 可以扩展到数千个节点,能够处理大规模数据处理任务。

灵活的部署选项:Flink 可以部署在各种环境中,包括独立集群、云环境和容器编排平台(如 Kubernetes)。

Flink场景

实时分析:Flink 非常适合需要实时数据处理和分析的应用,例如监控和告警系统。

数据管道处理:Flink 可以用于构建数据管道,实时处理和转换数据,确保数据始终是最新的。

事件驱动应用:需要实时响应事件的应用,例如欺诈检测系统和推荐引擎,可以利用 Flink 的实时处理能力。

复杂事件处理:Flink 能够处理复杂事件处理场景,需要关联和分析多个事件。

Flink架构

Job Manager:Job Manager 负责管理 Flink 应用的生命周期,包括调度任务、协调检查点和故障恢复。

Task Managers:Task Managers 是执行实际数据处理任务的工作节点,管理任务执行和资源分配。

客户端:客户端用于将 Flink 任务提交给 Job Manager,可以是命令行界面、网页界面或 API。

状态后台:状态后台存储 Flink 应用的状态,支持不同的存储选项,包括内存、文件系统和分布式存储系统(如 Apache Hadoop 和 Amazon S3)。

检查点和保存点:Flink 使用检查点定期快照应用的状态。保存点是手动触发的快照,可用于版本控制和回滚。

初始工程

建立一个新的JavaMaven工程

pom.xml

https://mvnrepository.com/search?q=flink-java

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>flink-demo-01</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.13.2</flink.version> <!-- 确保版本号正确 -->
        <scala.binary.version>2.12</scala.binary.version> <!-- 确保Scala版本正确 -->
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>
</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>
</project>

Demo01.java

package icu.wzk.demo01;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.util.Collector;
import org.apache.flink.api.java.tuple.Tuple2;


public class StartApp {

    public static void main(String[] args) throws Exception {
        String inPath = "demo01/file.txt";
        String outputPath = "demo01/result.csv";
        ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> text = executionEnvironment.readTextFile(inPath);
        DataSet<Tuple2<String, Integer>> dataSet = text
                .flatMap(new LineSplitter())
                .groupBy(0)
                .sum(1);
        dataSet
                .writeAsCsv(outputPath,"\n"," ", FileSystem.WriteMode.OVERWRITE)
                .setParallelism(1);
        executionEnvironment.execute("file.txt -> result.csv");
    }

    static class LineSplitter implements FlatMapFunction<String, Tuple2<String,Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> collector) throws Exception {
            for (String word : line.split(" ")) {
                collector.collect(new Tuple2<>(word,1));
            }
        }
    }
}

文本样例

这里提供了一个简单的文本供大家测试,详细的目录信息如下图:

文本的内容如下:

The narrator introduces himself as a man who learned when he was a child that adults lack imagination and understanding. He is now a pilot who has crash-landed in a desert. He encounters a small boy who asks him for a drawing of a sheep, and the narrator obliges. The narrator, who calls the child the little prince, learns that the boy comes from a very small planet, which the narrator believes to be asteroid B-612. Over the course of the next few days, the little prince tells the narrator about his life. On his asteroid-planet, which is no bigger than a house, the prince spends his time pulling up baobab seedlings, lest they grow big enough to engulf the tiny planet. One day an anthropomorphic rose grows on the planet, and the prince loves her with all his heart. However, her vanity and demands become too much for the prince, and he leaves.
The prince travels to a series of asteroids, each featuring a grown-up who has been reduced to a function. The first is a king who requires obedience but has no subjects until the arrival of the prince. The sole inhabitant of the next planet is a conceited man who wants nothing from the prince but flattery. The prince subsequently meets a drunkard, who explains that he must drink to forget how ashamed he is of drinking. The fourth planet introduces the prince to a businessman, who maintains that he owns the stars, whih makes it very important that he know exactly how many stars there are. The prince then encounters a lamplighter, who follows orders that require him to light a lamp each evening and put it out each morning, even though his planet spins so fast that dusk and dawn both occur once every minute. Finally the prince comes to a planet inhabited by a geographer. The geographer, however, knows nothing of his own planet, because it is his sole function to record what he learns from explorers. He asks the prince to describe his home planet, but when the prince mentions the flower, the geographer says that flowers are not recorded because they are ephemeral. The geographer recommends that the little prince visit Earth.
On Earth the prince meets a snake, who says that he can return him to his home, and a flower, who tells him that people lack roots. He comes across a rose garden, and he finds it very depressing to learn that his beloved rose is not, as she claimed, unique in the universe. A fox then tells him that if he tames the fox—that is, establishes ties with the fox—then they will be unique and a source of joy to each other.
The narrator and little prince have now spent eight days in the desert and have run out of water. The two then traverse the desert in search of a well, which, miraculously, they find. The little prince tells the narrator that he plans to return that night to his planet and flower and that now the stars will be meaningful to the narrator, because he will know that his friend is living on one of them. Returning to his planet requires allowing the poisonous snake to bite him. The story resumes six years later. The narrator says that the prince’s body was missing in the morning, so he knows that he returned to his planet, and he wonders whether the sheep that he drew him ate his flower. He ends by imploring the reader to contact him if they ever spot the little prince.

运行 demo01 结果(部分)

and 15
are. 1
baobab 1
be 3
bigger 1
body 1
calls 1
days, 1
demands 1
describe 1
drinking. 1
drunkard, 1
enough 1
featuring 1
function 1
know 2
knows 2
learns 2
little 6

相关实践学习
基于Hologres+Flink搭建GitHub实时数据大屏
通过使用Flink、Hologres构建实时数仓,并通过Hologres对接BI分析工具(以DataV为例),实现海量数据实时分析.
实时计算 Flink 实战课程
如何使用实时计算 Flink 搞定数据处理难题?实时计算 Flink 极客训练营产品、技术专家齐上阵,从开源 Flink功能介绍到实时计算 Flink 优势详解,现场实操,5天即可上手! 欢迎开通实时计算 Flink 版: https://cn.aliyun.com/product/bigdata/sc Flink Forward Asia 介绍: Flink Forward 是由 Apache 官方授权,Apache Flink Community China 支持的会议,通过参会不仅可以了解到 Flink 社区的最新动态和发展计划,还可以了解到国内外一线大厂围绕 Flink 生态的生产实践经验,是 Flink 开发者和使用者不可错过的盛会。 去年经过品牌升级后的 Flink Forward Asia 吸引了超过2000人线下参与,一举成为国内最大的 Apache 顶级项目会议。结合2020年的特殊情况,Flink Forward Asia 2020 将在12月26日以线上峰会的形式与大家见面。
目录
相关文章
|
3月前
|
Java Unix Go
【Java】(8)Stream流、文件File相关操作,IO的含义与运用
Java 为 I/O 提供了强大的而灵活的支持,使其更广泛地应用到文件传输和网络编程中。!但本节讲述最基本的和流与 I/O 相关的功能。我们将通过一个个例子来学习这些功能。
219 1
|
4月前
|
Java API 数据处理
Java新特性:使用Stream API重构你的数据处理
Java新特性:使用Stream API重构你的数据处理
|
4月前
|
Java 大数据 API
Java Stream API:现代集合处理与函数式编程
Java Stream API:现代集合处理与函数式编程
314 100
|
4月前
|
Java API 数据处理
Java Stream API:现代集合处理新方式
Java Stream API:现代集合处理新方式
335 101
|
4月前
|
并行计算 Java 大数据
Java Stream API:现代数据处理之道
Java Stream API:现代数据处理之道
277 101
|
4月前
|
存储 数据可视化 Java
Java Stream API 的强大功能
Java Stream API 是 Java 8 引入的重要特性,它改变了集合数据的处理方式。通过声明式语法,开发者可以更简洁地进行过滤、映射、聚合等操作。Stream API 支持惰性求值和并行处理,提升了代码效率和可读性,是现代 Java 开发不可或缺的工具。
114 0
Java Stream API 的强大功能
|
5月前
|
存储 NoSQL Java
Java Stream API:集合操作与并行处理
Stream API 是 Java 8 提供的集合处理工具,通过声明式编程简化数据操作。它支持链式调用、延迟执行和并行处理,能够高效实现过滤、转换、聚合等操作,提升代码可读性和性能。
|
5月前
|
存储 Java API
Java Stream API:现代数据处理之道
Java Stream API:现代数据处理之道
398 188
|
5月前
|
存储 Java API
Java Stream API:现代数据处理之道
Java Stream API:现代数据处理之道
312 92
|
6月前
|
Oracle Java 关系型数据库
掌握Java Stream API:高效集合处理的利器
掌握Java Stream API:高效集合处理的利器
425 80