Flink-01 介绍Flink Java 3分钟上手 HelloWorld 和 Stream ExecutionEnvironment DataSet FlatMapFunction

本文涉及的产品
实时计算 Flink 版,5000CU*H 3个月
简介: Flink-01 介绍Flink Java 3分钟上手 HelloWorld 和 Stream ExecutionEnvironment DataSet FlatMapFunction

代码仓库

会同步代码到 GitHub

https://github.com/turbo-duck/flink-demo

基本介绍

Flink介绍

Apache Flink 是一个开源的流处理框架,旨在处理批处理和实时数据处理,具有高吞吐量和低延迟的特点。由 Apache 软件基金会开发,Flink 以其强大的流分析、复杂事件处理和数据驱动应用而闻名。

Flink特点

流处理:Flink 将批处理视为流处理的一种特殊情况。这种方法允许实时数据处理,实现即时的洞察和行动。

有状态计算:Flink 提供强大的状态管理,使得在处理流的过程中可以保持状态。这一特性对于需要容错和一致性的应用至关重要。

事件时间处理:Flink 允许用户基于事件时间来处理数据,即使数据无序到达,也能提供准确及时的结果。

容错性:Flink 的状态管理和检查点机制确保系统在出现故障时能够恢复而不丢失状态,维护数据完整性和应用一致性。

高吞吐量和低延迟:Flink 的架构优化了高吞吐量和低延迟,适合高性能应用。

可扩展性:Flink 可以扩展到数千个节点,能够处理大规模数据处理任务。

灵活的部署选项:Flink 可以部署在各种环境中,包括独立集群、云环境和容器编排平台(如 Kubernetes)。

Flink场景

实时分析:Flink 非常适合需要实时数据处理和分析的应用,例如监控和告警系统。

数据管道处理:Flink 可以用于构建数据管道,实时处理和转换数据,确保数据始终是最新的。

事件驱动应用:需要实时响应事件的应用,例如欺诈检测系统和推荐引擎,可以利用 Flink 的实时处理能力。

复杂事件处理:Flink 能够处理复杂事件处理场景,需要关联和分析多个事件。

Flink架构

Job Manager:Job Manager 负责管理 Flink 应用的生命周期,包括调度任务、协调检查点和故障恢复。

Task Managers:Task Managers 是执行实际数据处理任务的工作节点,管理任务执行和资源分配。

客户端:客户端用于将 Flink 任务提交给 Job Manager,可以是命令行界面、网页界面或 API。

状态后台:状态后台存储 Flink 应用的状态,支持不同的存储选项,包括内存、文件系统和分布式存储系统(如 Apache Hadoop 和 Amazon S3)。

检查点和保存点:Flink 使用检查点定期快照应用的状态。保存点是手动触发的快照,可用于版本控制和回滚。

初始工程

建立一个新的JavaMaven工程

pom.xml

https://mvnrepository.com/search?q=flink-java

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>flink-demo-01</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.13.2</flink.version> <!-- 确保版本号正确 -->
        <scala.binary.version>2.12</scala.binary.version> <!-- 确保Scala版本正确 -->
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>
</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_${scala.binary.version}</artifactId>
            <version>${flink.version}</version>
        </dependency>
    </dependencies>
</project>

Demo01.java

package icu.wzk.demo01;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.util.Collector;
import org.apache.flink.api.java.tuple.Tuple2;


public class StartApp {

    public static void main(String[] args) throws Exception {
        String inPath = "demo01/file.txt";
        String outputPath = "demo01/result.csv";
        ExecutionEnvironment executionEnvironment = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> text = executionEnvironment.readTextFile(inPath);
        DataSet<Tuple2<String, Integer>> dataSet = text
                .flatMap(new LineSplitter())
                .groupBy(0)
                .sum(1);
        dataSet
                .writeAsCsv(outputPath,"\n"," ", FileSystem.WriteMode.OVERWRITE)
                .setParallelism(1);
        executionEnvironment.execute("file.txt -> result.csv");
    }

    static class LineSplitter implements FlatMapFunction<String, Tuple2<String,Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> collector) throws Exception {
            for (String word : line.split(" ")) {
                collector.collect(new Tuple2<>(word,1));
            }
        }
    }
}

文本样例

这里提供了一个简单的文本供大家测试,详细的目录信息如下图:

文本的内容如下:

The narrator introduces himself as a man who learned when he was a child that adults lack imagination and understanding. He is now a pilot who has crash-landed in a desert. He encounters a small boy who asks him for a drawing of a sheep, and the narrator obliges. The narrator, who calls the child the little prince, learns that the boy comes from a very small planet, which the narrator believes to be asteroid B-612. Over the course of the next few days, the little prince tells the narrator about his life. On his asteroid-planet, which is no bigger than a house, the prince spends his time pulling up baobab seedlings, lest they grow big enough to engulf the tiny planet. One day an anthropomorphic rose grows on the planet, and the prince loves her with all his heart. However, her vanity and demands become too much for the prince, and he leaves.
The prince travels to a series of asteroids, each featuring a grown-up who has been reduced to a function. The first is a king who requires obedience but has no subjects until the arrival of the prince. The sole inhabitant of the next planet is a conceited man who wants nothing from the prince but flattery. The prince subsequently meets a drunkard, who explains that he must drink to forget how ashamed he is of drinking. The fourth planet introduces the prince to a businessman, who maintains that he owns the stars, whih makes it very important that he know exactly how many stars there are. The prince then encounters a lamplighter, who follows orders that require him to light a lamp each evening and put it out each morning, even though his planet spins so fast that dusk and dawn both occur once every minute. Finally the prince comes to a planet inhabited by a geographer. The geographer, however, knows nothing of his own planet, because it is his sole function to record what he learns from explorers. He asks the prince to describe his home planet, but when the prince mentions the flower, the geographer says that flowers are not recorded because they are ephemeral. The geographer recommends that the little prince visit Earth.
On Earth the prince meets a snake, who says that he can return him to his home, and a flower, who tells him that people lack roots. He comes across a rose garden, and he finds it very depressing to learn that his beloved rose is not, as she claimed, unique in the universe. A fox then tells him that if he tames the fox—that is, establishes ties with the fox—then they will be unique and a source of joy to each other.
The narrator and little prince have now spent eight days in the desert and have run out of water. The two then traverse the desert in search of a well, which, miraculously, they find. The little prince tells the narrator that he plans to return that night to his planet and flower and that now the stars will be meaningful to the narrator, because he will know that his friend is living on one of them. Returning to his planet requires allowing the poisonous snake to bite him. The story resumes six years later. The narrator says that the prince’s body was missing in the morning, so he knows that he returned to his planet, and he wonders whether the sheep that he drew him ate his flower. He ends by imploring the reader to contact him if they ever spot the little prince.

运行 demo01 结果(部分)

and 15
are. 1
baobab 1
be 3
bigger 1
body 1
calls 1
days, 1
demands 1
describe 1
drinking. 1
drunkard, 1
enough 1
featuring 1
function 1
know 2
knows 2
learns 2
little 6

相关实践学习
基于Hologres轻松玩转一站式实时仓库
本场景介绍如何利用阿里云MaxCompute、实时计算Flink和交互式分析服务Hologres开发离线、实时数据融合分析的数据大屏应用。
Linux入门到精通
本套课程是从入门开始的Linux学习课程,适合初学者阅读。由浅入深案例丰富,通俗易懂。主要涉及基础的系统操作以及工作中常用的各种服务软件的应用、部署和优化。即使是零基础的学员,只要能够坚持把所有章节都学完,也一定会受益匪浅。
目录
相关文章
|
3月前
|
Java 流计算
利用java8 的 CompletableFuture 优化 Flink 程序
本文探讨了Flink使用avatorscript脚本语言时遇到的性能瓶颈,并通过CompletableFuture优化代码,显著提升了Flink的QPS。文中详细介绍了avatorscript的使用方法,包括自定义函数、从Map中取值、使用Java工具类及AviatorScript函数等,帮助读者更好地理解和应用avatorscript。
利用java8 的 CompletableFuture 优化 Flink 程序
|
3月前
|
SQL 大数据 API
大数据-118 - Flink DataSet 基本介绍 核心特性 创建、转换、输出等
大数据-118 - Flink DataSet 基本介绍 核心特性 创建、转换、输出等
77 0
|
1月前
|
存储 Java 数据挖掘
Java 8 新特性之 Stream API:函数式编程风格的数据处理范式
Java 8 引入的 Stream API 提供了一种新的数据处理方式,支持函数式编程风格,能够高效、简洁地处理集合数据,实现过滤、映射、聚合等操作。
57 6
|
1月前
|
Java API 开发者
Java中的Lambda表达式与Stream API的协同作用
在本文中,我们将探讨Java 8引入的Lambda表达式和Stream API如何改变我们处理集合和数组的方式。Lambda表达式提供了一种简洁的方法来表达代码块,而Stream API则允许我们对数据流进行高级操作,如过滤、映射和归约。通过结合使用这两种技术,我们可以以声明式的方式编写更简洁、更易于理解和维护的代码。本文将介绍Lambda表达式和Stream API的基本概念,并通过示例展示它们在实际项目中的应用。
|
26天前
|
Rust 安全 Java
Java Stream 使用指南
本文介绍了Java中Stream流的使用方法,包括如何创建Stream流、中间操作(如map、filter、sorted等)和终结操作(如collect、forEach等)。此外,还讲解了并行流的概念及其可能带来的线程安全问题,并给出了示例代码。
|
2月前
|
消息中间件 资源调度 Java
用Java实现samza转换成flink
【10月更文挑战第20天】
|
2月前
|
安全 Java API
Java中的Lambda表达式与Stream API的高效结合####
探索Java编程中Lambda表达式与Stream API如何携手并进,提升数据处理效率,实现代码简洁性与功能性的双重飞跃。 ####
28 0
|
2月前
|
Java API 数据处理
探索Java中的Lambda表达式与Stream API
【10月更文挑战第22天】 在Java编程中,Lambda表达式和Stream API是两个强大的功能,它们极大地简化了代码的编写和提高了开发效率。本文将深入探讨这两个概念的基本用法、优势以及在实际项目中的应用案例,帮助读者更好地理解和运用这些现代Java特性。
|
3月前
|
Java API 数据处理
java Stream详解
【10月更文挑战第4天】
43 0
|
3月前
|
SQL 大数据 API
大数据-132 - Flink SQL 基本介绍 与 HelloWorld案例
大数据-132 - Flink SQL 基本介绍 与 HelloWorld案例
64 0