快速使用
- 打开
https://flink.apache.org/downloads/
下载 flink
因为书籍介绍的是
1.12
版本的,为避免不必要的问题,下载相同版本
- 解压
tar -xzvf flink-1.11.2-bin-scala_2.11.tgz
- 启动 flink
./bin/start-cluster.sh
- 打开 flink web 页面
localhost:8081
- 编写结合 Kafka 词频统计程序
具体参考 https://weread.qq.com/web/reader/51032ac07236f8e05107816k1f032c402131f0e3dad99f3?
package org.example; import org.apache.flink.api.common.serialization.SimpleStringSchema; import org.apache.flink.api.common.typeinfo.Types; import org.apache.flink.api.java.tuple.Tuple2; import org.apache.flink.streaming.api.datastream.DataStream; import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; import org.apache.flink.streaming.api.windowing.time.Time; import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer; import org.apache.flink.util.Collector; import java.util.Properties; public class WordCountKafkaInStdOut { public static void main(String[] args) throws Exception { // 设置Flink执行环境 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // Kafka参数 Properties properties = new Properties(); properties.setProperty("bootstrap.servers", "localhost:9092"); properties.setProperty("group.id", "flink-group"); String inputTopic = "Shakespeare"; String outputTopic = "WordCount"; // Source FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(inputTopic, new SimpleStringSchema(), properties); DataStream<String> stream = env.addSource(consumer); // Transformation // 使用Flink API对输入流的文本进行操作 // 按空格切词、计数、分区、设置时间窗口、聚合 DataStream<Tuple2<String, Integer>> wordCount = stream .flatMap((String line, Collector<Tuple2<String, Integer>> collector) -> { String[] tokens = line.split("\\s"); // 输出结果 for (String token : tokens) { if (token.length() > 0) { collector.collect(new Tuple2<>(token, 1)); } } }) .returns(Types.TUPLE(Types.STRING, Types.INT)) .keyBy(0) .timeWindow(Time.seconds(5)) .sum(1); // Sink wordCount.print(); // execute env.execute("kafka streaming word count"); } }
- 打包应用(当然在这之前需要本地调试一下,至少得运行通吧😄)
- 使用Flink提供的命令行工具flink,将打包好的作业提交到集群上。命令行的参数
--class
用来指定哪个主类作为入口。
./bin/flink run --class org.example.WordCountKafkaInStdOut xxtarget/flink_study-1.0-SNAPSHOT.jar
class 建议直接拷贝引用
- web 页面查看作业提交成功
- kafka 生产者随便发点消息
- 查看作业日志,词频统计结果
- 关闭
flink
./bin/stop-cluster.sh