Installing Flume 1.9.0 on Ubuntu 18.04 and Related Concepts

2022-11-24


Introduction to Flume

Official site: http://flume.apache.org/index.html

A highly recommended Chinese translation of the documentation: https://flume.liyifeng.org/

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

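Choosing a version

The stable release of Flume 1.9.0 came out in January 2019, and 1.9.0 was the latest version when this article was written. Another reason for choosing it is that it is backwards-compatible with earlier releases of the Flume 1.x line.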

January 8, 2019 - Apache Flume 1.9.0 Released

The Apache Flume team is pleased to announce the release of Flume 1.9.0.

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data.

Version 1.9.0 is the eleventh Flume release as an Apache top-level project. Flume 1.9.0 is stable, production-ready software, and is backwards-compatible with previous versions of the Flume 1.x codeline.

Several months of active development went into this release: about 70 patches were committed since 1.8.0, representing many features, enhancements, and bug fixes.

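Download, installation, and testing

Download page:

http://flume.apache.org/download.html

The file downloaded for this article is apache-flume-1.9.0-bin.tar.gz.

Note: verifying the file is optional.

To verify the file, open an online checksum site: http://www.metools.info/code/c92.html

Upload the Flume 1.9.0 archive and select sha512 to compute its SHA-512 value: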

a989c50389c779dd7554c98bdba687fa982d6079d308c85ac210d3e523aa54b4b7452f38fe30d9acdac327080fe316d604b5efb0f3943cbacb4839fb2261f535

Then open the checksum file published on the download page (click the link apache-flume-1.9.0-bin.tar.gz.sha512), which shows:

a989c50389c779dd7554c98bdba687fa982d6079d308c85ac210d3e523aa54b4b7452f38fe30d9acdac327080fe316d604b5efb0f3943cbacb4839fb2261f535

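Compare this published value with the one computed in the previous step. If the two are identical, the downloaded file has not been corrupted or tampered with.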
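The same comparison can be done locally instead of uploading the archive to a third-party site. The following is a minimal sketch, assuming the sha512sum utility from GNU coreutils is installed (it normally is on Ubuntu). The helper name verify_sha512 is made up for this example; the expected value is the one shown above.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if the SHA-512 digest of FILE equals EXPECTED_HEX.
# (verify_sha512 is a helper defined here for illustration.)
verify_sha512() {
  # sha512sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha512sum "$1" | awk '{print $1}')
  # Normalize both values to lower case before comparing.
  actual=$(printf '%s' "$actual" | tr 'A-F' 'a-f')
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

ARCHIVE=apache-flume-1.9.0-bin.tar.gz
EXPECTED=a989c50389c779dd7554c98bdba687fa982d6079d308c85ac210d3e523aa54b4b7452f38fe30d9acdac327080fe316d604b5efb0f3943cbacb4839fb2261f535

# Only run the check if the archive is in the current directory.
if [ -f "$ARCHIVE" ]; then
  if verify_sha512 "$ARCHIVE" "$EXPECTED"; then
    echo "checksum matches"
  else
    echo "checksum does NOT match, do not use this file"
  fi
fi
```

Run it from the directory that contains the downloaded archive.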

Installation:

Flume's runtime requirements are:

System Requirements

Java Runtime Environment - Java 1.8 or later

Memory - Sufficient memory for configurations used by sources, channels or sinks

Disk Space - Sufficient disk space for configurations used by channels or sinks

Directory Permissions - Read/Write permissions for directories used by agent

In short:

JDK 1.8 or later

Enough memory

Enough disk space

Read/write permissions on the directories the agent uses

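Once the environment is ready, simply extract apache-flume-1.9.0-bin.tar.gz to a directory of your choice, for example so that the installation ends up in /home/hadoop/opt/app/apache-flume-1.9.0-bin on Ubuntu.

Change to the directory containing apache-flume-1.9.0-bin.tar.gz and run the following command to extract it. The target directory given to -C must already exist.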

tar -zxf apache-flume-1.9.0-bin.tar.gz -C /home/hadoop/opt/app

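After extraction, list /home/hadoop/opt/app/apache-flume-1.9.0-bin to check the resulting directory structure.

Testing Flume:

In the conf directory under the installation directory, create a new empty file named netcat2logger.conf: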

cd /home/hadoop/opt/app/apache-flume-1.9.0-bin/conf
vi netcat2logger.conf

Add the following content:

# netcat2logger.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 6666
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

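The configuration above defines an agent named a1. Its source listens on port 6666 of localhost and reads the data sent to that port, its channel buffers events in memory, and its sink is of type logger, which writes each event to the log. See the official user guide for the meaning of each property.

Run the following command from the Flume installation directory to start collecting data: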

$ bin/flume-ng agent -n a1 -c conf -f conf/netcat2logger.conf -Dflume.root.logger=INFO,console

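flume-ng agent: starts a Flume agent. The ng in the command name stands for "next generation" (Flume NG).

-n a1: -n gives the agent name; a1 must match the agent name used in the configuration file.

-c conf: the directory that holds Flume's own configuration files.

-f conf/netcat2logger.conf: the location of the data collection configuration file written above.

-Dflume.root.logger=INFO,console: sets Flume's log level to INFO and sends the log output to the console.

After running the command, the output looks like this: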

hadoop@master:~/opt/app/apache-flume-1.9.0-bin$ bin/flume-ng agent -n a1 -c conf -f conf/netcat2logger.conf -Dflume.root.logger=INFO,console
...
...
2021-11-18 10:15:07,052 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2021-11-18 10:15:07,079 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:166)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:6666]

This shows that NetcatSource has started successfully.

Now open a new terminal and send data to the socket with telnet or nc.

Run telnet 127.0.0.1 6666 and type helloworld; the source replies with OK.

hadoop@master:~$ telnet 127.0.0.1 6666
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
helloworld
OK

Switch back to the terminal where flume-ng is running. The event containing helloworld has been written to the console:

2021-11-18 10:22:44,799 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 77 6F 72 6C 64 0D                helloworld. }
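The logger sink prints the event body as hexadecimal bytes followed by a printable rendering. As a small sketch of how to read that output, the snippet below turns the hex bytes from the log line back into text. The helper decode_body is written for this illustration and is not part of Flume. The final byte, 0D, is a carriage return, which the sink renders as a dot: telnet ends each line with CR LF, and the netcat source splits events at the line feed.

```shell
# Hex bytes copied from the "body:" field of the log line above.
body_hex="68 65 6C 6C 6F 77 6F 72 6C 64 0D"

# decode_body "HEX HEX ...": print the bytes those hex values stand for.
decode_body() {
  for byte in $1; do
    # Convert the hex value to octal, then let printf emit that byte.
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

# od -c makes the trailing carriage return visible as \r.
decode_body "$body_hex" | od -c
```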

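Flume concepts

Configuring sources, channels, and sinks

For source configuration, see http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#flume-sources

For channel configuration, see http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#flume-channels

For sink configuration, see http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#flume-sinks

Data flow model

In Flume, the unit of data that flows through the system is called an event. An agent is a (JVM) process that hosts the components an event passes through: a source, a channel, and a sink. Together these components move data from where it is collected (the source) to its destination (the sink); each such step from one agent to the next destination is called a hop.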

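Reliability

On each agent, events are staged in a channel and then delivered to the next agent or to a terminal repository (for example HDFS, when the sink type is hdfs). An event is removed from the current agent's channel only after it has been stored in the next agent's channel or in the terminal repository. This is how Flume provides end-to-end reliability for message delivery.

Recoverability

When delivery fails, events can be recovered because they are staged in the channel. Flume supports a durable channel (for example the file channel, which is backed by the local file system). If performance matters more, the memory channel can be used instead, but events that are still in memory can be lost without any way to recover them.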
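As an illustration of the durable option, the sketch below writes a variant of the test configuration that uses a file channel instead of a memory channel. The file name netcat2logger-file.conf and the two directories are assumptions made for this example; checkpointDir and dataDirs are the file channel properties for its checkpoint and data locations.

```shell
# Run from the conf directory of the Flume installation.
# The two directories below are hypothetical; pick paths the agent can write to.
cat > netcat2logger-file.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 6666

a1.sinks.k1.type = logger

# File channel: events are persisted to disk instead of held in memory
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/hadoop/opt/flume-data/checkpoint
a1.channels.c1.dataDirs = /home/hadoop/opt/flume-data/data

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Start it the same way as the earlier test, from the installation directory:
# bin/flume-ng agent -n a1 -c conf -f conf/netcat2logger-file.conf -Dflume.root.logger=INFO,console
```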

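Multi-agent flow

Avro can be used to pass data between multiple agents: set the sink type of the upstream agent to avro and the source type of the downstream agent to avro, then configure the sink with the hostname and port on which the downstream source listens.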
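A two-agent chain on a single machine could be sketched as follows. Port 4545, the agent name a2, and both file names are assumptions chosen for this example, and the netcat source and logger sink are reused from the earlier test.

```shell
# Run from the conf directory. Port 4545 and both file names are
# assumptions for this example.

# Upstream agent a1: netcat source -> memory channel -> avro sink
cat > avro-upstream.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 6666

# The avro sink sends events to the host and port of the next agent
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 4545

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Downstream agent a2: avro source -> memory channel -> logger sink
cat > avro-downstream.conf <<'EOF'
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# The avro source listens on the port the upstream sink connects to
a2.sources.r1.type = avro
a2.sources.r1.bind = localhost
a2.sources.r1.port = 4545

a2.sinks.k1.type = logger

a2.channels.c1.type = memory

a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
EOF

# Start the downstream agent first so that its port is open, then the upstream one:
# bin/flume-ng agent -n a2 -c conf -f conf/avro-downstream.conf -Dflume.root.logger=INFO,console
# bin/flume-ng agent -n a1 -c conf -f conf/avro-upstream.conf -Dflume.root.logger=INFO,console
```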

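Consolidation

Flume can consolidate data from many locations, for example collecting the logs of hundreds of servers into a single HDFS file system. A common layout is for the agent on each server to forward its events through an avro sink to one collector agent, whose avro source receives them and whose sink writes to HDFS.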
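A consolidation setup could be sketched like this: one configuration that is deployed on every log-producing server, and one for the collector. All host names, the log path, the HDFS path, port 4545, and the file names are hypothetical. The HDFS sink also needs the Hadoop libraries to be available to Flume, which this sketch does not cover.

```shell
# Run from the conf directory. Hosts, paths, and port are placeholders.

# Deployed on each server: tail a log file and forward it to the collector
cat > weblog-agent.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/access.log

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Deployed on the collector: receive from all servers and write to HDFS
cat > collector.conf <<'EOF'
collector.sources = r1
collector.sinks = k1
collector.channels = c1

collector.sources.r1.type = avro
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4545

collector.sinks.k1.type = hdfs
collector.sinks.k1.hdfs.path = hdfs://namenode.example.com:8020/flume/weblogs

collector.channels.c1.type = memory

collector.sources.r1.channels = c1
collector.sinks.k1.channel = c1
EOF
```

Every server runs the same weblog-agent.conf, so adding a server does not require changing the collector.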

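Multiplexing the flow

Flume supports multiplexing the event flow to one or more destinations. This is done by defining a flow multiplexer, which can either replicate each event or selectively route it to one or more channels.

The general format of a multiplexing configuration is: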

# list the sources, sinks and channels for the agent
<Agent>.sources = <Source>
<Agent>.sinks = <Sink>
<Agent>.channels = <Channel1> <Channel2>
# set channel for source
<Agent>.sources.<Source>.channels = <Channel1> <Channel2> ...
# set channel for sink
<Agent>.sinks.<Sink>.channel = <Channel1>
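A concrete instance of the template above might look like the following sketch, which routes events by the value of an event header. The header name state, its values CA and NY, port 4545, and the file name are assumptions for this example, and the header is assumed to have been set upstream (for instance by an interceptor). Without a selector section, the source uses the replicating selector and copies every event to all of its channels.

```shell
# Run from the conf directory. Header name, values, and port are placeholders.
cat > multiplexing.conf <<'EOF'
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 4545
a1.sources.r1.channels = c1 c2

# Route each event by the value of its "state" header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CA = c1
a1.sources.r1.selector.mapping.NY = c2
# Events with any other value (or no such header) go to c1
a1.sources.r1.selector.default = c1

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# One sink drains each channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c2
EOF
```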

High reliability in Flume

For failover, see:

http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#failover-sink-processor
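As a rough sketch of what the failover sink processor section describes, the configuration below puts two avro sinks into one sink group. The collector host names, port 4545, the priority values, and the file name are assumptions for this example. The sink with the larger priority value is used first, and the other takes over when it fails.

```shell
# Run from the conf directory. Hosts, port, and priorities are placeholders.
cat > failover.conf <<'EOF'
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
a1.sinkgroups = g1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Both sinks drain the same channel
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545
a1.sinks.k2.channel = c1

# k1 has the higher priority, so k2 is only used while k1 is failing
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper limit, in milliseconds, on the back-off applied to a failed sink
a1.sinkgroups.g1.processor.maxpenalty = 10000
EOF
```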

For load balancing, see:

http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#load-balancing-sink-processor
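A load-balancing sink group is configured in much the same way as a failover group. The sketch below spreads events over two avro sinks in round-robin order; the collector host names, port 4545, and the file name are again assumptions for this example.

```shell
# Run from the conf directory. Hosts and port are placeholders.
cat > load-balance.conf <<'EOF'
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
a1.sinkgroups = g1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545
a1.sinks.k2.channel = c1

# Alternate between k1 and k2; temporarily skip a sink that has failed
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
EOF
```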

For a high-reliability architecture diagram and further details, see: http://www.zhuyongpeng.cn/1543.html
