Multi-Source Aggregation Case Implementation | Study Notes

2021-12-14


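Summary: A quick guide to implementing the multi-source aggregation case.

Study notes for the Developer Academy course [Data Collection System Flume: Multi-Source Aggregation Case Implementation]. The notes follow the course closely so readers can pick up the material quickly.

Course URL: https://developer.aliyun.com/learning/course/99/detail/1640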

0. Preparation

Distribute Flume to the other nodes:

[atguigu@hadoop102 module]$ xsync flume

On hadoop102, hadoop103, and hadoop104, create a group3 folder under the /opt/module/flume/job directory.

[atguigu@hadoop102 job]$ mkdir group3
[atguigu@hadoop103 job]$ mkdir group3
[atguigu@hadoop104 job]$ mkdir group3

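1. Create flume1-logger-flume.conf

Configure the Source to monitor the /opt/module/group.log file, and configure the Sink to send the data to the next-tier Flume agent.

Create the configuration file on hadoop103 and open it:

[atguigu@hadoop103 group3]$ touch flume1-logger-flume.conf
[atguigu@hadoop103 group3]$ vim flume1-logger-flume.conf

Add the following content: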

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
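To see how the keys in a Flume agent file hang together, here is a minimal Python sketch (not part of Flume) that reads a Flume-style "key = value" file and checks the wiring: every source listed in a1.sources must name declared channels in its .channels key, and every sink must name a declared channel in its .channel key. The helpers parse_flume_conf and check_bindings are hypothetical, and the parser deliberately ignores Java properties features such as line continuations.

```python
from typing import Dict, List


def parse_flume_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines into a dict, skipping blank lines and # comments.

    Simplified: no support for line continuations or escapes that full
    Java .properties files allow.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check_bindings(props: Dict[str, str], agent: str) -> List[str]:
    """Return wiring problems for one agent (an empty list means OK)."""
    problems: List[str] = []
    channels = set(props.get(f"{agent}.channels", "").split())
    # Sources use the plural key 'channels' (a space-separated list).
    for src in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    # Sinks use the singular key 'channel' (exactly one channel).
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


if __name__ == "__main__":
    sample = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = cl
"""
    # 'cl' (lowercase L) instead of 'c1' is an easy typo to miss.
    print(check_bindings(parse_flume_conf(sample), "a1"))
    # prints ['sink k1 uses undeclared channel cl']
```

Running the same check on the full a1 file above should return an empty list.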

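2. Create flume2-netcat-flume.conf

Configure the Source to listen for the data stream on port 44444, and configure the Sink to send the data to the next-tier Flume agent.

Create the configuration file on hadoop102 and open it:

[atguigu@hadoop102 group3]$ touch flume2-netcat-flume.conf
[atguigu@hadoop102 group3]$ vim flume2-netcat-flume.conf

Add the following content: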

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
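Flume's netcat source turns each newline-terminated line it receives on the bound port into one event and, by default, acknowledges each event with OK. The toy Python server below imitates only that line-to-event behavior so it can be tried without a cluster. It is not Flume's implementation: it serves a single client on an already-listening socket, and the name run_toy_netcat_source is hypothetical.

```python
import socket
from typing import List


def run_toy_netcat_source(server: socket.socket, events: List[bytes]) -> None:
    """Serve one client on a listening socket, imitating a netcat source.

    Each newline-terminated line becomes one event body appended to
    'events', and each event is acknowledged with the bytes OK plus a
    newline. Returns when the client closes the connection.
    """
    conn, _addr = server.accept()
    with conn, conn.makefile("rb") as reader:
        for line in reader:
            # Drop the line terminator; telnet sends CR LF.
            events.append(line.rstrip(b"\r\n"))
            conn.sendall(b"OK\n")
```

For a quick local trial, create the listening socket with socket.create_server(("127.0.0.1", 44444)), pass it in together with an empty list, and connect from another terminal with telnet 127.0.0.1 44444.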

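3. Create flume3-flume-logger.conf

Configure the Source to receive the data streams sent by flume1 and flume2; the merged stream is then sunk to the console.

Create the configuration file on hadoop104 and open it:

[atguigu@hadoop104 group3]$ touch flume3-flume-logger.conf
[atguigu@hadoop104 group3]$ vim flume3-flume-logger.conf

Add the following content: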

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
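The aggregation only works if both upstream Avro sinks point at the host and port where the flume3 Avro source listens (hadoop104:4141 in this case). The following Python sketch performs that cross-check, assuming each configuration has already been loaded into a dict of key/value strings; the function names are hypothetical. The comparison is purely textual, so a source bound to 0.0.0.0 would be reported as a mismatch even though it would accept the connections.

```python
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]


def _avro_endpoints(props: Dict[str, str], agent: str,
                    kind: str, host_key: str) -> Set[Endpoint]:
    """Collect (host, port) of all avro components of one kind ('sources' or 'sinks')."""
    found: Set[Endpoint] = set()
    for name in props.get(f"{agent}.{kind}", "").split():
        prefix = f"{agent}.{kind}.{name}"
        if props.get(f"{prefix}.type") == "avro":
            found.add((props[f"{prefix}.{host_key}"], int(props[f"{prefix}.port"])))
    return found


def unreachable_targets(upstreams: Dict[str, Dict[str, str]],
                        downstream: Dict[str, str],
                        downstream_agent: str) -> Dict[str, Set[Endpoint]]:
    """Map each upstream agent to avro sink targets no downstream avro source listens on."""
    listening = _avro_endpoints(downstream, downstream_agent, "sources", "bind")
    result: Dict[str, Set[Endpoint]] = {}
    for agent, props in upstreams.items():
        missing = _avro_endpoints(props, agent, "sinks", "hostname") - listening
        if missing:
            result[agent] = missing
    return result


# Only the keys relevant to the check, copied from the three files.
FLUME1 = {"a1.sinks": "k1", "a1.sinks.k1.type": "avro",
          "a1.sinks.k1.hostname": "hadoop104", "a1.sinks.k1.port": "4141"}
FLUME2 = {"a2.sinks": "k1", "a2.sinks.k1.type": "avro",
          "a2.sinks.k1.hostname": "hadoop104", "a2.sinks.k1.port": "4141"}
FLUME3 = {"a3.sources": "r1", "a3.sources.r1.type": "avro",
          "a3.sources.r1.bind": "hadoop104", "a3.sources.r1.port": "4141"}

print(unreachable_targets({"a1": FLUME1, "a2": FLUME2}, FLUME3, "a3"))  # prints {}
```

An empty result means both senders target an address the receiver listens on; a wrong port in either upstream file would show up as that agent's entry.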

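4. Run the configuration files

Start the agents in this order: flume3-flume-logger.conf, flume2-netcat-flume.conf, flume1-logger-flume.conf. The downstream agent (flume3) goes first so that its Avro source is already listening on port 4141 when the upstream Avro sinks try to connect.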
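The start order is downstream first: flume3 must be running before the two agents that send to it. For larger topologies the same rule is a topological sort. This Python sketch (3.9+, standard-library graphlib) derives a start order from a hypothetical sends_to map, in which each agent name maps to the agents its sinks deliver to.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def startup_order(sends_to: Dict[str, Set[str]]) -> List[str]:
    """Order agents so that every receiver starts before its senders.

    TopologicalSorter takes 'node -> predecessors'. Passing sends_to as that
    mapping makes each receiver a predecessor of its senders, so receivers
    are emitted first.
    """
    return list(TopologicalSorter(sends_to).static_order())


# This case: a1 (hadoop103) and a2 (hadoop102) both send to a3 (hadoop104).
print(startup_order({"a1": {"a3"}, "a2": {"a3"}}))
# a3 comes first; a1 and a2 follow in either order.
```

A cycle in the map (agents sending to each other) makes static_order raise graphlib.CycleError, which is a useful signal that no safe start order exists.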

[atguigu@hadoop104 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

[atguigu@hadoop102 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf

[atguigu@hadoop103 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
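The three start commands share one shape and differ only in the agent name, the configuration file, and whether logs go to the console. A hypothetical Python helper that builds the same argument list, using only the options shown in these commands, could look like this. The list could be passed to subprocess.run with cwd set to the Flume home directory, assumed here to be /opt/module/flume.

```python
import shlex
from typing import List


def flume_ng_command(agent: str, conf_file: str, console: bool = False) -> List[str]:
    """Return the flume-ng argv for one agent, relative to the Flume home directory."""
    cmd = ["bin/flume-ng", "agent",
           "--conf", "conf/",
           "--name", agent,
           "--conf-file", conf_file]
    if console:
        # INFO-level logging to the terminal, where the logger sink's events appear.
        cmd.append("-Dflume.root.logger=INFO,console")
    return cmd


print(shlex.join(flume_ng_command("a3", "job/group3/flume3-flume-logger.conf", console=True)))
# prints: bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
```

Note that --name must match the prefix used inside the file (a1, a2, a3); a mismatch leaves the agent with no components to start.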

5. On hadoop103, append content to group.log in the /opt/module directory

[atguigu@hadoop103 module]$ echo 'hello' >> group.log
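The exec source keeps tail -F running on group.log, so each line appended to the file becomes an event. In Python, opening the file in append mode is the counterpart of the shell's >>, while mode "w" would truncate it first like >. The helper name append_line is hypothetical.

```python
from pathlib import Path


def append_line(path: Path, text: str) -> None:
    """Append text plus a newline to path, like: echo 'text' >> path."""
    # "a" writes at the end of the file; "w" would truncate it first.
    with path.open("a", encoding="utf-8") as f:
        f.write(text + "\n")


# Hypothetical usage on hadoop103:
# append_line(Path("/opt/module/group.log"), "hello")
```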

6. On hadoop102, send data to port 44444

[atguigu@hadoop102 flume]$ telnet hadoop102 44444
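telnet serves here only as an interactive TCP client: each typed line goes to the netcat source, which answers OK. A scripted alternative is sketched below in Python. It assumes the listener replies with exactly one line per line sent, which matches the netcat source's default per-event acknowledgement; send_lines is a hypothetical helper name.

```python
import socket
from typing import Iterable, List


def send_lines(host: str, port: int, lines: Iterable[str],
               timeout: float = 5.0) -> List[str]:
    """Send each line to a netcat-style listener and return one reply per line."""
    replies: List[str] = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            for line in lines:
                sock.sendall(line.encode("utf-8") + b"\n")
                # Wait for the acknowledgement line before sending the next event.
                replies.append(reader.readline().decode("utf-8").rstrip("\r\n"))
    return replies


# Hypothetical usage against the running flume2 agent:
# send_lines("hadoop102", 44444, ["hello"])
```

Each line sent this way should then show up, together with the line appended to group.log, in the console output of the a3 agent on hadoop104.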
