2 - Website Log Analysis Case Study: Collecting Web Logs with Flume (Windows Edition)

2022-11-24

1. Introduction to Flume

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

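2. Installing Flume on Windows

1. Set the JAVA_HOME environment variable on the local machine.

2. Download Flume from https://flume.apache.org/download.html. I used version 1.9.0.

3. Test the launch command from the bin directory:

D:\apache\apache-flume-1.9.0-bin\apache-flume-1.9.0-bin\bin>flume-ng version

If the command prints the Flume version, the installation is working. There is nothing more to the installation than that.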
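
Step 1 is the one that most often goes wrong, so it can help to check JAVA_HOME before launching Flume. The following is a minimal sketch, not part of Flume itself: the function name find_java is hypothetical, and it only assumes that a JDK or JRE keeps its launcher under the bin subdirectory of JAVA_HOME.

```python
import os


def find_java(env=None):
    """Return the path of the Java launcher under JAVA_HOME, or None.

    `env` is a mapping of environment variables; it defaults to os.environ.
    This helper is illustrative only and is not shipped with Flume.
    """
    env = os.environ if env is None else env
    # Tolerate surrounding whitespace or quotes in the variable's value.
    java_home = env.get("JAVA_HOME", "").strip().strip('"')
    if not java_home:
        return None
    # java.exe on Windows, plain java elsewhere.
    for name in ("java.exe", "java"):
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    launcher = find_java()
    if launcher is None:
        print("JAVA_HOME is missing or does not contain bin/java")
    else:
        print("Java launcher found at", launcher)
```

If the script reports a missing launcher, fix JAVA_HOME first; the flume-ng command depends on it.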

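3. Collecting Logs on Windows with Flume

3.1 Flow

Source type: Windows has no tail command, so a single file cannot be followed by an exec source running tail. Instead, a spooldir source watches the log directory for new files.

Channel type: memory, chosen because it is fast and simple to set up. The trade-off is that events still buffered in memory are lost if the agent stops.

Sink types: logger and file_roll. The logger sink is there to confirm that events are arriving; the file_roll sink does the actual migration of the log data.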
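
The spooldir source expects each file in the watched directory to be complete and never modified afterwards, and it expects file names not to be reused. One way to satisfy both is to copy a finished log file to a staging directory first and then rename it into the spool directory under a unique name. The sketch below illustrates that idea; deliver_to_spool and the staging directory are hypothetical and not part of Flume, and the staging directory is assumed to be on the same drive as the spool directory so that the rename is a simple move.

```python
import os
import shutil
import uuid


def deliver_to_spool(src_path, spool_dir, staging_dir):
    """Copy src_path into spool_dir without ever exposing a partial file.

    The copy is written to staging_dir (outside the watched directory) and
    only renamed into spool_dir once it is complete. A random suffix keeps
    file names unique across deliveries.
    """
    os.makedirs(spool_dir, exist_ok=True)
    os.makedirs(staging_dir, exist_ok=True)

    unique_name = "{}.{}".format(os.path.basename(src_path), uuid.uuid4().hex)
    staged = os.path.join(staging_dir, unique_name)
    final = os.path.join(spool_dir, unique_name)

    # Slow part: happens where the spooldir source is not looking.
    shutil.copyfile(src_path, staged)

    # Refuse to overwrite; os.rename behaves differently per platform
    # when the destination already exists.
    if os.path.exists(final):
        raise FileExistsError(final)
    os.rename(staged, final)
    return final
```

With the configuration in this article, spool_dir would be E:\log; a staging directory such as E:\log-staging is an assumption made only for this example.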

3.2 Configuration

# Configure the three components of agent1
agent1.sources = source1
# Note: all three sinks drain the same channel, so each event is delivered
# to only one of them, not copied to all three
agent1.sinks = sink1 sink2 sink3
agent1.channels = channel1
# Describe/configure the spooldir source source1
agent1.sources.source1.channels = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = E://log
# Charset of the log files (the source defaults to UTF-8)
agent1.sources.source1.inputCharset = GBK
# Add a header with the path of the originating file to each event
agent1.sources.source1.fileHeader = true
# Host interceptor: record the agent's host in the "hostname" header
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname
# Describe sink1 (file_roll); note that the directory key is sink.directory
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.type = file_roll
agent1.sinks.sink1.sink.directory = D://flume-collection
agent1.sinks.sink1.sink.rollInterval = 60
# Describe sink2 (logger)
agent1.sinks.sink2.channel = channel1
agent1.sinks.sink2.type = logger
# Describe sink3 (hdfs)
agent1.sinks.sink3.channel = channel1
agent1.sinks.sink3.type = hdfs
# Use local time to resolve %Y-%m-%d in hdfs.path, since the events
# carry no timestamp header
agent1.sinks.sink3.hdfs.useLocalTimeStamp = true
agent1.sinks.sink3.hdfs.path = /sx/logtable/%Y-%m-%d
agent1.sinks.sink3.hdfs.filePrefix = logevent
agent1.sinks.sink3.hdfs.rollInterval = 10
agent1.sinks.sink3.hdfs.rollSize = 134217728
agent1.sinks.sink3.hdfs.rollCount = 0
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
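
Every property key in a Flume configuration repeats the component name, so a typo in a name is easy to make and hard to spot by eye. The following is a rough sketch of a check that lists keys whose component name was never declared on the agent1.sources, agent1.sinks, or agent1.channels lines. The function find_undeclared_components is hypothetical and not part of Flume, and it handles only the simple key = value lines used in this article, not the full Java properties syntax.

```python
def find_undeclared_components(config_text, agent="agent1"):
    """Return (line_number, kind, name) for keys that use an undeclared name.

    `kind` is one of "sources", "sinks", "channels". Only plain
    `key = value` lines are understood; other lines are skipped.
    """
    declared = {"sources": set(), "sinks": set(), "channels": set()}
    used = []

    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        parts = key.split(".")
        if len(parts) < 2 or parts[0] != agent or parts[1] not in declared:
            continue
        kind = parts[1]
        if len(parts) == 2:
            # e.g. "agent1.sinks = sink1 sink2 sink3" declares the names.
            declared[kind].update(value.split())
        else:
            # e.g. "agent1.sinks.sink1.type = file_roll" uses a name.
            used.append((number, kind, parts[2]))

    # Compare after the whole file is read, so declaration order is irrelevant.
    return [entry for entry in used if entry[2] not in declared[entry[1]]]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, kind, name in find_undeclared_components(handle.read()):
            print("line {}: '{}' is not declared in agent1.{}".format(number, name, kind))
```

Run it against the configuration file before starting the agent; an empty output means every key refers to a declared source, sink, or channel.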

3.3 Startup

Run the agent from the Flume installation directory, pointing it at the configuration file (saved here as conf\log2file.conf) and naming the agent defined in it:

D:\apache\apache-flume-1.9.0-bin\apache-flume-1.9.0-bin>.\bin\flume-ng agent --conf conf --conf-file conf\log2file.conf --name agent1 -property flume.root.logger=INFO,console

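3.4 Caveats

1. Check the encoding of the log files being read. The spooldir source reads files as UTF-8 by default; if the files use a different encoding, set inputCharset explicitly. With the wrong encoding, the source can fail with:

FATAL: Spool Directory source source1: { spoolDir: E://log }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1

2. For the file_roll sink, the key for the output directory is sink.directory, as in agent1.sinks.sink1.sink.directory = D://flume-collection. With any other key, the sink fails with the error:

Directory may not be null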
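
The encoding problem in caveat 1 can be caught before a file reaches the spool directory. The sketch below tries to decode a file strictly with a list of candidate encodings and reports the first one that works. The helper names decodes_cleanly and pick_charset are hypothetical, and Python's codecs are used as a stand-in for Java's charsets, so treat the result as a hint for choosing inputCharset rather than a guarantee.

```python
def decodes_cleanly(path, encoding):
    """Return True if the whole file decodes with `encoding` without errors."""
    try:
        with open(path, "r", encoding=encoding, errors="strict", newline="") as handle:
            # Read in chunks so large log files are not loaded at once.
            while handle.read(65536):
                pass
    except UnicodeDecodeError:
        return False
    return True


def pick_charset(path, candidates=("utf-8", "gbk")):
    """Return the first candidate encoding that decodes the file, or None.

    Order matters: UTF-8 is strict and is tried first, while GBK accepts
    many byte sequences and is better kept as a fallback.
    """
    for encoding in candidates:
        if decodes_cleanly(path, encoding):
            return encoding
    return None


if __name__ == "__main__":
    import sys

    for log_path in sys.argv[1:]:
        print(log_path, "->", pick_charset(log_path))
```

If a file is reported as gbk, keep inputCharset = GBK in the source configuration; if it is reported as utf-8, the default is sufficient.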

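4. Summary

This article used Flume to collect logs on Windows. The example itself is not complicated, but Windows-based walkthroughs are scarce, so I have tried to document the problems I ran into, namely the encoding issue and the configuration key pitfall, to save readers some trial and error.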
