2-Website Log Analysis Case: Collecting Web Logs with Flume (Windows Version)
1. Introduction to Flume
Flume is a distributed, reliable, and available service for
efficiently collecting, aggregating, and moving large amounts of log
data. It has a simple and flexible architecture based on streaming
data flows. It is robust and fault tolerant with tunable reliability
mechanisms and many failover and recovery mechanisms. It uses a simple
extensible data model that allows for online analytic application.
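2. Installing Flume on Windows
1. Configure JAVA_HOME on the local machine.
2. Download Flume
Download it from https://flume.apache.org/download.html; the version I used is 1.9.0.
3. Test the installation with the startup command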
D:\apache\apache-flume-1.9.0-bin\apache-flume-1.9.0-bin\bin>flume-ng version
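If the command prints the Flume version information, the installation is working. Installation really is that simple.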
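Step 1 depends on JAVA_HOME, so a quick check before running `flume-ng version` can prevent a confusing failure. The following is a minimal Python sketch. It assumes the standard JDK/JRE layout, where the executable lives at `%JAVA_HOME%\bin\java.exe`. `find_java` is a hypothetical helper and is not part of Flume.

```python
import os
from pathlib import Path


def find_java(java_home=None):
    """Return the path of the java executable under JAVA_HOME, or None.

    Hypothetical helper: if java_home is not given, the JAVA_HOME
    environment variable is used.
    """
    home = java_home if java_home is not None else os.environ.get("JAVA_HOME")
    if not home:
        return None  # JAVA_HOME is unset or empty
    # Windows uses java.exe; plain "java" covers other platforms.
    for name in ("java.exe", "java"):
        candidate = Path(home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    java = find_java()
    print(f"java found at {java}" if java else "JAVA_HOME is missing or invalid")
```

If the script reports that JAVA_HOME is missing or invalid, fix the environment variable before continuing with the Flume steps.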
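3. Collecting Logs on Windows with Flume
3.1 Workflow
Source type: Windows has no tail command, so a single file cannot be followed; a spooldir source is used instead to watch the log directory.
Channel type: memory, chosen for speed and convenience.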
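Sink types: logger and file_roll. The logger sink shows whether events are arriving, and file_roll writes the collected logs into files in another directory. (The configuration below also declares a third sink, sink3, of type hdfs.) Note that sinks attached to the same channel compete for events, so each event reaches only one of them. To deliver every event to every sink, give each sink its own channel and let the source's default replicating channel selector copy events into all of them.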
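The spooldir source expects each file to be complete when it appears in the watched directory. Flume stops with an error if a file is modified after it has been placed there, or if a file name is reused. Log files should therefore be moved in only after they are finished.

Below is a minimal Python sketch of that hand-off. The function name `hand_off_to_spooldir` and the staging directory are hypothetical. The sketch writes a uniquely named copy to a staging directory, then renames it into the spool directory with `os.replace`. This assumes the staging directory is on the same drive as the spool directory (for example, somewhere on `E:\` next to `E://log`), because on Windows a rename cannot cross drives.

```python
import os
import shutil
import uuid
from datetime import datetime
from pathlib import Path


def hand_off_to_spooldir(src, staging_dir, spool_dir):
    """Copy a finished log file into Flume's spooling directory (hypothetical helper).

    The copy is written to staging_dir first and then renamed into spool_dir,
    so Flume never sees a half-written file. staging_dir must be on the same
    drive as spool_dir; otherwise os.replace raises OSError on Windows.
    """
    src, staging_dir, spool_dir = Path(src), Path(staging_dir), Path(spool_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    # Unique name per hand-off: spooldir stops if a file name is reused.
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    unique_name = f"{src.stem}-{stamp}-{uuid.uuid4().hex[:8]}{src.suffix}"
    staged = staging_dir / unique_name
    shutil.copyfile(src, staged)   # slow part happens outside the watched dir
    target = spool_dir / unique_name
    os.replace(staged, target)     # single rename into the watched directory
    return target
```

By default, Flume renames each ingested file with a `.COMPLETED` suffix. The unique names also keep those processed files from clashing with new arrivals.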
3.2 Configuration
# Configure the three components of agent1
agent1.sources = source1
agent1.sinks = sink1 sink2 sink3
agent1.channels = channel1

# Describe/configure spooldir source1
agent1.sources.source1.channels = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = E://log
agent1.sources.source1.inputCharset = GBK
agent1.sources.source1.fileHeader = true
# Configure host for source
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname

# Describe sink1 file_roll
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.type = file_roll
agent1.sinks.sink1.sink.directory = D://flume-collection
agent1.sinks.sink1.sink.rollInterval = 60

# Describe sink2
agent1.sinks.sink2.channel = channel1
agent1.sinks.sink2.type = logger

# Describe sink3
agent1.sinks.sink3.channel = channel1
agent1.sinks.sink3.type = hdfs
# Use local time so %Y-%m-%d in hdfs.path resolves without a timestamp header
agent1.sinks.sink3.hdfs.useLocalTimeStamp = true
agent1.sinks.sink3.hdfs.path = /sx/logtable/%Y-%m-%d
agent1.sinks.sink3.hdfs.filePrefix = logevent
agent1.sinks.sink3.hdfs.rollInterval = 10
agent1.sinks.sink3.hdfs.rollSize = 134217728
agent1.sinks.sink3.hdfs.rollCount = 0

# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
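A Flume properties file can contain typos that do not stop the file from loading but still break the setup. One example is a key that names an undeclared component, such as `agent1.sinks.sinks.hdfs.useLocalTimeStamp` where `agent1.sinks.sink3.hdfs.useLocalTimeStamp` was intended. Such a key does not configure sink3, so the HDFS sink has no timestamp to fill in `%Y-%m-%d`.

Below is a minimal Python sketch of a pre-flight check. `check_flume_config` is hypothetical and not a Flume tool. It assumes simple one-line `key = value` entries, and it checks three things: that every component key refers to a declared component, that channel references point to declared channels, and that file_roll sinks use the `sink.directory` key.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_flume_config(text, agent="agent1"):
    """Return a list of human-readable problems found in one agent's config."""
    props = parse_properties(text)
    # Components listed on the <agent>.sources / .sinks / .channels lines.
    declared = {
        kind: set(props.get(f"{agent}.{kind}", "").split())
        for kind in ("sources", "sinks", "channels")
    }
    problems = []
    for key, value in props.items():
        parts = key.split(".")
        # Only inspect <agent>.<kind>.<name>.<property...> keys.
        if len(parts) < 4 or parts[0] != agent or parts[1] not in declared:
            continue
        kind, name, prop = parts[1], parts[2], ".".join(parts[3:])
        if name not in declared[kind]:
            problems.append(f"{key}: '{name}' is not listed in {agent}.{kind}")
            continue
        # A sink reads one channel; a source may write to several.
        if (kind, prop) in (("sinks", "channel"), ("sources", "channels")):
            for channel in value.split():
                if channel not in declared["channels"]:
                    problems.append(f"{key}: channel '{channel}' is not declared")
    # file_roll sinks need the key sink.directory, not plain directory.
    for name in sorted(declared["sinks"]):
        prefix = f"{agent}.sinks.{name}."
        is_file_roll = props.get(prefix + "type", "").lower() == "file_roll"
        if is_file_roll and not props.get(prefix + "sink.directory"):
            problems.append(f"{prefix}sink.directory is missing")
    return problems
```

Reading `conf\log2file.conf` into a string and passing it to `check_flume_config` should return an empty list for a consistent configuration.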
3.3 Startup
D:\apache\apache-flume-1.9.0-bin\apache-flume-1.9.0-bin>.\bin\flume-ng agent --conf conf --conf-file conf\log2file.conf --name agent1 -property flume.root.logger=INFO,console
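3.4 Notes
1. Determine the encoding of the log files being read. The spooldir source reads UTF-8 by default. If the files use a different encoding, set it manually with inputCharset (GBK in the configuration above). If the encoding is wrong, an error like the following may appear: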
FATAL: Spool Directory source source1: { spoolDir: E://log }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing. java.nio.charset.MalformedInputException: Input length = 1
2. For agent1.sinks.sink1.sink.directory = D://flume-collection, note that the configuration key must be sink.directory. Otherwise the following error occurs:
Directory may not be null
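For the encoding note, a quick way to choose a value for inputCharset is to test which candidate charset decodes a sample log file without errors. Below is a minimal Python sketch. `guess_input_charset` is a hypothetical helper, and it assumes the only realistic candidates are UTF-8 and GBK.

UTF-8 is tried first because GBK decoding is permissive. A pure-ASCII file will report UTF-8, which is harmless because ASCII is valid in both. If dropping or replacing undecodable characters is acceptable, Flume's spooldir source also has a decodeErrorPolicy setting (FAIL by default, with REPLACE and IGNORE as alternatives).

```python
from pathlib import Path


def guess_input_charset(path, candidates=("UTF-8", "GBK")):
    """Return the first charset in `candidates` that decodes the whole file
    without errors, or None if none of them does (hypothetical helper).

    Strict decoding fails on malformed bytes, which roughly mirrors the
    MalformedInputException Flume raises when inputCharset is wrong.
    """
    data = Path(path).read_bytes()
    for charset in candidates:
        try:
            data.decode(charset)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue
        return charset
    return None
```

Running it against one file from E://log before starting the agent shows whether inputCharset needs to change.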
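4. Summary
This article used Flume to collect logs. The case itself is not complicated, but few examples target Windows. I have therefore tried to describe the problems I ran into, including the encoding issue and the configuration pitfalls, to save readers some trial and error.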