【Flume】HDFSSink Configuration Parameters Explained

Overview: the table below describes the configuration parameters of Flume's HDFSSink, combining the official descriptions with additional notes.

| Name | Default | Description |
|---|---|---|
| channel | – |  |
| type | – | The component type name; must be `hdfs`. |
| hdfs.path | – | HDFS directory path (e.g. hdfs://namenode/flume/webdata/). |
| hdfs.filePrefix | FlumeData | Prefix for the files the sink creates. For example, with `MyFile` the generated files are named `/hdfspath/MyFile.<suffix>`. |
| hdfs.fileSuffix | – | Suffix appended to generated files, e.g. `.txt` for plain-text output. |
| hdfs.inUsePrefix | – | While writing to HDFS, the sink first writes to a temporary file; this sets the temporary file's prefix. |
| hdfs.inUseSuffix | .tmp | Suffix of the temporary file while it is still being written. |
| hdfs.rollInterval | 30 | Number of seconds to wait before rolling the current file (0 = never roll based on time interval). |
| hdfs.rollSize | 1024 | File size that triggers a roll to a new HDFS file, in bytes (0 = never roll based on file size). |
| hdfs.rollCount | 10 | Number of events written to a file before it is rolled (0 = never roll based on number of events). |
| hdfs.idleTimeout | 0 | Timeout in seconds after which inactive files are closed (0 = disable automatic closing of idle files). |
| hdfs.batchSize | 100 | Number of events written to a file before it is flushed to HDFS. The sink treats each batch of this size as one transaction: it flushes the whole batch to HDFS, then commits. |
| hdfs.codeC | – | Compression codec, one of gzip, bzip2, lzo, lzop, snappy. Must be set when `hdfs.fileType` is `CompressedStream`. |
| hdfs.fileType | SequenceFile | File format: currently `SequenceFile`, `DataStream` or `CompressedStream`. (1) `DataStream` does not compress the output file; do not set `hdfs.codeC`. (2) `CompressedStream` requires `hdfs.codeC` to be set to an available codec. This parameter selects the concrete HDFSWriter implementation: HDFSSequenceFile, HDFSDataStream or HDFSCompressedDataStream respectively. |
| hdfs.maxOpenFiles | 5000 | Allow only this number of open files; if the number is exceeded, the oldest file is closed. The sink keeps one open connection (a BucketWriter) per HDFS file it creates, and closes the longest-open BucketWriter once this limit is exceeded. |
| hdfs.minBlockReplicas | – | Minimum number of replicas per HDFS block. If not specified, the value comes from the default Hadoop configuration on the classpath. |
| hdfs.writeFormat | Writable | Format for sequence file records, one of Text or Writable. Set to Text before creating data files with Flume, otherwise those files cannot be read by either Apache Impala (incubating) or Apache Hive. |
| hdfs.callTimeout | 10000 | Number of milliseconds allowed for HDFS operations such as open, write, flush and close. Increase this if many HDFS operations are timing out. |
| hdfs.threadsPoolSize | 10 | Number of threads per HDFS sink for HDFS IO operations (open, write, etc.). |
| hdfs.rollTimerPoolSize | 1 | Number of threads per HDFS sink for scheduling timed file rolling. |
| hdfs.kerberosPrincipal | – | Kerberos user principal for accessing secure HDFS. |
| hdfs.kerberosKeytab | – | Kerberos keytab for accessing secure HDFS. |
| hdfs.proxyUser | – |  |
| hdfs.round | false | Should the timestamp be rounded down (if true, affects all time-based escape sequences except %t). |
| hdfs.roundValue | 1 | Rounded down to the highest multiple of this (in the unit configured with `hdfs.roundUnit`), less than the current time. |
| hdfs.roundUnit | second | The unit of the round-down value: `second`, `minute` or `hour`. |
| hdfs.timeZone | Local Time | Name of the timezone used for resolving the directory path, e.g. America/Los_Angeles. |
| hdfs.useLocalTimeStamp | false | Use the local time (instead of the timestamp from the event header) when replacing the escape sequences. |
| hdfs.closeTries | 0 | Number of times the sink must try renaming a file after initiating a close attempt. If set to 1, the sink will not retry a failed rename (due to, for example, NameNode or DataNode failure) and may leave the file in an open state with a `.tmp` extension. If set to 0, the sink will keep trying until the file is eventually renamed (there is no limit on the number of attempts). The file may still remain open if the close call fails, but the data will be intact; in that case the file is closed only after a Flume restart. |
| hdfs.retryInterval | 180 | Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the NameNode, so setting this too low can put heavy load on the name node. If set to 0 or less, the sink will not retry a failed first attempt and may leave the file open or with a `.tmp` extension. |
| serializer | TEXT | Other possible options include `avro_event` or the fully qualified class name of an implementation of the `EventSerializer.Builder` interface. |
| serializer.* |  |  |


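As a worked example, here is a minimal sketch of an HDFS sink definition built from the parameters above. The agent, channel and sink names (a1, c1, k1) and the HDFS path are placeholders, not values from any particular deployment:

```properties
# Minimal text-output HDFS sink; a1/c1/k1 and the path are placeholder names.
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/
a1.sinks.k1.hdfs.filePrefix = MyFile
a1.sinks.k1.hdfs.fileSuffix = .txt
# DataStream leaves the output uncompressed; hdfs.codeC must not be set.
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Roll a new file every 10 minutes or at 128 MB, whichever comes first;
# 0 disables rolling by event count.
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Flush to HDFS (one transaction) every 100 events.
a1.sinks.k1.hdfs.batchSize = 100
```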


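For compressed output, the fileType/codeC pairing from the table looks like the following sketch; the chosen codec must actually be available on the cluster:

```properties
# CompressedStream requires an explicit codec (gzip, bzip2, lzo, lzop or snappy).
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
# A matching suffix keeps the output recognizable as gzip.
a1.sinks.k1.hdfs.fileSuffix = .gz
```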


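The round, roundValue and roundUnit parameters only take effect when hdfs.path contains time escape sequences. A sketch (with a placeholder path) that buckets events into ten-minute directories:

```properties
# %Y%m%d/%H%M are resolved from the event's timestamp header.
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y%m%d/%H%M
# Round the timestamp down to the nearest 10 minutes, so files land in
# .../1230, .../1240, ... instead of one directory per minute.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use the agent's local time instead of the event header timestamp.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```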

Reposted from the 51CTO blog of 巧克力黒. Original link: http://blog.51cto.com/10120275/2052987. To republish, please contact the original author.

