【Flume】HDFSSink Configuration Parameters Explained

Overview: the table below describes the configuration parameters of Flume's HDFSSink, combining the official descriptions with additional notes.

| Name | Default | Description |
|---|---|---|
| channel | – |  |
| type | – | The component type name; must be `hdfs`. |
| hdfs.path | – | HDFS directory path (e.g. hdfs://namenode/flume/webdata/). |
| hdfs.filePrefix | FlumeData | Prefix for the files the sink creates. For example, with `MyFile` the generated files are named `/hdfspath/MyFile.<suffix>`. |
| hdfs.fileSuffix | – | Suffix appended to generated files, e.g. `.txt` for plain-text output. |
| hdfs.inUsePrefix | – | While writing to HDFS, the sink first writes to a temporary file; this sets the temporary file's prefix. |
| hdfs.inUseSuffix | .tmp | Suffix of the temporary file while it is still being written. |
| hdfs.rollInterval | 30 | Number of seconds to wait before rolling the current file (0 = never roll based on time interval). |
| hdfs.rollSize | 1024 | File size that triggers a roll to a new HDFS file, in bytes (0 = never roll based on file size). |
| hdfs.rollCount | 10 | Number of events written to a file before it is rolled (0 = never roll based on number of events). |
| hdfs.idleTimeout | 0 | Timeout in seconds after which inactive files are closed (0 = disable automatic closing of idle files). |
| hdfs.batchSize | 100 | Number of events written to a file before it is flushed to HDFS. The sink treats each batch of this size as one transaction: it flushes the whole batch to HDFS, then commits. |
| hdfs.codeC | – | Compression codec, one of gzip, bzip2, lzo, lzop, snappy. Must be set when `hdfs.fileType` is `CompressedStream`. |
| hdfs.fileType | SequenceFile | File format: currently `SequenceFile`, `DataStream` or `CompressedStream`. (1) `DataStream` does not compress the output file; do not set `hdfs.codeC`. (2) `CompressedStream` requires `hdfs.codeC` to be set to an available codec. This parameter selects the concrete HDFSWriter implementation: HDFSSequenceFile, HDFSDataStream or HDFSCompressedDataStream respectively. |
| hdfs.maxOpenFiles | 5000 | Allow only this number of open files; if the number is exceeded, the oldest file is closed. The sink keeps one open connection (a BucketWriter) per HDFS file it creates, and closes the longest-open BucketWriter once this limit is exceeded. |
| hdfs.minBlockReplicas | – | Minimum number of replicas per HDFS block. If not specified, the value comes from the default Hadoop configuration on the classpath. |
| hdfs.writeFormat | Writable | Format for sequence file records, one of Text or Writable. Set to Text before creating data files with Flume, otherwise those files cannot be read by either Apache Impala (incubating) or Apache Hive. |
| hdfs.callTimeout | 10000 | Number of milliseconds allowed for HDFS operations such as open, write, flush and close. Increase this if many HDFS operations are timing out. |
| hdfs.threadsPoolSize | 10 | Number of threads per HDFS sink for HDFS IO operations (open, write, etc.). |
| hdfs.rollTimerPoolSize | 1 | Number of threads per HDFS sink for scheduling timed file rolling. |
| hdfs.kerberosPrincipal | – | Kerberos user principal for accessing secure HDFS. |
| hdfs.kerberosKeytab | – | Kerberos keytab for accessing secure HDFS. |
| hdfs.proxyUser | – |  |
| hdfs.round | false | Should the timestamp be rounded down (if true, affects all time-based escape sequences except %t). |
| hdfs.roundValue | 1 | Rounded down to the highest multiple of this (in the unit configured with `hdfs.roundUnit`), less than the current time. |
| hdfs.roundUnit | second | The unit of the round-down value: `second`, `minute` or `hour`. |
| hdfs.timeZone | Local Time | Name of the timezone used for resolving the directory path, e.g. America/Los_Angeles. |
| hdfs.useLocalTimeStamp | false | Use the local time (instead of the timestamp from the event header) when replacing the escape sequences. |
| hdfs.closeTries | 0 | Number of times the sink must try renaming a file after initiating a close attempt. If set to 1, the sink will not retry a failed rename (due to, for example, NameNode or DataNode failure) and may leave the file in an open state with a `.tmp` extension. If set to 0, the sink will keep trying until the file is eventually renamed (there is no limit on the number of attempts). The file may still remain open if the close call fails, but the data will be intact; in that case the file is closed only after a Flume restart. |
| hdfs.retryInterval | 180 | Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the NameNode, so setting this too low can put heavy load on the name node. If set to 0 or less, the sink will not retry a failed first attempt and may leave the file open or with a `.tmp` extension. |
| serializer | TEXT | Other possible options include `avro_event` or the fully qualified class name of an implementation of the `EventSerializer.Builder` interface. |
| serializer.* |  |  |


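As a worked example, here is a minimal sketch of an HDFS sink definition built from the parameters above. The agent, channel and sink names (a1, c1, k1) and the HDFS path are placeholders, not values from any particular deployment:

```properties
# Minimal text-output HDFS sink; a1/c1/k1 and the path are placeholder names.
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/
a1.sinks.k1.hdfs.filePrefix = MyFile
a1.sinks.k1.hdfs.fileSuffix = .txt
# DataStream leaves the output uncompressed; hdfs.codeC must not be set.
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Roll a new file every 10 minutes or at 128 MB, whichever comes first;
# 0 disables rolling by event count.
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
# Flush to HDFS (one transaction) every 100 events.
a1.sinks.k1.hdfs.batchSize = 100
```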


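For compressed output, the fileType/codeC pairing from the table looks like the following sketch; the chosen codec must actually be available on the cluster:

```properties
# CompressedStream requires an explicit codec (gzip, bzip2, lzo, lzop or snappy).
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = gzip
# A matching suffix keeps the output recognizable as gzip.
a1.sinks.k1.hdfs.fileSuffix = .gz
```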


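The round, roundValue and roundUnit parameters only take effect when hdfs.path contains time escape sequences. A sketch (with a placeholder path) that buckets events into ten-minute directories:

```properties
# %Y%m%d/%H%M are resolved from the event's timestamp header.
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y%m%d/%H%M
# Round the timestamp down to the nearest 10 minutes, so files land in
# .../1230, .../1240, ... instead of one directory per minute.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use the agent's local time instead of the event header timestamp.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```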

Reposted from the 51CTO blog of 巧克力黒. Original link: http://blog.51cto.com/10120275/2052987. To republish, please contact the original author.

