| Name | Default | Description |
| --- | --- | --- |
| channel | – | |
| type | – | The component type name, needs to be hdfs |
| hdfs.path | – | HDFS directory path (e.g. hdfs://namenode/flume/webdata/) |
| hdfs.filePrefix | FlumeData | Name prefix for files created by the HDFS sink. For example, if set to MyFile, generated files will be /hdfspath/MyFile.suffix |
| hdfs.fileSuffix | – | Suffix to append to the generated files, e.g. .txt to produce text files |
| hdfs.inUsePrefix | – | Prefix for the temporary file the sink writes to on HDFS before the file is rolled |
| hdfs.inUseSuffix | .tmp | Suffix for the temporary file the sink writes to on HDFS before the file is rolled |
| hdfs.rollInterval | 30 | Number of seconds to wait before rolling current file (0 = never roll based on time interval) |
| hdfs.rollSize | 1024 | File size to trigger roll, in bytes (0 = never roll based on file size) |
| hdfs.rollCount | 10 | Number of events written to file before it rolled (0 = never roll based on number of events) |
| hdfs.idleTimeout | 0 | Timeout after which inactive files get closed (0 = disable automatic closing of idle files) |
| hdfs.batchSize | 100 | Number of events written to file before it is flushed to HDFS. The sink treats each batch of batchSize events as one transaction: it flushes them to HDFS together, then commits the transaction |
| hdfs.codeC | – | Compression codec, one of the following: gzip, bzip2, lzo, lzop, snappy. Must be set when fileType is CompressedStream |
| hdfs.fileType | SequenceFile | File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file, so do not set codeC. (2) CompressedStream requires hdfs.codeC to be set to an available codec. This parameter selects the concrete HDFSWriter implementation: the three formats map to HDFSSequenceFile, HDFSDataStream and HDFSCompressedDataStream respectively |
| hdfs.maxOpenFiles | 5000 | Allow only this number of open files. If this number is exceeded, the oldest file is closed. The sink maintains one connection (a BucketWriter) per open HDFS file; past this limit it closes the longest-open BucketWriter |
| hdfs.minBlockReplicas | – | Specify minimum number of replicas per HDFS block. If not specified, it comes from the default Hadoop config in the classpath |
| hdfs.writeFormat | Writable | Format for sequence file records. One of Text or Writable. Set to Text before creating data files with Flume, otherwise those files cannot be read by either Apache Impala (incubating) or Apache Hive |
| hdfs.callTimeout | 10000 | Number of milliseconds allowed for HDFS operations, such as open, write, flush, close. This number should be increased if many HDFS timeout operations are occurring |
| hdfs.threadsPoolSize | 10 | Number of threads per HDFS sink for HDFS IO ops (open, write, etc.) |
| hdfs.rollTimerPoolSize | 1 | Number of threads per HDFS sink for scheduling timed file rolling |
| hdfs.kerberosPrincipal | – | Kerberos user principal for accessing secure HDFS |
| hdfs.kerberosKeytab | – | Kerberos keytab for accessing secure HDFS |
| hdfs.proxyUser | | |
| hdfs.round | false | Should the timestamp be rounded down (if true, affects all time based escape sequences except %t) |
| hdfs.roundValue | 1 | Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time |
| hdfs.roundUnit | second | The unit of the round down value - second, minute or hour |
| hdfs.timeZone | Local Time | Name of the timezone that should be used for resolving the directory path, e.g. America/Los_Angeles |
| hdfs.useLocalTimeStamp | false | Use the local time (instead of the timestamp from the event header) while replacing the escape sequences |
| hdfs.closeTries | 0 | Number of times the sink must try renaming a file, after initiating a close attempt. If set to 1, this sink will not re-try a failed rename (due to, for example, NameNode or DataNode failure), and may leave the file in an open state with a .tmp extension. If set to 0, the sink will try to rename the file until it is eventually renamed (there is no limit on the number of attempts). The file may still remain open if the close call fails, but the data will be intact; in that case, the file will be closed only after a Flume restart |
| hdfs.retryInterval | 180 | Time in seconds between consecutive attempts to close a file. Each close call costs multiple RPC round-trips to the Namenode, so setting this too low can cause a lot of load on the name node. If set to 0 or less, the sink will not attempt to close the file if the first attempt fails, and may leave the file open or with a ".tmp" extension |
| serializer | TEXT | Other possible options include avro_event or the fully-qualified class name of an implementation of the EventSerializer.Builder interface |
| serializer.* | | |
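The parameters above combine into a sink definition in the agent's properties file. Below is a minimal sketch; the agent name a1, sink name k1, and channel name c1 are hypothetical placeholders, while the property keys and values follow the table (uncompressed text output, size- and time-based rolling, timestamp-based bucketing):

```properties
# Hypothetical names: agent a1, sink k1, channel c1 — adjust to your topology.
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

# Escape sequences in hdfs.path are resolved from the event timestamp.
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/webdata/%Y/%m/%d
a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileSuffix = .log

# DataStream writes plain (uncompressed) output, so hdfs.codeC stays unset;
# writeFormat = Text keeps the files readable by Hive/Impala.
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Roll a new file every 10 minutes or at 128 MB, whichever comes first.
# rollCount = 0 disables event-count rolling (the default of 10 events
# would produce many tiny HDFS files).
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0

# Round timestamps down to 10-minute buckets when resolving hdfs.path.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
```

Note the interaction captured in the comments: the three roll* settings are independent triggers, and any one of them reaching its threshold rolls the file, so unused triggers should be explicitly disabled with 0.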