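After solving the Flume-HDFS “data loss” problem, I still heard complaints that Flume was losing data. Duplicated data I could understand, but I never understood why data would still go missing.
Today a colleague found the following exception in the agent-side log: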
    20 Nov 2013 10:15:54,231 ERROR [pool-10-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:347) - Failed while running command: xxx.sh xxx.log
    org.apache.flume.ChannelException: Unable to put batch on required channel: FileChannel channel_all { dataDirs: [xxx/data] }
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:200)
        at org.apache.flume.source.ExecSource$ExecRunnable.flushEventBatch(ExecSource.java:376)
        at org.apache.flume.source.ExecSource$ExecRunnable.run(ExecSource.java:336)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
    Caused by: org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=xxx]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doPut(FileChannel.java:478)
        at org.apache.flume.channel.BasicTransactionSemantics.put(BasicTransactionSemantics.java:93)
        at org.apache.flume.channel.BasicChannelSemantics.put(BasicChannelSemantics.java:80)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:189)
        ... 8 more
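No further errors appeared after that, but Flume also stopped sending any data. Following the hint in the exception, my colleague wanted to try increasing the write timeout.
He sent me the error, and my memories of working with Flume last year came flooding back. A voice in my head said: damn, Flume's default behavior still hasn't been changed! And the error message is still this misleading!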
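Here is what actually happens. While running tail, the exec source puts data into the channel. If it keeps failing to obtain the channel lock until the timeout expires, the exec source exits, and by default it does not restart or retry on its own! So the timeout is not the real problem. The source must recover quickly after it fails to store events. Flume does provide a fast-recovery mechanism, but it is turned off by default!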
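The following simplified Java model illustrates this failure mode. It is not Flume's actual ExecSource code. `FlakyChannel` is a hypothetical stand-in whose first put throws, imitating the lock timeout above. The "command" is a canned list of lines, and `maxRuns` exists only to keep the demo finite. With `restart=false`, a single failed put ends the source for good, which matches the symptom of no errors and no data. With `restart=true`, the source waits `restartThrottleMs` and re-runs the command.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Simplified model of an exec source's run loop; NOT Flume's actual ExecSource code.
public class ExecRestartModel {

    /** Hypothetical channel whose first `failures` puts throw, mimicking
     *  "Failed to obtain lock for writing to the log". */
    static final class FlakyChannel {
        private int failuresLeft;
        final List<String> stored = new ArrayList<>();

        FlakyChannel(int failures) {
            this.failuresLeft = failures;
        }

        void putBatch(List<String> batch) {
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IllegalStateException("Failed to obtain lock for writing to the log");
            }
            stored.addAll(batch);
        }
    }

    /**
     * Runs `command` and pushes its lines to the channel in batches.
     * A failed put ends the current run. If restart is true, the command is
     * re-run after restartThrottleMs. For this demo the loop stops once a run
     * finishes cleanly or maxRuns is reached; a real source with restart=true
     * also re-runs a command that exits normally. Returns the number of runs.
     */
    static int run(Supplier<List<String>> command, FlakyChannel channel, boolean restart,
                   long restartThrottleMs, int batchSize, int maxRuns) {
        int runs = 0;
        while (true) {
            runs++;
            boolean succeeded = false;
            List<String> batch = new ArrayList<>();
            try {
                for (String line : command.get()) {
                    batch.add(line);
                    if (batch.size() >= batchSize) {
                        channel.putBatch(batch);
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    channel.putBatch(batch);
                }
                succeeded = true;
            } catch (IllegalStateException e) {
                // Analogous to "Failed while running command": this run is over.
                System.err.println("run " + runs + " failed: " + e.getMessage());
            }
            if (succeeded || !restart || runs >= maxRuns) {
                return runs;
            }
            try {
                Thread.sleep(restartThrottleMs); // back off before re-running the command
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return runs;
            }
        }
    }

    public static void main(String[] args) {
        Supplier<List<String>> command = () -> List.of("e1", "e2", "e3", "e4", "e5");

        FlakyChannel noRestart = new FlakyChannel(1);
        run(command, noRestart, false, 10, 2, 5);
        System.out.println("restart=false stored: " + noRestart.stored);

        FlakyChannel withRestart = new FlakyChannel(1);
        run(command, withRestart, true, 10, 2, 5);
        System.out.println("restart=true stored: " + withRestart.stored);
    }
}
```

Running `main` prints `restart=false stored: []` followed by `restart=true stored: [e1, e2, e3, e4, e5]`. One transient failure is enough to silence the source permanently unless restart is enabled.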
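Setting these three exec source parameters fixes it: restart (set to true so the command is re-run when it dies), restartThrottle (how long to wait before restarting, in milliseconds), and logStdErr (log the command's stderr).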
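Below is a sketch of the agent configuration with these parameters. It is embedded as a Java string and parsed with `java.util.Properties` so the effective values can be checked against the exec source defaults (restart=false, restartThrottle=10000, logStdErr=false). The agent name `agent` and source name `tail_src` are hypothetical. The channel name `channel_all` and the elided command `xxx.sh xxx.log` come from the log above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ExecSourceConfigCheck {

    // Agent name "agent" and source name "tail_src" are hypothetical; the channel
    // name and the (elided) command come from the error log in this post.
    static final String CONFIG = String.join("\n",
            "agent.sources = tail_src",
            "agent.sources.tail_src.type = exec",
            "agent.sources.tail_src.command = xxx.sh xxx.log",
            "agent.sources.tail_src.channels = channel_all",
            "# Re-run the command if it dies (default: false)",
            "agent.sources.tail_src.restart = true",
            "# Milliseconds to wait before restarting (default: 10000)",
            "agent.sources.tail_src.restartThrottle = 10000",
            "# Log the command's stderr (default: false)",
            "agent.sources.tail_src.logStdErr = true");

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Effective value of an exec-source setting, falling back to the given default. */
    static String effective(Properties p, String agent, String source, String key, String dflt) {
        return p.getProperty(agent + ".sources." + source + "." + key, dflt);
    }

    public static void main(String[] args) {
        Properties p = parse(CONFIG);
        System.out.println("restart = " + effective(p, "agent", "tail_src", "restart", "false"));
        System.out.println("restartThrottle = "
                + effective(p, "agent", "tail_src", "restartThrottle", "10000"));
        System.out.println("logStdErr = " + effective(p, "agent", "tail_src", "logStdErr", "false"));
    }
}
```

Without the `restart` line, the effective value falls back to `false`. That fallback is exactly the default the author complains about.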
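That said, tailing like this can still lose a “very small” amount of data in similar situations. If you really need zero data loss, use the spooling directory source instead.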
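Tailing can still leak a little data, and one plausible cause is how tail restarts. This assumes `xxx.sh` wraps something like `tail -F`. When the command is restarted, tail begins near the end of the file again, printing only the last 10 lines by default, so lines appended while the source was down can be skipped. The spooling directory approach avoids this: it reads complete, immutable files and marks a file as done only after its events are committed. The Java below is a simplified model of that idea, not Flume's implementation. `deliver` is a hypothetical stand-in for a channel commit, and the real source works in batches with more bookkeeping. It does reuse Flume's default `.COMPLETED` rename suffix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simplified model of the spooling-directory idea; NOT Flume's SpoolDirectorySource.
public class SpoolDirModel {
    static final String COMPLETED_SUFFIX = ".COMPLETED"; // Flume's default fileSuffix

    /**
     * One pass over the spool directory. `deliver` is a hypothetical stand-in for a
     * channel commit: it returns true only if all of a file's events were stored.
     * Returns the number of files marked completed in this pass.
     */
    static int processOnce(Path spoolDir, Predicate<List<String>> deliver) throws IOException {
        List<Path> pending;
        try (Stream<Path> files = Files.list(spoolDir)) {
            pending = files.filter(Files::isRegularFile)
                    .filter(f -> !f.getFileName().toString().endsWith(COMPLETED_SUFFIX))
                    .sorted()
                    .collect(Collectors.toList());
        }
        int completed = 0;
        for (Path file : pending) {
            List<String> events = Files.readAllLines(file, StandardCharsets.UTF_8);
            if (!deliver.test(events)) {
                continue; // not committed: the file stays put and is re-read next pass
            }
            // Rename only after a successful commit, so a failure can cause
            // re-delivery (duplicates) but never silently skips unread data.
            Files.move(file, file.resolveSibling(file.getFileName() + COMPLETED_SUFFIX));
            completed++;
        }
        return completed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spool");
        Files.write(dir.resolve("app.log.1"), List.of("a", "b"), StandardCharsets.UTF_8);

        List<String> sink = new ArrayList<>();
        boolean[] channelUp = {false};
        Predicate<List<String>> deliver = events -> {
            if (!channelUp[0]) {
                return false; // simulate a failed put, e.g. a lock timeout
            }
            sink.addAll(events);
            return true;
        };

        System.out.println(processOnce(dir, deliver)); // 0: file kept for retry
        channelUp[0] = true;
        System.out.println(processOnce(dir, deliver)); // 1
        System.out.println(sink);                      // [a, b]
    }
}
```

The tradeoff matches the opening of this post: a failed commit can produce duplicates, but no data is skipped.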
Also, today I looked into the Kafka API, and it has its own “data loss” pitfalls. Stay tuned for part three of the Flume “data loss” series.
This article is reposted from MIKE老毕's 51CTO blog. Original post: http://blog.51cto.com/boylook/1330210 (please contact the original author for permission to repost).