2
votes

I'm trying to run flume with an hdfs sink. The hdfs is running in a different machine properly and I can even interact with the hdfs from the flume machine, but when I run flume and send events to it I get the following error:

2013-05-26 14:22:11,399 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:456)] HDFS IO error
java.io.IOException: Callable timed out after 25000 ms
    at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:352)
    at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:727)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:430)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
    at java.util.concurrent.FutureTask.get(FutureTask.java:119)
    at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:345)
    ... 5 more

Again, conectivity is not an issue since I can interact with hdfs using the hadoop command line (the flume machine is NOT a datanode). The weirdest part is that after killing flume I can see that the tmp file is created in hdfs but it's empty (and the .tmp extension remains).

Any ideas as to why could this be happening? Thanks a lot!

1

1 Answers

2
votes

Check 3 things, if your firewall is off i.e. iptables should be stopped. Secondly, value of the property agent.sinks.hdfs-sink.hdfs.path = hdfs://PUBLIC_IP:8020/user/hdfs/flume and not Private IP. And change agent.sinks.hdfs-sink.hdfs.callTimeout = 180000 because the default is 10000 ms which is very less time for HDFS to react.

Thanks, Shilpa