I have a Flume agent configured as follows:

agent1.sources = Weather
agent1.sources.Weather.type = spooldir
agent1.sources.Weather.spoolDir = /Weather/Docs
agent1.sources.Weather.channels = MemChannel
agent1.channels = MemChannel
agent1.channels.MemChannel.type = memory
agent1.channels.MemChannel.capacity = 10000
agent1.channels.MemChannel.transactionCapacity = 1000
agent1.channels.MemChannel.deletePolicy = immediate
agent1.sinks = HDFS
agent1.sinks.HDFS.channel = MemChannel
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/input/
agent1.sinks.HDFS.hdfs.fileType = DataStream
agent1.sinks.HDFS.hdfs.writeFormat = Text
agent1.sinks.HDFS.hdfs.batchSize = 1000
agent1.sinks.HDFS.hdfs.rollSize = 0
agent1.sinks.HDFS.hdfs.rollCount = 10000

The files in the spool directory are being renamed to .COMPLETED automatically. They should be renamed to .COMPLETED only after the Flume agent has written the file to HDFS, but in my case they are renamed before the agent runs. Files are also renamed to .COMPLETED even if I just copy them manually into the spooling directory.

Another problem is that deletePolicy is not deleting files, even after a file has been copied to HDFS.

The agent is also writing the spooling directory files to HDFS randomly.

It is also creating lots of tmp files in HDFS.

Am I doing something wrong in the agent configuration, or did I miss something?

Please help me resolve this.

Thanks in advance.

1 Answer

I suspect that Flume is behaving as designed and that you are confused about the order in which things happen. You can look in the logs to confirm this.

Flume is a queue. The spooldir source reads the lines from the file, puts them into the queue (the channel), and then deletes or renames the file. Nothing in that sequence of operations waits for the events to be written to the sink.

As I said, you could block HDFS and you will see this behaviour in the log. The queueSize on MemChannel will increase until you turn HDFS back on.
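
On the deletePolicy and tmp-file points in the question, here is a rough sketch of what I would try, not a tested configuration. As far as I know, deletePolicy is a property of the spooldir source (values never or immediate), so setting it on the memory channel most likely has no effect. The files ending in .tmp are files the HDFS sink is still writing; they are renamed when the file is rolled. The rollInterval value below is an assumption on my part (the default is 30 seconds, which tends to produce many small files), so adjust it to your data volume.

```properties
# Sketch only: lines that differ from the agent in the question.

# deletePolicy belongs to the spooldir source, not to the channel.
# "immediate" deletes the file once it has been fully read into the channel;
# the default "never" renames it with the .COMPLETED suffix instead.
agent1.sources.Weather.type = spooldir
agent1.sources.Weather.spoolDir = /Weather/Docs
agent1.sources.Weather.deletePolicy = immediate
agent1.sources.Weather.channels = MemChannel

# Remove this line from the channel section of the original config:
# agent1.channels.MemChannel.deletePolicy = immediate

# Assumed value: roll on time less often than the 30 second default.
# A value of 0 would disable time-based rolling entirely.
agent1.sinks.HDFS.hdfs.rollInterval = 300
agent1.sinks.HDFS.hdfs.rollSize = 0
agent1.sinks.HDFS.hdfs.rollCount = 10000
```

Even with this change, the file is deleted once the source has handed its events to the channel, not once the sink has written them to HDFS, which is the ordering described above.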