
As far as the Flume documentation goes, we can move data into HDFS based on event size, event count, or duration. Is there any way to move a whole file from the spooling directory into HDFS as a single file?

Example 
Spooling Dir                  HDFS
file1 - 1000 events ----> file1 - 1000 events
file2 - 1008 events ----> file2 - 1008 events
file3 - 800 events  ----> file3 - 800 events

Thanks.

1 Answer


Well, sort of. You need to tweak your configuration to reflect that, because Flume wasn't designed to ship entire files regardless of their size; you can do that more effectively with hadoop fs -copyFromLocal.
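For comparison, the plain CLI copy would look something like this (the source and destination paths here are just placeholders):

hadoop fs -copyFromLocal /local/spool/file1 /user/flume/file1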

Here's a list of things you need to configure:

a) The batch size must be smaller than the number of events in your files if you only spool files sporadically; otherwise your events may get stuck in your channels.

b) hdfs.rollSize = 0 to make sure your files don't get rolled over after reaching a size limit.

c) hdfs.rollCount = 0 to make sure your files don't get rolled over after a certain number of events.

d) hdfs.rollInterval set to a reasonable value to make sure your files get written out on time.

e) Spool one file at a time to avoid mix-ups.

That's basically it. Putting it together, an agent configuration could look something like the sketch below.
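A minimal sketch, assuming a spooling-directory source, a memory channel, and an HDFS sink; the agent and component names, paths, and sizes are placeholders you'd tune to your own files:

# Sketch of an agent following points a) through e). Names (agent1, src1,
# ch1, snk1), paths, and sizes are placeholders; tune them to your files.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = snk1

# Spooling directory source: picks up completed files dropped into spoolDir
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/spool/flume
agent1.sources.src1.channels = ch1

# Channel sized comfortably above the largest per-file event count
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 1000

# HDFS sink: no size- or count-based rolling, time-based rolling only
agent1.sinks.snk1.channel = ch1
agent1.sinks.snk1.type = hdfs
agent1.sinks.snk1.hdfs.path = hdfs://namenode/flume/ingest
agent1.sinks.snk1.hdfs.fileType = DataStream
agent1.sinks.snk1.hdfs.rollSize = 0
agent1.sinks.snk1.hdfs.rollCount = 0
agent1.sinks.snk1.hdfs.rollInterval = 300
# Kept below the smallest expected per-file event count (see point a)
agent1.sinks.snk1.hdfs.batchSize = 100

With rollSize and rollCount disabled, each spooled file ends up in a single HDFS file as long as the roll interval is long enough to cover a whole file and you only spool one file at a time.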