
I have a Java application that gathers data from different sources and writes the output to files under a specific directory.

I also have a Flume agent configured to use a spooldir source to read from that directory and write the output to Solr using MorphlineSolrSink.

The Flume agent throws the following exception:

java.lang.IllegalStateException: File has changed size since being read

Here is the configuration of the Flume agent:

agent02.sources = s1
agent02.sinks = solrSink
agent02.channels = ch1

agent02.channels.ch1.type = file
agent02.channels.ch1.checkpointDir = /home/flume/prod_solr_chkpoint/file-channel/checkpoint
agent02.channels.ch1.dataDirs = /home/flume/prod_solr_chkpoint/file-channel/data

agent02.sources.s1.type = spooldir
agent02.sources.s1.channels = ch1

agent02.sources.s1.spoolDir = /DataCollection/json_output/solr/
agent02.sources.s1.deserializer.maxLineLength = 100000

agent02.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent02.sinks.solrSink.channel = ch1
agent02.sinks.solrSink.batchSize = 10000
agent02.sinks.solrSink.batchDurationMillis = 10000
agent02.sinks.solrSink.morphlineFile = morphlines.conf 
agent02.sinks.solrSink.morphlineId = morphline

What I understand from the exception is that the Flume agent starts working on a file before the Java application has finished writing it.

How can I fix this problem?

Edit

I have no idea whether this information is valuable or not. These configurations were working before without any problem. We had a hard disk failure on the machine we run Flume from, and after recovering from that failure Flume started throwing this exception.

I'm not really sure, but I'd guess Flume only works with completed files, which means you should write the file in a different folder and then move it atomically into the folder your agent is reading from. - KBorja

2 Answers


As stated in the documentation regarding the Spooling Directory Source:

In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:

  • If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
  • If a file name is reused at a later time, Flume will print an error to its log file and stop processing.

I'd suggest that your Java application dump buckets of data into temporary files, naming them with their creation timestamp so that names are never reused. Once a bucket is full (i.e. a certain size is reached), move the file into the spooling directory.
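A minimal sketch of that approach in Java NIO (the directory paths and `writeBucket` helper are illustrative, not part of the asker's code; the demo uses temp directories, but in production these would be a staging directory and the spoolDir from the question):

```java
import java.io.IOException;
import java.nio.file.*;

public class SpoolWriter {

    /** Writes one complete bucket to tmpDir, then atomically moves it into spoolDir. */
    static Path writeBucket(Path tmpDir, Path spoolDir, String json) throws IOException {
        // Unique, timestamped name so a file name is never reused.
        String name = "bucket-" + System.nanoTime() + ".json";
        Path tmp = tmpDir.resolve(name);

        // Write the whole bucket outside the spool directory,
        // so partial content is never visible to Flume.
        Files.write(tmp, json.getBytes());

        // The move is atomic when both directories are on the same
        // filesystem, so the spooldir source only ever sees a
        // finished, immutable file.
        return Files.move(tmp, spoolDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path tmpDir = Files.createTempDirectory("json_tmp");
        Path spoolDir = Files.createTempDirectory("json_spool");
        Path done = writeBucket(tmpDir, spoolDir, "{\"example\":\"record\"}\n");
        System.out.println("Spooled: " + done.getFileName());
    }
}
```

Note that `ATOMIC_MOVE` throws if the staging and spool directories live on different filesystems, so keep them on the same mount.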


Write the source file to another directory, then move (mv command) the files into the spool source directory; it should work. Don't use the copy command: cp creates the destination file incrementally, so Flume can see its size change, whereas mv within the same filesystem is an atomic rename.
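As an aside, if changing how the producer writes is difficult, the spooldir source also has an ignorePattern property that can be combined with a temporary-suffix convention: write in-progress files with a ".tmp" suffix inside the spool directory and rename them (an atomic operation within the same directory) when complete. A hedged sketch, reusing the agent02 names from the question's configuration:

# Ignore files still being written (assumes the producer
# uses a ".tmp" suffix and renames when done)
agent02.sources.s1.ignorePattern = ^.*\.tmp$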