
I have a Java application that gathers data from different sources and writes the output to files under a specific directory.

I also have a Flume agent configured to use a spooldir source to read from that directory and write the output to Solr using MorphlineSolrSink.

The Flume agent throws the following exception:

java.lang.IllegalStateException: File has changed size since being read

Here is the configuration of the Flume agent:

agent02.sources = s1
agent02.sinks = solrSink
agent02.channels = ch1

agent02.channels.ch1.type = file
agent02.channels.ch1.checkpointDir = /home/flume/prod_solr_chkpoint/file-channel/checkpoint
agent02.channels.ch1.dataDirs = /home/flume/prod_solr_chkpoint/file-channel/data

agent02.sources.s1.type = spooldir
agent02.sources.s1.channels = ch1

agent02.sources.s1.spoolDir = /DataCollection/json_output/solr/
agent02.sources.s1.deserializer.maxLineLength = 100000

agent02.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent02.sinks.solrSink.channel = ch1
agent02.sinks.solrSink.batchSize = 10000
agent02.sinks.solrSink.batchDurationMillis = 10000
agent02.sinks.solrSink.morphlineFile = morphlines.conf 
agent02.sinks.solrSink.morphlineId = morphline

What I understand from the exception is that the Flume agent starts working on a file before the Java application has finished writing it.

How can I fix this problem?

Edit

I have no idea whether this information is valuable or not. These configurations were working before without any problem. We had a hard disk failure on the machine we run Flume from, and after recovering from that failure Flume started throwing this exception.

I'm not really sure, but I'd guess Flume only works with completed files, which means you should write the file in a different folder and then move it atomically into the folder your agent is reading from. - KBorja

2 Answers


As stated in the documentation regarding the Spooling Directory Source:

In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:

  • If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
  • If a file name is reused at a later time, Flume will print an error to its log file and stop processing.

I'd suggest that your Java application dump buckets of data into temporary files, naming them with their creation timestamp so that names are never reused. Once a bucket is full (i.e. a certain size is reached), move the file into the spooling directory.
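A minimal sketch of that approach in Java NIO (the directory paths and `writeBucket` helper are illustrative, not part of the asker's code; the demo uses temp directories, but in production these would be a staging directory and the spoolDir from the question):

```java
import java.io.IOException;
import java.nio.file.*;

public class SpoolWriter {

    /** Writes one complete bucket to tmpDir, then atomically moves it into spoolDir. */
    static Path writeBucket(Path tmpDir, Path spoolDir, String json) throws IOException {
        // Unique, timestamped name so a file name is never reused.
        String name = "bucket-" + System.nanoTime() + ".json";
        Path tmp = tmpDir.resolve(name);

        // Write the whole bucket outside the spool directory,
        // so partial content is never visible to Flume.
        Files.write(tmp, json.getBytes());

        // The move is atomic when both directories are on the same
        // filesystem, so the spooldir source only ever sees a
        // finished, immutable file.
        return Files.move(tmp, spoolDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path tmpDir = Files.createTempDirectory("json_tmp");
        Path spoolDir = Files.createTempDirectory("json_spool");
        Path done = writeBucket(tmpDir, spoolDir, "{\"example\":\"record\"}\n");
        System.out.println("Spooled: " + done.getFileName());
    }
}
```

Note that `ATOMIC_MOVE` throws if the staging and spool directories live on different filesystems, so keep them on the same mount.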


Write the source file to another directory, then move (mv command) the files into the spool source directory; it should work. Don't use the copy command: cp creates the destination file incrementally, so Flume can see its size change, whereas mv within the same filesystem is an atomic rename.
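As an aside, if changing how the producer writes is difficult, the spooldir source also has an ignorePattern property that can be combined with a temporary-suffix convention: write in-progress files with a ".tmp" suffix inside the spool directory and rename them (an atomic operation within the same directory) when complete. A hedged sketch, reusing the agent02 names from the question's configuration:

# Ignore files still being written (assumes the producer
# uses a ".tmp" suffix and renames when done)
agent02.sources.s1.ignorePattern = ^.*\.tmp$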