7
votes

I’m investigating the use of Spring Batch to process records from an encoded, zipped file. The records are variable length, with nested variable-length data fields encoded within them.

I’m new to Spring and Spring Batch; this is how I plan to structure the batch configuration.

  • The ItemReader would need to read a single record from the zipped (*.gz) file input stream into a POJO (byte array); the length of each record is contained in its first two bytes.
  • The ItemProcessor will decode the byte array and store info in relevant attributes in the POJO.
  • The ItemWriter would populate a database.

My initial problem is understanding how to set up the ItemReader. I’ve looked at some examples that use a FlatFileItemReader, but my difficulty is the expectation of a LineMapper. I don’t see how that applies in my case (there is no concept of a line in the file).

There are some articles indicating the use of a custom BufferedReaderFactory, but it would be great to see a worked example of this.

Help would be appreciated.

4
Can you add examples of records from the decoded file? – Michael Pralow
Sorry for the delay replying. My confusion was based around the file handling in the custom ItemReader: if I were to open and process the file in the read() method, I would have to keep track of where I was in the file, etc. I managed to tackle this by creating a BufferedInputStream (new BufferedInputStream(new GZIPInputStream(new FileInputStream(file)))) in the constructor of the custom ItemReader, then processing that stream in the read() method with each iteration of the step. – Hugh Lacey

4 Answers

3
votes

If the gzipped file is a simple text file, you only need a custom BufferedReaderFactory; the LineMapper then gets the String of the current line.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import org.springframework.batch.item.file.BufferedReaderFactory;
import org.springframework.core.io.Resource;

public class GZipBufferedReaderFactory implements BufferedReaderFactory {

    /** Default value for gzip suffixes. */
    private List<String> gzipSuffixes = new ArrayList<String>() {

        {
            add(".gz");
            add(".gzip");
        }
    };

    /**
     * Creates Bufferedreader for gzip Resource, handles normal resources
     * too.
     * 
     * @param resource
     * @param encoding
     * @return
     * @throws UnsupportedEncodingException
     * @throws IOException 
     */
    @Override
    public BufferedReader create(Resource resource, String encoding)
            throws UnsupportedEncodingException, IOException {
        for (String suffix : gzipSuffixes) {
            // test for filename and description, description is used when 
            // handling itemStreamResources
            if (resource.getFilename().endsWith(suffix)
                    || resource.getDescription().endsWith(suffix)) {
                return new BufferedReader(new InputStreamReader(new GZIPInputStream(resource.getInputStream()), encoding));
            }
        }
        return new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));
    }

    public List<String> getGzipSuffixes() {
        return gzipSuffixes;
    }

    public void setGzipSuffixes(List<String> gzipSuffixes) {
        this.gzipSuffixes = gzipSuffixes;
    }
}

simple itemreader configuration:

 <bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
  <property name="resource" value="#{jobParameters['input.file']}" />
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper" />
  </property>
  <property name="strict" value="true" />
  <property name="bufferedReaderFactory">
    <bean class="your.custom.GZipBufferedReaderFactory" />
  </property>
</bean>
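
If you prefer Java configuration, a rough equivalent of the XML above might look like this (just a sketch; it assumes the same GZipBufferedReaderFactory class and the input.file job parameter shown above):

@Bean
@StepScope
public FlatFileItemReader<String> itemReader(@Value("#{jobParameters['input.file']}") Resource resource) {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(resource);                        // e.g. file:/path/to/data.gz
    reader.setLineMapper(new PassThroughLineMapper());   // each item is the raw line
    reader.setStrict(true);
    reader.setBufferedReaderFactory(new GZipBufferedReaderFactory());
    return reader;
}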
2
votes

From the feature request ticket to spring batch (https://jira.spring.io/browse/BATCH-1750):

import java.io.IOException;
import java.util.zip.GZIPInputStream;

import org.springframework.core.io.InputStreamResource;
import org.springframework.core.io.Resource;

public class GZIPResource extends InputStreamResource implements Resource {

    public GZIPResource(Resource delegate) throws IOException {
        super(new GZIPInputStream(delegate.getInputStream()));
    }
}

The custom GZipBufferedReaderFactory won't work with anything other than FlatFileItemReader.

Edit: lazy version. This doesn't try to open the file until getInputStream() is called, which avoids exceptions caused by the file not yet existing when the Resource is created at program initialization (e.g. with autowiring).

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.springframework.core.io.FileSystemResource;
import org.springframework.core.io.Resource;

public class GzipLazyResource extends FileSystemResource implements Resource {

    public GzipLazyResource(File file) {
        super(file);
    }

    public GzipLazyResource(String path) {
        super(path);
    }

    @Override
    public InputStream getInputStream() throws IOException {
        // decompress lazily, only when the stream is actually requested
        return new GZIPInputStream(super.getInputStream());
    }
}

Edit 2: this only works for input Resources.

Adding a similar getOutputStream method won't work, because Spring uses FileSystemResource.getFile, not FileSystemResource.getOutputStream.
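
For illustration, a step-scoped reader using the lazy resource could be wired roughly like this (a sketch; the input.file job parameter is just an example):

@Bean
@StepScope
public FlatFileItemReader<String> reader(@Value("#{jobParameters['input.file']}") String path) {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    // the gzip stream is only opened once the step starts reading
    reader.setResource(new GzipLazyResource(path));
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}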

2
votes

Tested that this simple configuration for reading lines from a zipped and encoded file in S3 works.

Key points:

  • Implement a BufferedReaderFactory that uses Apache's GZIPInputStreamFactory, and set that as the bufferedReaderFactory on the FlatFileItemReader.
  • Configure a SimpleStorageResourceLoader from Spring Cloud with an AmazonS3Client, and use it to get the zipped flat file in S3. Set that as the resource on the FlatFileItemReader.

Note: reading into a String can easily be replaced by reading into a POJO (see the LineMapper sketch after the configuration below).

GZIPBufferedReaderFactory.java

Using Apache's GZIPInputStreamFactory

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// assumes Apache HttpClient's GZIPInputStreamFactory is the one referred to above
import org.apache.http.client.entity.GZIPInputStreamFactory;
import org.springframework.batch.item.file.BufferedReaderFactory;
import org.springframework.core.io.Resource;

public class GZIPBufferedReaderFactory implements BufferedReaderFactory {

    private final GZIPInputStreamFactory gzipInputStreamFactory;

    public GZIPBufferedReaderFactory(GZIPInputStreamFactory gzipInputStreamFactory) {
        this.gzipInputStreamFactory = gzipInputStreamFactory;
    }

    @Override
    public BufferedReader create(Resource resource, String encoding) throws IOException {
        return new BufferedReader(new InputStreamReader(gzipInputStreamFactory.create(resource.getInputStream()), encoding));
    }
}

AWSConfiguration.java

@Configuration
public class AWSConfiguration {

    @Bean
    public AmazonS3Client s3Client(AWSCredentialsProvider credentials, Region region) {
        ClientConfiguration clientConfig = new ClientConfiguration();

        AmazonS3Client client = new AmazonS3Client(credentials, clientConfig);
        client.setRegion(region);
        return client;
    }
}

How you configure the AWSCredentialsProvider and Region beans can vary, and I will not detail that here since there is documentation elsewhere.
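
For completeness, one possible way to define those two beans (a sketch using the AWS SDK v1 classes from com.amazonaws.auth and com.amazonaws.regions; adjust the region and credentials strategy to your environment):

@Bean
public AWSCredentialsProvider awsCredentialsProvider() {
    // picks up credentials from environment variables, system properties, profile files or instance roles
    return new DefaultAWSCredentialsProviderChain();
}

@Bean
public Region region() {
    return Region.getRegion(Regions.US_EAST_1); // example region only
}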

BatchConfiguration.java

@Configuration
@EnableBatchProcessing
public class SignalsIndexBatchConfiguration {

    @Autowired
    public AmazonS3Client s3Client;

    @Bean
    public GZIPInputStreamFactory gzipInputStreamFactory() {
        return new GZIPInputStreamFactory();
    }

    @Bean
    public GZIPBufferedReaderFactory gzipBufferedReaderFactory(GZIPInputStreamFactory gzipInputStreamFactory) {
        return new GZIPBufferedReaderFactory(gzipInputStreamFactory);
    }

    @Bean
    public SimpleStorageResourceLoader simpleStorageResourceLoader() {
        return new SimpleStorageResourceLoader(s3Client);
    }

    @Bean
    @StepScope
    protected FlatFileItemReader<String> itemReader(
            SimpleStorageResourceLoader simpleStorageResourceLoader,
            GZIPBufferedReaderFactory gzipBufferedReaderFactory) {
        FlatFileItemReader<String> flatFileItemReader = new FlatFileItemReader<>();
        flatFileItemReader.setBufferedReaderFactory(gzipBufferedReaderFactory);
        flatFileItemReader.setResource(simpleStorageResourceLoader.getResource("s3://YOUR_FLAT_FILE.csv"));
        flatFileItemReader.setLineMapper(new PassThroughLineMapper());
        return flatFileItemReader;
    }

    @Bean
    public Job job(Step step) {
        return jobBuilderFactory.get("job").start(step).build();
    }

    @Bean
    protected Step step(GZIPInputStreamFactory gzipInputStreamFactory) {
        return stepBuilderFactory.get("step")
                .<String, String> chunk(200)
                .reader(itemReader(simpleStorageResourceLoader(), gzipBufferedReaderFactory(gzipInputStreamFactory)))
                // note: itemProcessor() is not shown in this listing, and a chunk step also needs an ItemWriter supplied via .writer(...)
                .processor(itemProcessor())
                .faultTolerant()
                .build();
    }

    /*
     * These components are some of what we
     * get for free with the @EnableBatchProcessing annotation
     */
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Autowired
    public JobRepository jobRepository;
    /*
     * END Freebies
     */

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository);
        jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
        jobLauncher.afterPropertiesSet();
        return jobLauncher;
    }
}
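
As noted above, the PassThroughLineMapper can be swapped for one that builds a POJO. A minimal sketch (the Record class and comma-separated layout are purely illustrative):

public class RecordLineMapper implements LineMapper<Record> {

    @Override
    public Record mapLine(String line, int lineNumber) throws Exception {
        // parse the line however your format requires; CSV is just an example
        String[] fields = line.split(",");
        return new Record(fields[0], fields[1]);
    }
}

Set it on the reader with flatFileItemReader.setLineMapper(new RecordLineMapper()) and change the reader's type parameter to Record.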
0
votes

My confusion was based around the file handling in the custom ItemReader: if I were to open and process the file in the read() method, I would have to keep track of where I was in the file, etc. I managed to tackle this by creating a BufferedInputStream (new BufferedInputStream(new GZIPInputStream(new FileInputStream(file)))) in the constructor of the custom ItemReader, then processing that stream in the read() method with each iteration of the step.
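
For anyone facing the same length-prefixed binary format, a minimal sketch of that kind of ItemReader might look like the following (it assumes the two-byte length is an unsigned big-endian value covering only the payload, and returns the raw record as a byte array for an ItemProcessor to decode):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

import org.springframework.batch.item.ItemReader;

public class GzipRecordItemReader implements ItemReader<byte[]> {

    private final DataInputStream in;

    public GzipRecordItemReader(File file) throws IOException {
        // open the stream once; the reader keeps its position across read() calls
        this.in = new DataInputStream(
                new BufferedInputStream(new GZIPInputStream(new FileInputStream(file))));
    }

    @Override
    public byte[] read() throws Exception {
        final int length;
        try {
            // the first two bytes of each record hold its length (assumption: unsigned, big-endian)
            length = in.readUnsignedShort();
        } catch (EOFException eof) {
            in.close();
            return null; // returning null tells Spring Batch the input is exhausted
        }
        byte[] record = new byte[length];
        in.readFully(record);
        return record;
    }
}

A fuller version would implement ItemStreamReader instead, opening the stream in open() and saving the record count in the ExecutionContext so the job can be restarted mid-file.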