
I have a Spring Batch job which contains a single step that reads a CSV file (containing approximately 2000 rows) using a FlatFileItemReader and writes the objects to the database. I have my own custom BeanWrapperFieldSetMapper which maps the rows to objects. The chunk size is set to 50, so I expect that after each chunk of 50 objects is written, those objects are released from heap memory.
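
The step is configured roughly like this (a simplified sketch, imports omitted; the bean names, file path and column names below are placeholders rather than my exact configuration):

@Bean
public Step creditCardDebtStep(StepBuilderFactory stepBuilderFactory,
                               FlatFileItemReader<CreditCardDebt> reader,
                               JdbcBatchItemWriter<CreditCardDebt> writer) {
    return stepBuilderFactory.get("creditCardDebtStep")
            .<CreditCardDebt, CreditCardDebt>chunk(50) // commit interval / chunk size
            .reader(reader)
            .writer(writer)
            .build();
}

@Bean
public FlatFileItemReader<CreditCardDebt> creditCardDebtReader(
        @Qualifier("CREDIT_CARD_DEBT_FIELD_SET_MAPPER_TEST") FieldSetMapper<CreditCardDebt> fieldSetMapper) {
    return new FlatFileItemReaderBuilder<CreditCardDebt>()
            .name("creditCardDebtReader")
            .resource(new FileSystemResource("credit_card_debt.csv")) // placeholder path
            .delimited()
            .names("account", "cardholderId", "dueDate", "daysPastDue", "overdueAmount",
                   "directDebitMinimumPayment", "directDebitBalance", "directDebitStatus",
                   "directDebitType")
            .fieldSetMapper(fieldSetMapper)
            .build();
}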

Since I'm leveraging chunk-oriented processing, I expect to have only 50 CreditCardDebt objects in memory at any given time. Instead, while processing the last chunk, I find that the heap contains all 2000 CreditCardDebt objects.

What am I missing?

My BeanWrapperFieldSetMapper implementation:

@Component("CREDIT_CARD_DEBT_FIELD_SET_MAPPER_TEST")
public class TestDebtFieldSetMapper extends BeanWrapperFieldSetMapper<CreditCardDebt> {

    public TestDebtFieldSetMapper() {
        super.setPrototypeBeanName("CREDIT_CARD_DEBT_FIELD_SET_MAPPER_TEST");
    }

    @NonNull
    @Override
    public CreditCardDebt mapFieldSet(FieldSet fieldSet) {
        CreditCardDebt creditCardDebt = new CreditCardDebt();
        creditCardDebt.setAccount(fieldSet.readString(0));
        creditCardDebt.setCardholderId(fieldSet.readString(1));
        creditCardDebt.setDueDate(convertToLocalDateViaInstant(fieldSet.readString(2)));
        creditCardDebt.setDaysPastDue(fieldSet.readInt(3));
        creditCardDebt.setOverdueAmount(fieldSet.readDouble(4));
        creditCardDebt.setDirectDebitMinimumPayment(fieldSet.readDouble(5));
        creditCardDebt.setDirectDebitBalance(fieldSet.readDouble(6));
        creditCardDebt.setDirectDebitStatus(fieldSet.readChar(7));
        creditCardDebt.setDirectDebitType(DirectDebitType.valueOf(fieldSet.readString(8)));
        creditCardDebt.setCreatedDate(LocalDateTime.now());
        creditCardDebt.setFileName("BAL");
        return creditCardDebt;
    }

    private LocalDate convertToLocalDateViaInstant(String dateToConvert) {
        DateTimeFormatter formatters = DateTimeFormatter.ofPattern("yyyyMMdd");
        return LocalDate.parse(dateToConvert, formatters);
    }
}

Whether or not the objects are released immediately, the garbage collector will do the work of actually freeing up memory. It will decide when to run, and what to leave in place vs. actually free up – depending on the runtime circumstances, sometimes the decision is to leave garbage in place. – kaan
The problem is that if I test the job with a much larger input (140k rows), set the chunk size to 1000, and limit the heap size to 64MB in order to force the GC to take action, I get a java.lang.OutOfMemoryError: GC overhead limit exceeded after processing the first 3 batches. So it looks like the instances belonging to a batch are not cleaned up after the job is done with that batch. Is that the expected behavior? – Andrei
Items of processed chunks should be garbage collected (I added an answer with more details). Are you sure your custom BeanWrapperFieldSetMapper (or something else) does not hold items during the whole job execution? – Mahmoud Ben Hassine
I have added the implementation of my BeanWrapperFieldSetMapper to the question description. – Andrei
Hi, I have the same problem. Did you manage to find an answer to this? – Jessy

1 Answer


This is left to the garbage collector. The relevant code for this question is in the ChunkOrientedTasklet. In its most basic form, the ChunkOrientedTasklet makes two calls:

Chunk<I> inputs = chunkProvider.provide(contribution);
chunkProcessor.process(contribution, inputs);

The ChunkProvider uses the ItemReader to read commit-interval items (or fewer if the item reader returns null), and the ChunkProcessor uses the ItemProcessor and ItemWriter to process and write them:

Chunk<O> outputs = transform(contribution, inputs);
write(contribution, inputs); // details of adjustments of output omitted here

This process is repeated until the data source is exhausted. So items of processed chunks should be garbage collected once the GC kicks in (since the inputs/outputs variables are re-used for each chunk), unless something holds references to them for the whole job execution.
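
As an illustration of that last point, a mapper (or listener, processor, writer, etc.) that accumulates items in a field keeps every item reachable for the entire job, which defeats the chunk-scoped memory usage. This is a hypothetical sketch, not your code:

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.stereotype.Component;

// Hypothetical example of a component that pins items in memory:
// the list below grows for the whole job execution, so no mapped item
// can be garbage collected before the job (and this singleton bean) is done.
@Component
public class LeakyDebtFieldSetMapper implements FieldSetMapper<CreditCardDebt> {

    private final List<CreditCardDebt> allMappedItems = new ArrayList<>();

    @Override
    public CreditCardDebt mapFieldSet(FieldSet fieldSet) {
        CreditCardDebt debt = new CreditCardDebt();
        debt.setAccount(fieldSet.readString(0));
        // ... map the remaining fields ...
        allMappedItems.add(debt); // <- keeps a reference to every item
        return debt;
    }
}

If nothing in your configuration accumulates items like this, each chunk's items should become unreachable after the chunk is written and be reclaimed by the next GC cycle.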