In my Spring Batch code, I am reading chunks of 100 records. For each record in the chunk, I check whether the record already exists in the DB. If it does, I do not insert it.
On the first run, if a chunk of 100 contains a duplicate record, the Spring Batch processor cannot identify it as a duplicate because there is no data in the DB yet. The processor handles all 100 records first, and only then is the insert performed.

Is there a way to check for duplicates within the chunk of 100 first, then check the DB, and only after both checks insert into the DB?

1 Answer

You could implement your own custom ItemProcessor to check for duplicates and drop them.

Here's an example:

import java.util.HashSet;
import java.util.Set;

import org.springframework.batch.item.ItemProcessor;

public class DropDuplicateItemProcessor<T> implements ItemProcessor<T, T> {

    // Items seen so far; relies on T implementing equals() and hashCode()
    private final Set<T> previousItems = new HashSet<>();

    @Override
    public T process(T item) throws Exception {

        // Set.add returns false if an equal item was already seen
        if (!previousItems.add(item)) {
            return null; // Returning null drops (filters out) the duplicate
        }

        // Continue with the non-duplicate item
        return item;
    }

}
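
To also cover the "then check the DB" part of the question, the same idea can be combined with a lookup. The following is only a minimal sketch with no Spring dependency so it can be run on its own: `SeenOrStoredFilter` and `existsInDb` are hypothetical names, and the `Predicate` stands in for whatever query you already use to check whether a record exists.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, not part of Spring Batch: drops items that were
// already seen in this run or that already exist in the database.
class SeenOrStoredFilter<T> {

    // Items already seen during this run; T needs equals() and hashCode()
    private final Set<T> seen = new HashSet<>();

    // Assumed DB lookup supplied by the caller (e.g. wrapping an "exists" query)
    private final Predicate<T> existsInDb;

    SeenOrStoredFilter(Predicate<T> existsInDb) {
        this.existsInDb = existsInDb;
    }

    // Same contract as ItemProcessor.process: return null to drop the item
    T process(T item) {
        // Step 1: in-memory check catches duplicates inside the same chunk
        if (!seen.add(item)) {
            return null;
        }
        // Step 2: DB check catches items inserted by earlier runs
        if (existsInDb.test(item)) {
            return null;
        }
        return item;
    }
}
```

In a real job this class would implement `ItemProcessor<T, T>` like the example above. Note that the set grows with the number of distinct items processed in the run, so this approach probably suits moderate volumes best.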