0
votes

I am using Spring Boot (2.2.1) along with Spring Data JPA.

My application runs a scheduled service in which I have to read millions of customer records (with pagination) for multiple companies. After doing some operation on them, I have to update a status column for those customers. To update the status I am using a native query (via the @Query annotation with nativeQuery = true); a simplified sketch of it follows the pseudocode below.

public void scheduledTask() {
    List<Integer> companies = getCompanies();
    for (Integer companyId : companies) {
        // 1. load a page of x customers for the company
        // 2. do some operation on them
        // 3. repeat steps 1-2 until all customers of the company are processed,
        //    then update the read status for those customers
    }
}
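
To update the status I use a repository method roughly like the following (a simplified sketch; entity, table and column names are placeholders for what I actually have):

import java.util.List;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.transaction.annotation.Transactional;

public interface CustomerRepository extends JpaRepository<Customer, Long> {

    // Marks the given customers as read; table and column names are placeholders
    @Modifying
    @Transactional
    @Query(value = "UPDATE customer SET read_status = :status WHERE id IN (:ids)",
           nativeQuery = true)
    int updateReadStatus(@Param("status") String status, @Param("ids") List<Long> ids);
}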

If an exception occurs after some customer records have already been processed, the read status is not updated for those processed records. Also, once a few million customer records have been processed, the Hibernate EntityManager gets closed.

With the execution above, the read status only gets updated after all customers of all companies have been processed.

Now, I want to know if there is an efficient way to load and update the customer data so that, in case of an exception, my read-status updates do not get lost.

1
You don't read a million customer records in a single call. Also, your pseudocode does not show what you actually have; you can strip out the core business logic, but without seeing your code, making assumptions will not help anyone. Also, how are you scheduling your method? – ImtiazeA

1 Answer

0
votes

The best solution heavily depends on what "do some operation" means and what kinds of exceptions you have to deal with. Since we don't know either, I'll stick to some general advice.

  1. To keep your changes from being rolled back, put them in separate transactions (see the first sketch after this list).

  2. But avoid tons of tiny transactions. Every commit forces the database to do some I/O, which costs performance. Overly large transactions have their own problems, so commit in reasonably sized chunks.

  3. Avoid JPA for this kind of work. JPA's strength is CRUD operations, where you load a single entity or maybe a few, change them and flush the changes out to the database. For massive batch operations like this one, stick to JDBC and SQL; it carries far less overhead (a JdbcTemplate sketch follows the list).

  4. Look into dedicated tools for batch processing, such as Spring Batch (a minimal step definition is sketched below).

  5. Regarding exceptions: try to avoid them, or at least keep them from travelling across your transaction boundaries. You might look into retry strategies where you first try a batch of customers and, if the batch throws an exception, reprocess its customers one by one, so that only the one causing the exception fails to make it through (see the last sketch below).
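
A minimal sketch of points 1 and 2, assuming a hypothetical CustomerChunkService, a Customer entity with an id, and a native update method like the one shown in the question: each chunk of customers is processed and committed in its own transaction via REQUIRES_NEW, so chunks that already committed keep their read status even if a later chunk blows up. Note that the method has to be called from another bean for the transactional proxy to kick in.

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CustomerChunkService {

    private final CustomerRepository customerRepository;

    public CustomerChunkService(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // Each chunk commits independently of the surrounding scheduled method,
    // so already-processed chunks survive an exception in a later chunk.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void processChunk(List<Customer> chunk) {
        for (Customer customer : chunk) {
            // "do some operation" goes here
        }
        List<Long> ids = chunk.stream().map(Customer::getId).collect(Collectors.toList());
        customerRepository.updateReadStatus("READ", ids);
    }
}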
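
For point 3, a sketch of the same status update done with plain JdbcTemplate instead of JPA (table and column names are again assumptions); batchUpdate sends the statements in JDBC batches, which is usually much cheaper than updating entities one by one:

import java.sql.PreparedStatement;
import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class CustomerStatusJdbcUpdater {

    private final JdbcTemplate jdbcTemplate;

    public CustomerStatusJdbcUpdater(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // One JDBC batch round trip per 500 ids instead of one UPDATE per customer
    public void markAsRead(List<Long> customerIds) {
        jdbcTemplate.batchUpdate(
                "UPDATE customer SET read_status = 'READ' WHERE id = ?",
                customerIds,
                500,
                (PreparedStatement ps, Long id) -> ps.setLong(1, id));
    }
}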
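
For point 4, a rough idea of a chunk-oriented Spring Batch step (Spring Batch 4.x as pulled in by Boot 2.2; the reader, processor and writer beans are assumed to exist elsewhere, with @EnableBatchProcessing in place):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ReadStatusBatchConfig {

    // Reads, processes and writes customers in chunks of 500,
    // each chunk in its own transaction; failing items can be skipped.
    @Bean
    public Step updateReadStatusStep(StepBuilderFactory steps,
                                     ItemReader<Customer> reader,
                                     ItemProcessor<Customer, Customer> processor,
                                     ItemWriter<Customer> writer) {
        return steps.get("updateReadStatusStep")
                .<Customer, Customer>chunk(500)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .faultTolerant()
                .skip(Exception.class)
                .skipLimit(100)
                .build();
    }
}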
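
And for point 5, one way to sketch the "try the batch first, then fall back to one by one" idea, reusing the hypothetical processChunk method from the first sketch:

import java.util.Collections;
import java.util.List;

public class ChunkWithFallbackProcessor {

    private final CustomerChunkService chunkService;

    public ChunkWithFallbackProcessor(CustomerChunkService chunkService) {
        this.chunkService = chunkService;
    }

    // Try the whole chunk in one transaction; if it fails, retry customer by
    // customer so only the record actually causing the exception is left behind.
    public void process(List<Customer> chunk) {
        try {
            chunkService.processChunk(chunk);
        } catch (Exception chunkFailure) {
            for (Customer customer : chunk) {
                try {
                    chunkService.processChunk(Collections.singletonList(customer));
                } catch (Exception singleFailure) {
                    // log and skip the offending customer
                }
            }
        }
    }
}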