I have several Spring Batch (2.1.9.RELEASE) jobs running in production that use org.springframework.batch.core.launch.support.RunIdIncrementer.
Sporadically, I get the following error:
org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={run.id=23, tenant.code=XXX}. If you want to run this job again, change the parameters.
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:122) ~[spring-batch-core-2.1.9.RELEASE.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.6.0_39]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) ~[na:1.6.0_39]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_39]
at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_39]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110) ~[spring-tx-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.batch.core.repository.support.AbstractJobRepositoryFactoryBean$1.invoke(AbstractJobRepositoryFactoryBean.java:168) ~[spring-batch-core-2.1.9.RELEASE.jar:na]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) ~[spring-aop-3.1.1.RELEASE.jar:3.1.1.RELEASE]
at sun.proxy.$Proxy64.createJobExecution(Unknown Source) ~[na:na]
at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:111) ~[spring-batch-core-2.1.9.RELEASE.jar:na]
at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:349) [spring-batch-core-2.1.9.RELEASE.jar:na]
at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574) [spring-batch-core-2.1.9.RELEASE.jar:na]
at (omitted for brevity)
A sampling from the various XML contexts:
<bean
id="jobParametersIncrementer"
class="org.springframework.batch.core.launch.support.RunIdIncrementer" />
<batch:job id="rootJob"
abstract="true"
restartable="true">
<batch:validator>
<bean class="org.springframework.batch.core.job.DefaultJobParametersValidator">
<property name="requiredKeys" value="tenant.code"/>
</bean>
</batch:validator>
</batch:job>
<batch:job id="rootJobWithIncrementer"
abstract="true"
parent="rootJob"
incrementer="jobParametersIncrementer" />
I use org.springframework.batch.core.launch.support.CommandLineJobRunner to execute the job:
java org.springframework.batch.core.launch.support.CommandLineJobRunner /com/XXX/job123/job123-context.xml job123 tenant.code=XXX -next
All of the jobs (that use the incrementer) have rootJobWithIncrementer as parent.
I did quite a bit of research and found that some who got this error had success changing the isolation level of the transaction manager. I fiddled with several levels, finally arriving at READ_COMMITED.
<batch:job-repository
id="jobRepository"
data-source="oracle_hmp"
transaction-manager="dataSourceTransactionManager"
isolation-level-for-create="READ_COMMITTED"/>
Based on my understanding, this type of error should only happen if the same job is executed at the same time from multiple processes -- so that there might be contention for the incrementer. In this instance, that is not the case, yet we see the error.
Any ideas as to what might be causing this problem? Should I try a different isolation level? Something else?
Thanks!
There is a similar question here, but it is not as well documented (and also lacks and answer).