0
votes

I configured a Spring Batch job to skip a bad record when an error occurs while reading the XML file. The SkipPolicy implementation always returns true so that the bad record is skipped. The job needs to continue processing the remaining records, but in my case it stops after the bad record and is reported as completed.

@Configuration
@Import(DataSourceConfig.class)
@EnableWebMvc
@ComponentScan(basePackages = "org.nova.batch")
@EnableBatchProcessing
public class BatchIssueConfiguration {
    private static final Logger LOG = LoggerFactory.getLogger(BatchIssueConfiguration.class);
    @Autowired
    private JobBuilderFactory jobBuilderFactory;
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean(name = "jobRepository")
    public JobRepository jobRepository(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDatabaseType("derby");
        factory.setDataSource(dataSource);
        factory.setTransactionManager(transactionManager);
        return factory.getObject();
    }
    @Bean
    public Step stepSGR() throws IOException{
        return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue,SigmodRecord.Issue>chunk(1)
                //.processor(itemProcessor())
                .writer(itemWriter())
                .reader(multiReader())
                .faultTolerant()
                .skipLimit(Integer.MAX_VALUE)
                .skipPolicy(new FileVerificationSkipper())
                .skip(Throwable.class)
                .build();
    }

    @Bean
    public SkipPolicy   fileVerificationSkipper(){
        return new FileVerificationSkipper();
    }


    @Bean
    @JobScope
    public MultiResourceItemReader<SigmodRecord.Issue> multiReader() throws IOException{
        MultiResourceItemReader<SigmodRecord.Issue> mrir = new MultiResourceItemReader<SigmodRecord.Issue>();
        //FileSystemResource [] files = new FileSystemResource [{}];
        ResourcePatternResolver rpr = new PathMatchingResourcePatternResolver();
        Resource[] resources = rpr.getResources("file:c:/temp/Sigm*.xml");
        mrir.setResources( resources);
        mrir.setDelegate(xmlItemReader());
        return mrir;
    }
}

public class FileVerificationSkipper implements SkipPolicy {

    private static final Logger LOG = LoggerFactory.getLogger(FileVerificationSkipper.class);

    @Override
    public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
        LOG.error("There is an error {}",t);
        return true;
    }

}

The file contains input with an unescaped "&", which causes the read error, e.g.:

<title>Notes of DDTS & n Apparatus for Experimental Research</title>

which throws the following error:

org.springframework.dao.DataAccessResourceFailureException: Error reading XML stream; nested exception is javax.xml.stream.XMLStreamException: ParseError at [row,col]:[127,25]
Message: The entity name must immediately follow the '&' in the entity reference.

Is there anything wrong in my configuration that prevents the rest of the records from being processed?

2 Answers

0
votes

To skip only certain types of exceptions, you can either supply a skip policy containing custom logic that decides which exceptions to skip, as in the code below:

    @Bean
    public Step stepSGR() throws IOException {
        return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue, SigmodRecord.Issue>chunk(1)
                //.processor(itemProcessor())
                .writer(itemWriter())
                .reader(multiReader())
                .faultTolerant()
                .skipPolicy(new FileVerificationSkipper())
                .build();
    }

    public class FileVerificationSkipper implements SkipPolicy {

        private static final Logger LOG = LoggerFactory.getLogger(FileVerificationSkipper.class);

        @Override
        public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
            LOG.error("There is an error", t);
            // Skip only read failures; any other exception fails the step.
            return t instanceof DataAccessResourceFailureException;
        }

    }

Or you can simply declare the skippable exception on the step itself, as below:

    @Bean
    public Step stepSGR() throws IOException {
        return stepBuilderFactory.get("ETL_STEP").<SigmodRecord.Issue, SigmodRecord.Issue>chunk(1)
                //.processor(itemProcessor())
                .writer(itemWriter())
                .reader(multiReader())
                .faultTolerant()
                .skipLimit(Integer.MAX_VALUE)
                .skip(DataAccessResourceFailureException.class)
                .build();
    }

0
votes

This is a malformed-XML problem, and there seems to be no way to recover from it other than fixing the XML itself. Spring's StaxEventItemReader uses an XMLEventReader for its low-level parsing, so I tried reading the file directly with an XMLEventReader to skip the bad block. However, XMLEventReader.nextEvent() kept throwing an exception at the bad block. I wrapped the call in a try/catch block to move on to the next event, but the reader does not seem to advance past the error. So for now, the only way to solve the issue is to fix the XML before processing it.
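
As an illustration of "fixing the XML before processing it", here is a minimal sketch of a pre-processing helper that escapes bare "&" characters. The class name XmlAmpersandSanitizer and the method escapeBareAmpersands are hypothetical names of my own, not part of Spring Batch. It assumes the only defect is an unescaped "&" in text content. It will also rewrite a bare "&" inside CDATA sections or comments, where it is actually legal, so treat it as a starting point, not a general repair tool.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a Spring Batch class): escapes '&' characters that
// do not begin a character or entity reference, so a StAX parser can read the text.
final class XmlAmpersandSanitizer {

    // '&' NOT followed by a named reference (e.g. amp;), a decimal reference
    // (e.g. #38;), or a hexadecimal reference (e.g. #x26;).
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    private XmlAmpersandSanitizer() { }

    // Returns the input with every bare '&' replaced by "&amp;".
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String broken = "<title>Notes of DDTS & n Apparatus for Experimental Research</title>";
        // prints <title>Notes of DDTS &amp; n Apparatus for Experimental Research</title>
        System.out.println(escapeBareAmpersands(broken));
    }
}
```

One way to use it would be to run each input file through this method, write the result to a temporary file, and point the reader at the cleaned copy. Existing references such as "&amp;" or "&#38;" are left untouched, so running it twice does not double-escape anything.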