What is the Best performance architecture to read XML in Spring Batch? Each XML is approximately 300 KB size and we are processing 1 Million.
Our Current Approach
30 partitions and 30 Grids and Each slave gets 166 XMLS
Commit Chunk 100
Application Start Memory is 8 GB
Using JAXB in Reader Default Bean Scope
@StepScope
@Qualifier("xmlItemReader")
public IteratorItemReader<BaseDTO> xmlItemReader(
@Value("#{stepExecutionContext['fileName']}") List<String> fileNameList) throws Exception {
String readingFile = "File Not Found";
logger.info("----StaxEventItemReader----fileName--->" + fileNameList.toString());
List<BaseDTO> fileList = new ArrayList<BaseDTO>();
for (String filePath : fileNameList) {
try {
readingFile = filePath.trim();
Invoice bill = (Invoice) getUnMarshaller().unmarshal(new File(filePath));
UnifiedInvoiceDTO unifiedDTO = new UnifiedInvoiceDTO(bill, environment);
unifiedDTO.setFileName(filePath);
BaseDTO baseDTO = new BaseDTO();
baseDTO.setUnifiedDTO(unifiedDTO);
fileList.add(baseDTO);
} catch (Exception e) {
UnifiedInvoiceDTO unifiedDTO = new UnifiedInvoiceDTO();
unifiedDTO.setFileName(readingFile);
unifiedDTO.setErrorMessage(e);
BaseDTO baseDTO = new BaseDTO();
baseDTO.setUnifiedDTO(unifiedDTO);
fileList.add(baseDTO);
}
}
return new IteratorItemReader<>(fileList);
}
Our questions:
- Is this Archirecture correct
- Is any performance or architecture advantage of using StaxEventItemReader and XStreamMarshaller over JAXB.
- How to handle memory properly to avoid slow down