Goal: use a SAX parser to parse different XML files in parallel from multiple threads.
I found multiple posts on this topic, but none of them points to an answer.
I know that SAXParserFactory and SAXParser are not thread safe. From my research, I need to create new instances of SAXParserFactory and SAXParser for each thread (and also a new instance of MySAXHandler). How can I achieve this?
Here is my current implementation.
Initialization of the SAXParser:
@Override
public GameStatisticsDTO processStatsGameStatXML(File gameStatsStatFile) {
    try (InputStream inputStream = new FileInputStream(gameStatsStatFile)) {
        // New Handler instance
        GameStatsSAXHandler gameStatsSAXHandler = new GameStatsSAXHandler();
        Reader reader = new InputStreamReader(inputStream, Constants.ENCODING_TYPE_UTF_8);
        InputSource inputSource = new InputSource(reader);
        inputSource.setEncoding(Constants.ENCODING_TYPE_UTF_8);
        // New Instance of SAXParserFactory
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // New Instance of SAXParser
        SAXParser saxParser = factory.newSAXParser();
        // Create an XML reader to set the entity resolver.
        XMLReader xmlReader = saxParser.getXMLReader();
        xmlReader.setEntityResolver(new StatsCustomResolver());
        xmlReader.setContentHandler(gameStatsSAXHandler);
        xmlReader.parse(inputSource);
        return gameStatsSAXHandler.getGameStatisticsDTO();
    } catch (Exception e) {
        throw new UnprocessableEntityException();
    }
}
This uses GameStatsSAXHandler to process the XML nodes. Within that class I keep instance fields to store the parsed data.
public class GameStatsSAXHandler extends DefaultHandler {
    // Instance Reference Variable - Hope this is thread safe
    private GameStatisticsDTO gameStatisticsDTO = new GameStatisticsDTO();

    protected GameStatisticsDTO getGameStatisticsDTO() {
        return this.gameStatisticsDTO;
    }

    @Override
    public void startElement(String uri, String localName, String elementName, Attributes attributes) throws SAXException {
        // Process the data and add it to the gameStatisticsDTO
    }

    @Override
    public void endElement(String uri, String localName, String elementName) throws SAXException {
        // Do some processing in gameStatisticsDTO
    }
}
GameStatisticsDTO contains multiple instance fields (objects and lists).
So I have two questions.
1) As I understand it, only local primitive variables are inherently thread safe. Are GameStatsSAXHandler and its GameStatisticsDTO thread safe?
My thought: if I create a new GameStatsSAXHandler instance for each thread, then its GameStatisticsDTO will be thread safe.
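To make that thought concrete, here is a minimal sketch of the idea. CountingHandler is a hypothetical stand-in for GameStatsSAXHandler and countElements is an illustrative helper; neither is part of my real code. The handler is created inside the method, so its instance field should only ever be reached by the thread running that parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class HandlerConfinementSketch {

    /** Hypothetical stand-in for GameStatsSAXHandler; its state lives in an instance field. */
    static class CountingHandler extends DefaultHandler {
        private int elementCount; // only touched by the thread running this parse

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elementCount++;
        }

        int getElementCount() {
            return elementCount;
        }
    }

    /** Builds a fresh factory, parser and handler per call, so no parsing state is shared. */
    static int countElements(String xml) {
        try {
            CountingHandler handler = new CountingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getElementCount();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] results = new int[2];
        // Each thread calls countElements and therefore gets its own handler instance.
        Thread first = new Thread(() -> results[0] = countElements("<a><b/><c/></a>"));
        Thread second = new Thread(() -> results[1] = countElements("<x/>"));
        first.start();
        second.start();
        first.join(); // join() makes the worker's write to results visible here
        second.join();
        System.out.println(results[0] + " " + results[1]);
    }
}
```

If this reasoning holds, the handler would only become unsafe if the same instance were handed to two parses at once, or if it wrote to static or otherwise shared state.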
2) How can I convert this to run in a multi-threaded environment with parallelism?
My thought: create a ThreadPoolExecutor and, for each task, create a new SAXParserFactory, a new SAXParser and a new GameStatsSAXHandler, then pass them to the base method (processStatsGameStatXML) for processing.
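A rough sketch of that plan, under some assumptions: the documents are passed in as strings instead of files to keep the example self-contained, NameCollectingHandler is a hypothetical replacement for GameStatsSAXHandler, and parseOne plays the role of processStatsGameStatXML. Each submitted task builds its own factory, parser and handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParallelSaxSketch {

    /** Hypothetical stand-in for GameStatsSAXHandler: collects element names in order. */
    static class NameCollectingHandler extends DefaultHandler {
        private final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            names.add(qName);
        }

        List<String> getNames() {
            return names;
        }
    }

    /** One parse = one factory, one parser, one handler (same shape as processStatsGameStatXML). */
    static List<String> parseOne(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        SAXParser parser = factory.newSAXParser();
        NameCollectingHandler handler = new NameCollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getNames();
    }

    /** Submits one task per document and returns the results in input order. */
    static List<List<String>> parseAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String xml : documents) {
                tasks.add(() -> parseOne(xml));
            }
            List<List<String>> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and keeps the task order.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A parse task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
                "<game><team/><team/></game>",
                "<stats><player/></stats>");
        System.out.println(parseAll(documents, 2));
    }
}
```

With this shape the instances are really per task rather than per thread, which is simpler to reason about because nothing outlives a single parse.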
But how can I create a new instance for each thread? A code sample would be great. Thanks!
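One literal reading of "a new instance for each thread" that I have considered is a ThreadLocal holding one SAXParser per thread, with a fresh handler per parse. The sketch below is only an assumption about how that could look: PerThreadParserSketch, PARSER and countElements are hypothetical names, and it relies on the JAXP implementation overriding SAXParser.reset() (the JDK's built-in parser does; an implementation that does not would throw UnsupportedOperationException).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PerThreadParserSketch {

    // One SAXParser per thread, created lazily the first time that thread asks for it.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create SAXParser", e);
        }
    });

    /** Reuses this thread's parser but always creates a new handler for the parse. */
    static int countElements(String xml) {
        SAXParser parser = PARSER.get();
        parser.reset(); // assumption: the implementation supports reset()
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                count[0]++;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Two parses on the main thread share one parser; the worker thread gets its own.
        System.out.println(countElements("<a><b/></a>"));
        System.out.println(countElements("<x/>"));
        Thread worker = new Thread(() -> System.out.println(countElements("<p><q/><r/></p>")));
        worker.start();
        worker.join();
    }
}
```

The trade-off, as far as I can tell, is that this saves the cost of creating a factory and parser for every file, but each parser then lives as long as its pool thread does.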