I have to read large XML files, each around 500 MB. A batch run typically processes 500 such files. From each file I need to extract text nodes and, at the same time, extract whole XML nodes as markup. I used XPath over a DOM in Java for ease of use, but that fails with memory issues because I have limited resources.
I intend to use SAX or StAX in Java now. The text nodes can be extracted easily, but I don't know how to extract XML nodes as markup using SAX.
A sample:
<?xml version="1.0"?>
<Library>
<Book name = "ABC">
<Author>John</Author>
<PrintingCompanyDT><Printer>Sam</Printer><Printmachine>Laser</Printmachine>
<AssocPrint>Oreilly</AssocPrint> </PrintingCompanyDT>
</Book>
<Book name = "123">
<Author>Mason</Author>
<PrintingCompanyDT><Printer>kelly</Printer><Printmachine>DOTPrint</Printmachine>
<AssocPrint>Oxford</AssocPrint> </PrintingCompanyDT>
</Book>
</Library>
The expected result:
1) Book: ABC
Author: John
PrintCompany Detail XML:
<PrintingCompanyDT>
<Printer>Sam</Printer>
<Printmachine>Laser</Printmachine>
<AssocPrint>Oreilly</AssocPrint>
</PrintingCompanyDT>
2) Book: 123
Author: Mason
PrintCompany Detail XML:
<PrintingCompanyDT>
<Printer>kelly</Printer>
<Printmachine>DOTPrint</Printmachine>
<AssocPrint>Oxford</AssocPrint>
</PrintingCompanyDT>
If I try the usual approach of appending characters in the public void characters(char ch[], int start, int length) method, I get the following:
1) Book: ABC
Author: John
PrintCompany Detail XML:
Sam
Laser
Oreilly
That is, only the text content and whitespace, without the tags.
Can somebody suggest how to extract an XML node as-is from an XML file using a SAX or StAX parser in Java?
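For reference, a minimal sketch of the kind of handler that gives this text-only result. The class name TextOnlyHandler and the extract helper are made up for illustration and assume the default, non-namespace-aware SAX parser, so qName carries the raw element name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects only character data inside <PrintingCompanyDT>.
class TextOnlyHandler extends DefaultHandler {
    private final StringBuilder detail = new StringBuilder();
    private boolean inDetail = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("PrintingCompanyDT".equals(qName)) {
            inDetail = true;
            detail.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX reports text only here; the tags themselves never reach this method.
        if (inDetail) {
            detail.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("PrintingCompanyDT".equals(qName)) {
            inDetail = false;
        }
    }

    // Illustrative helper: parse an XML string and return the collected text.
    static String extract(String xml) throws Exception {
        TextOnlyHandler handler = new TextOnlyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.detail.toString();
    }
}
```

Because the tags arrive through startElement and endElement rather than characters, the buffer ends up holding only the text and whitespace.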
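One direction that might work, shown only as a sketch under assumptions: rebuild the markup by hand from the SAX events while inside the target element. The names FragmentHandler, TARGET and extract are hypothetical, the parser is assumed to be the default non-namespace-aware one, and the output is re-serialized markup (entities re-escaped, no pretty-printing), not a byte-for-byte copy of the input.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: re-serializes every <PrintingCompanyDT> subtree as a string.
class FragmentHandler extends DefaultHandler {
    private static final String TARGET = "PrintingCompanyDT";
    private final List<String> fragments = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private int depth = 0; // greater than 0 while inside a target element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == 0 && !TARGET.equals(qName)) {
            return; // outside the subtree of interest
        }
        depth++;
        current.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            current.append(' ').append(atts.getQName(i))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        current.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            current.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            return;
        }
        current.append("</").append(qName).append('>');
        if (--depth == 0) {
            // One complete subtree has been rebuilt; store it and reset the buffer.
            fragments.add(current.toString());
            current.setLength(0);
        }
    }

    // Re-escape characters that the parser has already decoded.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Illustrative helper: parse an XML string and return all rebuilt fragments.
    static List<String> extract(String xml) throws Exception {
        FragmentHandler handler = new FragmentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fragments;
    }
}
```

Only one subtree is buffered at a time, so memory use should stay proportional to the largest PrintingCompanyDT element rather than to the 500 MB file. For real files the handler would be passed to the parser with a File or InputStream instead of a string.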