I'm working on some XQuery code (using Saxon) that executes a simple XQuery file against a large XML file.
The XML file (located at this.referenceDataPath) has 3 million "row" nodes and has the form:
<row>
<ISRC_NUMBER>1234567890</ISRC_NUMBER>
</row>
<row>
<ISRC_NUMBER>1234567891</ISRC_NUMBER>
</row>
<row>
<ISRC_NUMBER>1234567892</ISRC_NUMBER>
</row>
etc...
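For anyone who wants to reproduce this, a file of the same shape can be generated with something like the sketch below. The class name ReferenceDataGenerator and the <rows> root element are assumptions for illustration only (the excerpt above does not show the real root element), and the values are just sequential numbers.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes a test file shaped like the reference data above.
final class ReferenceDataGenerator {

    // Writes `rows` <row> elements, each holding one sequential ISRC_NUMBER value,
    // wrapped in an assumed <rows> root element so the file is well-formed XML.
    static void write(Path target, int rows) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<rows>\n");
            for (int i = 0; i < rows; i++) {
                out.write("<row>\n<ISRC_NUMBER>" + (1234567890L + i) + "</ISRC_NUMBER>\n</row>\n");
            }
            out.write("</rows>\n");
        }
    }
}
```

Calling ReferenceDataGenerator.write(path, 3_000_000) should produce a file of roughly the size described here.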
The XQuery document (located at this.xqueryPath) is:
declare variable $isrc as xs:string external;
declare variable $refDocument external;

let $isrcNode := $refDocument//row[ISRC_NUMBER = $isrc]
return count($isrcNode)
The Java code is:
private XQItem referenceDataItem;
private XQPreparedExpression xPrepExec;
private XQConnection conn;

// Set up the connection and open the XQuery file
this.conn = new SaxonXQDataSource().getConnection();
InputStream queryFromFile = new FileInputStream(this.xqueryPath);

// Load the reference document once and bind it to the prepared expression
InputStream is = new FileInputStream(this.referenceDataPath);
this.referenceDataItem = conn.createItemFromDocument(is, null, null);
this.xPrepExec = conn.prepareExpression(queryFromFile);
xPrepExec.bindItem(new QName("refDocument"), this.referenceDataItem);

// The code below is in a separate method and is called multiple times
public int getCount(String searchVal) throws XQException {
    // Bind the value to search for, then run the prepared query
    xPrepExec.bindString(new QName("isrc"), searchVal, conn.createAtomicType(XQItemType.XQBASETYPE_STRING));
    XQSequence resultsFromFile = xPrepExec.executeQuery();
    // The query returns a single integer, so the serialized sequence is parsed as an int
    int count = Integer.parseInt(resultsFromFile.getSequenceAsString(new Properties()));
    return count;
}
The method getCount is called many times in succession (e.g. 1,000,000 times) to validate the existence of many values in the XML file.
Each call to getCount currently takes about 500 milliseconds, which seems very slow considering that the XML document is already in memory and the query is a prepared one.
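For reference, a per-call figure like this can be measured with a small harness along the lines of the sketch below. The names LookupTimer and Lookup are hypothetical and not part of the code above. The warm-up rounds are only there so that JIT compilation does not distort the average.

```java
import java.util.List;

// Hypothetical functional interface so that a method throwing a checked
// exception (such as getCount above) can be passed in as a method reference.
interface Lookup {
    int count(String value) throws Exception;
}

// Hypothetical timing helper: average wall-clock milliseconds per lookup.
final class LookupTimer {

    static double averageMillis(Lookup lookup, List<String> values, int warmupRounds) throws Exception {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        // Untimed warm-up passes over the same values
        for (int r = 0; r < warmupRounds; r++) {
            for (String v : values) {
                lookup.count(v);
            }
        }
        // Timed pass; the results are summed so the calls cannot be optimized away
        long checksum = 0;
        long start = System.nanoTime();
        for (String v : values) {
            checksum += lookup.count(v);
        }
        long elapsedNanos = System.nanoTime() - start;
        if (checksum < 0) {
            throw new IllegalStateException("unexpected negative count");
        }
        return elapsedNanos / 1_000_000.0 / values.size();
    }
}
```

With the class above, a call such as LookupTimer.averageMillis(this::getCount, sampleValues, 1) would give the average time per getCount call, where sampleValues is whatever list of search strings is being validated.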
The reason I'm using XQuery is as a proof of concept for future work where the XML file will have a more complex layout.
I'm running the code on an i7 with 8 GB of RAM, so memory should not be an issue. I have also increased the heap size allocated to the program.
Any suggestions on how I can improve the speed of this code?
Thanks!