
I am using XQuery/BaseX to search large XML files for historical data on some counters. All the files are zipped and stored in a folder on my drive. The relevant part of each file looks like this:

<measInfo xmlns="http://www.hehe.org/foo/" measInfoId="uplink speed">
  <granPeriod duration="DS222S" endTime="2020-09-03T08:15:00+02:00"/>
  <repPeriod duration="DS222S"/>
  <measTypes>AFD123 AFD124 AFD125 AFD156</measTypes>
  <measValue measObjLdn="PLDS-PLDS/STBHG-532632">
    <measResults>23 42 12 43</measResults>
  </measValue>
</measInfo>
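The counter names in measTypes and the numbers in measResults are paired by position: AFD124 is the second name, so its value is the second number, 42. Below is a minimal sketch of that pairing, run on an inline copy of the sample element. The "name=value" output format is only for illustration.

```xquery
(: Pair counter names with their values by position, using an inline copy
   of the sample measInfo element. :)
declare default element namespace "http://www.hehe.org/foo/";

let $meas :=
  <measInfo measInfoId="uplink speed">
    <granPeriod duration="DS222S" endTime="2020-09-03T08:15:00+02:00"/>
    <repPeriod duration="DS222S"/>
    <measTypes>AFD123 AFD124 AFD125 AFD156</measTypes>
    <measValue measObjLdn="PLDS-PLDS/STBHG-532632">
      <measResults>23 42 12 43</measResults>
    </measValue>
  </measInfo>
let $sought := ("AFD124", "AFD125")
let $types := tokenize($meas/measTypes)               (: split on whitespace :)
let $values := tokenize($meas/measValue/measResults)
(: $i is the position in the FULL list of names, so it also indexes $values :)
for $type at $i in $types
where $type = $sought
return $type || "=" || $values[$i]
```

The result is "AFD124=42" and "AFD125=12". Filtering inside the for binding instead, as in tokenize(...)[. = $sought], would renumber the remaining tokens from 1. In that case $i would no longer point at the matching value. The where clause avoids this because it keeps the original positions.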

I built the following query:

declare default element namespace "http://www.hehe.org/foo/";
let $sought := ["AFD124", "AFD125"]
let $datasource := collection("C:\Users\Patryk\Desktop\folderwitharchives")
let $filename := concat(convert:dateTime-to-integer(current-dateTime()), ".xml")

for $meas in $datasource/measCollecFile/measData/measInfo return
  (: filter with "where", not with a predicate on the token list, so that $i
     stays the position in the full list and matches the position in measResults :)
  for $measType at $i in $meas/tokenize(measTypes)
  where $measType = $sought
  return
    file:append($filename,
      <meas
        measInfoId="{data($meas/@measInfoId)}"
        measObjLdn="{data($meas/measValue/@measObjLdn)}"
      >
        {$meas/granPeriod}
        {$meas/repPeriod}
        <measType>{$measType}</measType>
        <measValue>{$meas/measValue/tokenize(measResults, " ")[$i]}</measValue>
      </meas>)

The script works, but it takes a long time for some counters (measType). I read the documentation on indexing. My idea is to index the individual counter names inside measTypes (the tokens of the string), so that a search for a counter across the whole archive can use the index instead of scanning every file. Is that possible when querying the archives directly, or would I have to create a database from them first? I would prefer not to, because of the size of the files. How can I create indexes in this case?
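Here is one possible direction. It is a sketch under assumptions and has not been tested on this data. BaseX cannot index the output of tokenize. However, its full-text index tokenizes text nodes such as measTypes on its own, although it only exists inside a database. The database name "counters" is hypothetical. 'addarchives' is the BaseX parsing option for reading XML files inside archives, and it is enabled by default.

```xquery
(: Hypothetical one-time import: parse the XML files inside the ZIP archives
   into a database named "counters" (name assumed) and build a full-text
   index over its text nodes, which includes the measTypes lists. :)
db:create(
  "counters",
  "C:\Users\Patryk\Desktop\folderwitharchives",
  (),
  map { 'ftindex': true(), 'addarchives': true() }
)
```

After the import, a predicate such as collection("counters")//measInfo[measTypes contains text "AFD124"] is a candidate for index rewriting. BaseX's query info output shows whether the full-text index was actually used. The trade-off is the extra disk space for the database and its index, which is the concern raised in the question.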

I don't think you can index the result of a tokenize call. A small improvement might come from using a sequence, let $sought := ("AFD124", "AFD125"), instead of the array in let $sought := ["AFD124", "AFD125"]. That way, the comparison with $sought doesn't have to flatten an array each time it is evaluated. I have not tested whether that actually performs better, however. - Martin Honnen
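The following sketch shows that both forms give the same comparison result. The point of the comment is only the cost of atomizing the array, which this example does not measure.

```xquery
let $asArray    := ["AFD124", "AFD125"]  (: array: its members are atomized for the comparison :)
let $asSequence := ("AFD124", "AFD125")  (: plain sequence of strings :)
return (
  "AFD125" = $asArray,     (: true, after the array is flattened to its members :)
  "AFD125" = $asSequence   (: true :)
)
```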
There is now also an article in the docs specifically about query optimizations. - amix

1 Answer


This is not an answer to my question, but I have noticed that execution takes much longer when I write XML nodes to the file. Appending a plain string instead is much faster, for example:

file:append($filename, concat($measInfo/@measInfoId, ",", $measInfo/measValue/@measObjLdn, ",",
  $measInfo/granPeriod, ",", $measInfo/repPeriod, ",", $measType, ",",
  $tokenizedValues[$i], "&#10;"))

Why is that, and how can I speed up writing XML nodes to a file?
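One possible mitigation is to build all meas elements first and then serialize them with a single file:write call. That way, serialization and file access happen once rather than once per counter. This sketch has not been benchmarked. It reuses the names from the query in the question, and the wrapping results element is a hypothetical addition that keeps the output a single well-formed document.

```xquery
declare default element namespace "http://www.hehe.org/foo/";

let $sought := ("AFD124", "AFD125")
let $datasource := collection("C:\Users\Patryk\Desktop\folderwitharchives")
let $filename := concat(convert:dateTime-to-integer(current-dateTime()), ".xml")
(: Build every meas element in memory first :)
let $rows :=
  for $meas in $datasource/measCollecFile/measData/measInfo
  for $measType at $i in $meas/tokenize(measTypes)
  where $measType = $sought
  return
    <meas measInfoId="{$meas/@measInfoId}" measObjLdn="{$meas/measValue/@measObjLdn}">
      {$meas/granPeriod, $meas/repPeriod}
      <measType>{$measType}</measType>
      <measValue>{$meas/measValue/tokenize(measResults)[$i]}</measValue>
    </meas>
(: ...then serialize them in one call, wrapped in a hypothetical results root :)
return file:write($filename, <results>{$rows}</results>)
```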

I have also noticed that appending values to a file inside a for loop takes much longer. I suspect this is because the file has to be reopened in each iteration. Is there a way to keep the file open for the whole query?
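As far as I know, the BaseX File module does not offer a file handle that stays open between calls. A common workaround is to collect all lines in one sequence and write them with a single file:write-text-lines call. This sketch makes several assumptions:

- It reads granPeriod/@endTime and repPeriod/@duration, because both elements keep their data in attributes and their string values are empty.
- It assumes one measValue per measInfo, as in the sample.
- The .csv file name is my own choice.

```xquery
declare default element namespace "http://www.hehe.org/foo/";

let $sought := ("AFD124", "AFD125")
let $datasource := collection("C:\Users\Patryk\Desktop\folderwitharchives")
let $filename := concat(convert:dateTime-to-integer(current-dateTime()), ".csv")  (: assumed name :)
(: Build all CSV lines first; assumes one measValue per measInfo :)
let $lines :=
  for $measInfo in $datasource/measCollecFile/measData/measInfo
  let $tokenizedValues := tokenize($measInfo/measValue/measResults)
  for $measType at $i in tokenize($measInfo/measTypes)
  where $measType = $sought
  return string-join((
    string($measInfo/@measInfoId),
    string($measInfo/measValue/@measObjLdn),
    string($measInfo/granPeriod/@endTime),  (: assumed field :)
    string($measInfo/repPeriod/@duration),  (: assumed field :)
    $measType,
    $tokenizedValues[$i]
  ), ",")
(: One call writes every line, each followed by a newline :)
return file:write-text-lines($filename, $lines)
```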