BaseX XQuery optimization

Question

Hello, I work with BaseX in C++ and have a problem with the performance of my queries. I have a database with many XML files; one of them, for example, is imported from a CSV file and contains records that look like this:

<record>
  <hsn>0005</hsn>
  <tsn>486</tsn>
  <factorycode>BMW 3/1</factorycode>
  <description>318I</description>
  <power>83</power>
  <cubiccapacity>1796</cubiccapacity>
  <typeapprovaldate>19910701</typeapprovaldate>
  <xxx>1</xxx>
  <mid>BMW00737</mid>
</record>

I have a simple query that looks up every mid with a given hsn and tsn:

for $mid in doc('database')//record
where $mid / hsn = '0005' and $mid / tsn = '404'
return $mid/mid

The problem is that it takes too long because the XML file contains too many records.

Is there a way to optimize my query or the XML file? I think an attribute index could work, but I don't know how to use it in my database: http://docs.basex.org/wiki/Indexes

Did you check the output of the Info View, as described in the Wiki article you quoted? — Christian Grün

dirkk · Accepted Answer · 2017-09-26T20:29:59

First off, what do you mean by "too long", and how many records do you have? "Too long" could mean multiple seconds or minutes, or it could mean 50 ms if that is too slow for your use case. Please be more specific when asking questions.

Next, you will certainly never use the attribute index because, well, you don't have any attributes in your XML. You want to use the text index. Normally, the optimizer should rewrite your query to use the text index in this case, but you can make sure by taking a look at the "Query Info" view in the BaseX GUI. In the compilation steps and the resulting optimized query you should see entries showing that the index is used. If you don't see any, the index is not used, either because the optimizer decided against it for some reason or because your index is not up to date. You could also use db:text directly.
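To illustrate the db:text route, here is a minimal sketch. It assumes that the database is really named 'database', as in your query, and that its text index exists and is up to date. db:text returns the matching text nodes, so the query climbs back up to the enclosing record:

```xquery
(: Sketch: ask the text index directly for text nodes equal to '0005',
   keep only those inside an hsn element, and move up to the record.
   The database name 'database' is taken from the question. :)
for $record in db:text('database', '0005')/parent::hsn/parent::record
where $record/tsn = '404'
return $record/mid
```

If the index turns out to be stale, running the OPTIMIZE command on the opened database (or db:optimize('database') from XQuery) should refresh the index structures. Check the Query Info view again afterwards.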

However, let me give you two unrelated hints. First off, if performance is a concern for you, never use //. It is a descendant-or-self step, which means BaseX has to look at all descendant elements. Instead, use the specific path, i.e. doc('database')/records/record.

Additionally, do not write $mid / hsn. While it is valid, it is highly unconventional to put whitespace around the path operator. Instead, just drop the whitespace and write $mid/hsn.
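Putting both hints together, your query could look roughly like this. Note that records as the name of the root element is only my guess, since your question shows just a single record; replace it with whatever the root of your document actually is:

```xquery
(: Sketch: explicit child steps instead of //, no whitespace around '/'.
   'records' is an assumed root element name, not taken from the question. :)
for $record in doc('database')/records/record
where $record/hsn = '0005' and $record/tsn = '404'
return $record/mid
```

The same lookup can also be written as a single path with predicates, doc('database')/records/record[hsn = '0005'][tsn = '404']/mid, which many people find easier to read.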
