
I have a fairly large BaseX database (> 2 GB) containing a large number of XML documents. The XML files are quite flat in nature. A simplified example of a typical XML file:

<document id="doc_id_1234">
    <value id="1">value 1</value>
    <value id="2">value 2</value>
    <value id="3">value 3</value>
</document>

My XQueries are largely based on attribute selectors (e.g. //value[@id='1' or @id='3']), and I have found that creating an Attribute Index in the database resulted in a massive increase in query performance.
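To illustrate why an attribute index helps a query like //value[@id='1' or @id='3'], here is a toy Python model. It is not how BaseX implements its index, and the function names are hypothetical. Without an index, every value element is visited and its @id tested; with an index, a dictionary from attribute value to elements lets the lookup jump straight to the matching nodes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The simplified document from the question.
DOC = """<document id="doc_id_1234">
    <value id="1">value 1</value>
    <value id="2">value 2</value>
    <value id="3">value 3</value>
</document>"""


def scan(root, wanted):
    """No index: visit every <value> element and test its @id (full scan)."""
    wanted = set(wanted)
    return [v.text for v in root.iter("value") if v.get("id") in wanted]


def build_attribute_index(root):
    """Toy attribute index: maps every attribute value to the elements carrying it."""
    index = defaultdict(list)
    for elem in root.iter():
        for value in elem.attrib.values():
            index[value].append(elem)
    return index


def lookup(index, wanted):
    """With the index: jump straight to the candidates, then keep <value> elements.

    Results follow the order of `wanted`, not document order (unlike XQuery).
    """
    return [e.text
            for w in wanted
            for e in index.get(w, [])
            if e.tag == "value"]


root = ET.fromstring(DOC)
index = build_attribute_index(root)
print(scan(root, ["1", "3"]))     # ['value 1', 'value 3']
print(lookup(index, ["1", "3"]))  # ['value 1', 'value 3']
```

The scan touches every element on each query, while the index is built once and then answers each lookup by key, which is also why it has to be rebuilt (or maintained) after new data is imported.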

I upload new XML data on a monthly or quarterly basis. After importing the new XML files, I re-create the Attribute Index.

I have found, however, that after a reboot of the server (which seems to happen quite often at my service provider), query speed decreases significantly. It feels as if performance drops back to the level without the Attribute Index. If I open the database in the BaseX GUI, the Attribute Index still appears to be there. When I drop the existing Attribute Index and re-create it, my XQueries are lightning fast again.

I am using BaseX version 7.7.1.

I would like to know:

  1. Where is the Attribute Index stored? Is it in RAM (which would explain why the query speed decreases after a reboot)?

  2. How can I configure my database in such a way that the XQuery performance remains consistently good?

I really hope you can help me out, as this is a significant issue on my production website.

Have you ever found a solution to this question? - favq

1 Answer


To answer your questions:

  1. The attribute index is persisted on disk (it is not held only in RAM) inside your BaseXData folder, which contains one subfolder per database and usually resides in your home directory. The attribute indexes (names and values) are stored in the files matching the pattern atv*.basex.
  2. Normally, the attribute index should survive restarts of both BaseX and your operating system. If you can reproduce the index being invalidated without applying any updates to the database, you might want to post to the BaseX mailing list so it can be checked for a bug. Before doing so, try the steps below and make sure that nothing is updating the database on startup.
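To check both points on disk, here is a minimal Python sketch. It assumes the database lives in a folder such as ~/BaseXData/mydb (the name mydb is hypothetical, and your DBPATH setting may point elsewhere). It records the size and modification time of every file in the database folder, lists the atv*.basex files, and reports which files differ between a snapshot taken before a reboot and one taken after. If files changed although you never updated the database, something is writing to it on startup.

```python
import json
from pathlib import Path


def snapshot(db_dir):
    """Record [size, mtime] for every file in one BaseX database folder."""
    folder = Path(db_dir).expanduser()
    return {p.name: [p.stat().st_size, p.stat().st_mtime]
            for p in sorted(folder.iterdir()) if p.is_file()}


def attribute_index_files(snap):
    """Names of the attribute index files (pattern atv*.basex) in a snapshot."""
    return sorted(n for n in snap if n.startswith("atv") and n.endswith(".basex"))


def changed_files(before, after):
    """Files that were added, removed, or modified between two snapshots."""
    return sorted(n for n in set(before) | set(after)
                  if before.get(n) != after.get(n))


def save(snap, path):
    # Values are lists (not tuples) so a JSON round trip compares equal.
    Path(path).write_text(json.dumps(snap))


def load(path):
    return json.loads(Path(path).read_text())
```

For example, call save(snapshot("~/BaseXData/mydb"), "before.json") before a reboot, then afterwards compare with changed_files(load("before.json"), snapshot("~/BaseXData/mydb")) and confirm that attribute_index_files(...) still lists the index files.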

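You might want to try setting the UPDINDEX option to true. With it, the value indexes (including the attribute index) are kept up to date when the database is updated, so updates should no longer leave the index invalidated. To make sure the index is actually used, run the query with basexclient -V, which prints query info (including the optimized query) so you can see whether the attribute index is applied.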
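A minimal Python sketch of that check follows. It assumes basexclient is on your PATH and accepts -U&lt;user&gt;, -P&lt;password&gt;, -V and -q&lt;expr&gt; as described in the BaseX command-line documentation (verify with basexclient -h on 7.7.1). The database name mydb and the admin/admin credentials are placeholders. It runs the query from the question against db:open(...) and returns the client output, whose query info should show whether the attribute index was applied.

```python
import shutil
import subprocess


def client_command(db_name, ids, user="admin", password="admin"):
    """Build a basexclient call that prints query info (-V) for an @id lookup.

    Assumed flags: -U<user>, -P<password>, -V (query info), -q<expr> (query).
    """
    # Escape single quotes XQuery-style by doubling them inside '...' literals.
    predicate = " or ".join("@id='%s'" % i.replace("'", "''") for i in ids)
    query = "db:open('%s')//value[%s]" % (db_name.replace("'", "''"), predicate)
    return ["basexclient", "-U" + user, "-P" + password, "-V", "-q" + query]


def run(cmd):
    """Run the command if basexclient is installed; return its combined output."""
    if shutil.which(cmd[0]) is None:
        print("basexclient not found on PATH; would run:", cmd)
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

For example, print(run(client_command("mydb", ["1", "3"]))). The command is passed as an argument list rather than a shell string, so the XQuery expression needs no shell quoting.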

Disclaimer: I'm somewhat affiliated with the BaseX-Team.