3
votes

I'm new to ArangoDB, having trouble optimizing my queries and am hoping for some help.

The query I've provided below is a real example that I'm struggling with, 758.078 ms on my dev database but on staging, with a much larger dataset, it takes 531.511 s.

I'll also provide the size of each of the edge tables I'm traversing in dev and staging. Any help is really appreciated.

for doc in document
filter repo._key == "my-key"
    for v, e, p in 3 any doc edge1, edge2, edge3
    options {uniqueVertices: 'global', bfs: true}
    filter DATE_ISO8601(p.vertices[2].date) > DATE_ISO8601("2017-09-04T00:00:01Z")
        and DATE_ISO8601(p.vertices[2].date) < DATE_ISO8601("2017-09-15T23:59:59Z")
    limit 1
    return {
        commit: p.vertices[2].hash,
        date: p.vertices[2].date,
        message: p.vertices[2].message,
        author: p.vertices[1].email,
        loc: p.vertices[3].stats.additions
    }

DEV

  • edge1: 2,638
  • edge2: 2,560
  • edge3: 386

STAGING

  • edge1: 5,438,811
  • edge2: 5,544,028
  • edge3: 423,545
2

2 Answers

2
votes

The query is potentially slow because the filter condition

filter 
  DATE_ISO8601(p.vertices[2].date) > DATE_ISO8601("2017-09-04T00:00:01Z"
and 
  DATE_ISO8601(p.vertices[2].date) < DATE_ISO8601("2017-09-15T23:59:59Z")

is not applied during the traversal but only afterwards. Potentially this is due to the function calls (to DATE_ISO8601) in the filter conditions. If your date values are stored as numbers, can you check whether the following filter condition speeds up the query:

filter 
  p.vertices[2].date > DATE_TIMESTAMP("2017-09-04T00:00:01Z"
and 
  p.vertices[2].date < DATE_TIMESTAMP("2017-09-15T23:59:59Z")

That modified filter condition should allow pulling the filter condition inside the traversal, so it is executed earlier.

You can verify the query execution plans using db._explain(<query string goes here>); in the ArangoShell or from the web interface's AQL editor.

0
votes

Probably a bit late but it will help someone. Using the DATE function will make the query much slower, so if possible remove the DATE function. For example enter image description hereenter image description here

You can see that the query filter command uses a Date function that is ~ 7s. If you do not use the date function, it will run faster than ~ 0.5s. These two lines will query the data of 2018-09-29.