1 vote

We use ArangoDB and PostgreSQL to store almost identical data. PostgreSQL handles the general kinds of queries that relational databases perform well. ArangoDB was selected for graph-style queries such as graph traversals, shortest-path searches, etc.

At the moment we have a table with 160,000 records in PostgreSQL and a collection with the same number of documents in ArangoDB.

The API we are working on will be used by multiple users at the same time, so the first thing I wanted to check was how both ArangoDB and PostgreSQL perform under load. I created a simple load test whose workload is a simple select query with a filter, issued against both ArangoDB and PostgreSQL.

The query selects the top N records/documents, filtered by a date field.

When I run the load test, all queries to PostgreSQL execute within 0.5 seconds. Increasing the number of users from 10 to 100 does not affect the execution time at all.

The same queries to ArangoDB take about 2 seconds with a single user, and the response time grows in direct proportion to the number of concurrent users. With 30 concurrent users, all queries time out after waiting 60 seconds for a reply.

I tried to debug the arangojs connector and found this:

var maxTasks = typeof agent.maxSockets === 'number' ? agent.maxSockets * 2 : Infinity;

and this:

Connection.agentDefaults = {
  maxSockets: 3,
  keepAlive: true,
  keepAliveMsecs: 1000
};

which means that by default arangojs sends no more than 6 concurrent queries to ArangoDB, with all remaining queries queued on the Node.js side. I tried increasing the number, but it did not help, and now it looks like all the queries are queued on the ArangoDB side instead. If I run the load and then try to execute a query via the ArangoDB Web Interface, the query hangs for an unpredictable amount of time (depending on the number of users at that moment), then returns the result and reports that it executed in about 4 seconds, which is not true. To me it looks like ArangoDB can only execute one query at a time while all the other queries are queued...
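For reference, here is a minimal, untested sketch of how the socket limit can be raised when the connection is created; agentOptions is the arangojs option that feeds the underlying Node.js http.Agent, while the URL and database name below are placeholders:

var arangojs = require('arangojs');

// Raise the connector's socket limit (default maxSockets is 3,
// i.e. at most 6 concurrent tasks with the *2 factor shown above).
var db = new arangojs.Database({
  url: 'http://localhost:8529',   // placeholder URL
  agentOptions: {
    maxSockets: 25,
    keepAlive: true,
    keepAliveMsecs: 1000
  }
});

db.useDatabase('mydb');           // placeholder database name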

Am I missing something? Are there any settings to tune ArangoDB and improve its performance under load?

Update:

We use ArangoDB 3.0 and run it as a Docker container (from the official image) with 1.5 GB of RAM.

Sample document (we have about 160,000 of these):

{
  "type": "start",
  "from_date": "2016-07-28T10:22:16.000Z",
  "to_date": "9999-06-19T18:40:00.000Z",
  "comment": null,
  "id": "13_start",
  "version_id": 1
}

AQL Query:

FOR result IN @@collection 
   FILTER (result.version_id == 1) 
   FILTER (result.to_date > '2016-08-02T15:57:45.278Z') 
     SORT result._key 
     LIMIT 100 
     RETURN result
Could you please add your system specs, the ArangoDB version you are using and maybe an example document and the actual query? - CodeManX
Added to the post above. - Peter Liapin
Using db._explain(yourQuery), do your queries use indices? Though you LIMIT the result set, SORT requires all items to be inspected; only the returned result is limited. Depending on the result size, the amount of available RAM may be a bit low. Depending on the available CPU resources, more server.threads may help to improve the speed. - dothebart
To understand a little better what is going on inside ArangoDB, we would need a little more information. The output of db._explain() for these queries would be a good start. We need to find out which indices are utilized and whether we could improve that situation. Most probably a combined skiplist index on _key and to_date can improve the situation. We should also understand in what ways the LIMIT cloaks side effects of the steps above - if, for example, the sort has to be done manually, lots of documents have to be handled in advance. - dothebart
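For reference, the explain output the commenters are asking for can be produced in arangosh roughly like this (a minimal sketch; the collection name is a placeholder):

// in arangosh: print the execution plan and the indexes it uses
db._explain({
  query: "FOR result IN @@collection " +
         "FILTER result.version_id == 1 " +
         "FILTER result.to_date > '2016-08-02T15:57:45.278Z' " +
         "SORT result._key LIMIT 100 RETURN result",
  bindVars: { "@collection": "mycollection" }  // placeholder name
});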

1 Answer

1 vote

I created 160k sample documents with the following query:

// pool of version_id values, weighted towards 1
LET v = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]
LET d = DATE_NOW()

FOR i IN 1..160000
  INSERT {
    "type": "start",
    "from_date": DATE_SUBTRACT(d, RAND() * 4000, "days"),
    "to_date": DATE_ADD(d, RAND() * 4000 + 100, "days"),
    "comment": null,
    "id": CONCAT(i, "_start"),
    "version_id": v[FLOOR(RAND() * LENGTH(v))]  // random pick from the weighted pool
  } INTO @@collection
  RETURN NEW

When synced to disk, the datafile is roughly 30MB. The journal files are 32MB.

If I run your query on that dataset, the reported execution time is 0.35 seconds on average.

I tried different indexes, and a skiplist on just version_id seems to improve performance the most, bringing the query down to 0.20 seconds at the cost of ~18 MB of memory for indexes. Right after a server restart, the query takes 1.5 s, because the collection has to be loaded on first access and the indexes need to be rebuilt. Subsequent queries consistently take 0.2 s, however.
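In arangosh, such an index can be created like this (a short sketch; "mycollection" stands in for the actual collection name):

// create a skiplist index on version_id; building it costs some
// memory (~18 MB here) but speeds up the FILTER on version_id
db._collection("mycollection").ensureIndex({
  type: "skiplist",
  fields: ["version_id"]
});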

I used ArangoDB 3.0.devel, which should show more or less the same performance as the stable 3.0.x releases. According to the web interface, the RAM used by the DBMS was ~440 MB after running the query a couple of times.

If you keep seeing query times above 1.0 s, something is not right. Can you check whether the collection is automatically unloaded between queries (possibly caused by insufficient RAM)? If so, check what is eating your memory (if it is even ArangoDB), and try with more RAM to see whether it influences the query time. Could another resource be limiting performance, such as mass storage or the CPU?
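To check the points above, a quick arangosh sketch (collection name again a placeholder):

// in arangosh: see whether the collection stays loaded between
// queries and how much memory it occupies
var col = db._collection("mycollection");
print(col.status());   // 3 = loaded, 2 = unloaded
print(col.figures());  // datafile and memory statistics for the collection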