ArangoDB crashes with two-hop graph query

Imagine three vertex collections VA, VB and VC. VA is connected to VB and VC. Given an instance of VB, we want all VC instances that are connected to it via edges and VA instances.

All of them are defined in a graph so that they can be accessed with the Arango graph API. I used the following AQL statement to query the graph. First I get all VAs that are connected to the VB instance, and then all VCs that are connected to those VAs.

FOR va IN GRAPH_NEIGHBORS("Graph", "VB/Instance", {direction: "inbound", edgeCollectionRestriction: "eAB"})
FOR vc in GRAPH_NEIGHBORS("Graph", va._id, {direction: "outbound", edgeCollectionRestriction: "eBA"})
    RETURN vc

The result is that after a few minutes of computation, ArangoDB crashes without any useful information in the log files.

It seems very inefficient to model edges as documents rather than as links between documents, because in such a query the whole edge collection has to be run through multiple times to find the correct links. I assume that two hops is just too much for the database to handle. Or is there a way to improve the query so that it does not crash the database?

How many results would you estimate to come out of the above query in total, and for the first level part only? And which ArangoDB version are you using? - stj
@stj The first level (VA) has around 100,000 results and the second (VC) not more than 10. We use the current version, 2.5.5. - secana
A problem seems to be the massive amount of path data returned by default by GRAPH_NEIGHBORS. Can you try if the following query is any better: FOR va IN NEIGHBORS(VB, eAB, 'VB/Instance', 'inbound') FOR vc IN NEIGHBORS(VB, eBA, va.vertex._id, 'outbound') RETURN vc.vertex? Apart from that, there have been some changes in 2.6 to make GRAPH_NEIGHBORS and NEIGHBORS run a lot faster than in previous versions, which may also fix this issue. - stj
I tried your AQL query 4 times. Twice it worked, with a run time of 20 minutes; twice the database crashed with no useful error log. I'll try it again with version 2.6 as soon as it's available and post the result here. Thanks for your help, stj! - secana
@stj Hi, it's me again. I tried the query as suggested by you with the 2.6 alpha and got a result in 5 seconds. That's much better than a crash after 20 minutes :) - secana

1 Answer

The 2.6 release a few days ago (June 2015) included some changes that make GRAPH_NEIGHBORS and NEIGHBORS faster.

As seen in the comments, this seems to fix the problem.
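
For reference, this is the query suggested in the comments, laid out over several lines. It is copied from the comment rather than re-verified here. The collection names (VB, eAB, eBA) and the start vertex 'VB/Instance' are the ones from the question, and the `.vertex` accessor assumes that NEIGHBORS returns objects with a `vertex` attribute, which may differ between ArangoDB versions.

```aql
// First hop: neighbors of the VB instance reached over inbound eAB edges
FOR va IN NEIGHBORS(VB, eAB, 'VB/Instance', 'inbound')
  // Second hop: neighbors of each first-hop vertex over outbound eBA edges
  FOR vc IN NEIGHBORS(VB, eBA, va.vertex._id, 'outbound')
    // Return only the vertex, not the path data that GRAPH_NEIGHBORS adds by default
    RETURN vc.vertex
```

According to the comments, the point of this form is to avoid the large amount of path data that GRAPH_NEIGHBORS returns by default; the asker reported a result in about 5 seconds with it on the 2.6 alpha.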