I have a large bipartite graph (300M+ nodes) stored in ArangoDB using two collections and an edgelist. I'm trying to do an efficient traversal using AQL that starts from a node of one type with a particular label to find all other connected nodes of the same type with the same label. The resulting traversal could find anywhere between 2 and 150K nodes, though on average it will be around 10-20 nodes. It is important that a) I specify a large default max traversal depth (ie. 0..50) to ensure I find everything, but that b) AQL prunes paths so that most of the time it never reaches this max depth.
I have a query that gets the right results, but it does not appear to prune the paths, as it gets slower as I increase the max depth, even though the results do not change.
Here is the problem in miniature (picture here):
var cir = db._create("circles");
var dia = db._create("diamonds");
var owns = db._createEdgeCollection("owns");
var A = cir.save({_key: "A", color:'blue'});
var B = cir.save({_key: "B", color:'blue'});
var C = cir.save({_key: "C", color:'blue'});
var D = cir.save({_key: "D", color:'yellow'});
var E = cir.save({_key: "E", color:'yellow'});
var F = cir.save({_key: "F", color:'yellow'});
var G = cir.save({_key: "G", color:'red'});
var H = cir.save({_key: "H", color:'red'});
var d1 = dia.save({_key: "1"})_id;
var d2 = dia.save({_key: "2"})_id;
var d3 = dia.save({_key: "3"})_id;
var d4 = dia.save({_key: "4"})_id;
var d5 = dia.save({_key: "5"})_id;
var d6 = dia.save({_key: "6"})_id;
owns.save(A, d2, {});
owns.save(A, d5, {});
owns.save(A, d4, {});
owns.save(B, d4, {});
owns.save(C, d5, {});
owns.save(C, d6, {});
owns.save(D, d1, {});
owns.save(D, d2, {});
owns.save(E, d1, {});
owns.save(E, d3, {});
owns.save(F, d3, {});
owns.save(F, d4, {});
owns.save(G, d6, {});
owns.save(H, d6, {});
owns.save(H, d2, {});
Starting at the Node circle/A
I want to find all connected vertices only stopping when I encounter a circle which is not blue
.
The following AQL does what I want:
FOR v, e, p IN 0..5 ANY "circles/A" owns
FILTER p.vertices[* filter has(CURRENT, 'color')].color ALL == 'blue'
return v._id
But the FILTER clause does not cause any pruning to occur. At least, as I said above, in the large database I have, increasing the max depth makes it very slow, without changing the results.
So how do I ensure that the filtering of the paths causes the algorithm to prune the paths? The docs are a little thin on this. I can only find examples where exact path lengths are used (p.vertices[1]
for example).