ArangoDB AQL: effectively traversing from a start vertex to an end vertex and finding the connections between them

Question

I'm very new to graph concepts and ArangoDB. I plan to use both in a project related to communication analysis. I have set up the data in ArangoDB with one document collection named object and one edge collection named object_routing.

In my object collection, the data structure is as follows:

{
  "img": "assets/img/default_message.png",
  "label": "some label",
  "obj_id": "45a92a7344ee4f758841b5466c010ed9",
  "type": "message"
}
...
{
  "img": "assets/img/default_person.png",
  "label": "some label",
  "obj_id": "45a92a7344ee4f758841b5466c01111",
  "type": "user"
}

In my object_routing collection, the data structure is as follows:

{
  "message_id": "no_data",
  "source": "45a92a7344ee4f758841b5466c010ed9",
  "target": "45a92a7344ee4f758841b5466c01111",
  "type": "has_contacted"
}

with _from: object/45a92a7344ee4f758841b5466c010ed9 and _to: object/45a92a7344ee4f758841b5466c01111

The object collection holds 23k documents and object_routing holds 127k.

My question is: how can I effectively traverse from a start vertex to an end vertex, so that I get all the connected vertices, their edges, their children, and so on between them, until there is nothing left to traverse?

I'm afraid my question is not clear enough and my understanding of graph concepts may be off, so please bear with me.

Note: a BFS algorithm is not an option, because that is not what I need. If possible, I would like to get the longest path. My current ArangoDB version is 3.1.7, running on a cluster with 1 coordinator and 3 DB servers.

What outcome do you want from the traversal query? Do you want to find all possible edges between two vertices, or all possible vertices? Have a look at the traversal page on the ArangoDB web site and see if that helps. The command will give you an array of vertex documents and an array of edges. You can still filter that result with something like FILTER LAST(vertex).obj_id == 'something', where the LAST value of the array of vertices will be the originating vertex for an INBOUND traversal path. What do you need? - David Thomas
Hi @DavidThomas, sorry for the late reply. I think it is "find all possible edges between two vertices"; with that I can also get all the connected vertices between the two given vertices, no? The output I want is all connected vertices and edges between two given nodes, so that I can draw a graph from the result and tell the end user whether the two given vertices are connected in any way. - Guntur Santoso
But I am unsure how to accomplish this. Does it require implementing some algorithm, or is a graph traversal in ArangoDB enough? Once again, any pointer would be very helpful. - Guntur Santoso

David Thomas · Accepted Answer · 2017-04-10T05:15:06

It is worth trying a few queries to get a feel for how AQL traversals work, but maybe start with this example from the AQL Traversal documentation page:

FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  LET last_vertex_in_path = LAST(p.vertices)
  FILTER last_vertex_in_path.obj_id == '45a92a7344ee4f758841b5466c01111'
  RETURN p

This sample query follows outbound edges in your graph called insert_my_graph_name, between 1 and 10 hops deep, starting from the vertex with an _id of object/45a92a7344ee4f758841b5466c010ed9. It keeps only the paths that end at the vertex whose obj_id is 45a92a7344ee4f758841b5466c01111.

The query exposes three variables for every path found:

v contains the vertex reached at the end of the current path
e contains the edge that led to that vertex
p contains the whole path that was found, from the starting vertex to v

A path consists of vertices connected to each other by edges.

If you want to explore the variables, try this version of the query:

FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  RETURN {
    vertex: v,
    edge: e,
    path: p
  }

What is nice is that AQL returns this information in JSON format, in arrays and such.

When a path is returned, it is stored as a document with two attributes, edges and vertices, where the edges attribute is an array of edge documents the path went down, and the vertices attribute is an array of vertex documents.

The interesting thing about the vertices array is that the order of array elements is important. The first document in the vertices array is the starting vertex, and the last document is the ending vertex.

So in the example query above, because it is set up as an OUTBOUND query, the starting vertex will always be the FIRST element of the array stored at p.vertices, and the end of the path will always be the LAST element of that array.

It doesn't matter how many vertices are traversed in your path, that rule still works.

If your query were an INBOUND traversal, the logic stays the same: FIRST(p.vertices) is still the starting vertex for the path, which has the same _id as the one you specified in your query, and LAST(p.vertices) is the vertex where the path terminates.
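As a rough illustration of the INBOUND case, the sketch below starts at the second vertex from the question and walks edges backwards. The result attribute names start, end, and hops are made up for this example.

```aql
// Walk incoming edges, 1 to 10 hops, from the given start vertex.
FOR v, e, p IN 1..10 INBOUND 'object/45a92a7344ee4f758841b5466c01111' GRAPH 'insert_my_graph_name'
  RETURN {
    start: FIRST(p.vertices)._id,  // always the vertex named in the query
    end: LAST(p.vertices)._id,     // the vertex where this path stops
    hops: LENGTH(p.edges)          // number of edges on the path
  }
```

Every row should report the same start value, regardless of direction.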

So back to your use case: if you want to keep only the OUTBOUND paths that lead from your starting vertex to a specific vertex, you can add the LET last_vertex_in_path = LAST(p.vertices) statement to set a reference to the last vertex in the path.

Then you can easily provide a FILTER that references this variable and filters on any attribute of that terminating vertex. You could filter on last_vertex_in_path._id, last_vertex_in_path.obj_id, or any other attribute of that final vertex document.
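A minimal sketch of the same idea filtering on _id instead of obj_id, assuming the two vertex ids are passed in as bind parameters (the names @start and @target are placeholders chosen for this example):

```aql
// @start and @target are full document ids such as 'object/45a9...'.
FOR v, e, p IN 1..10 OUTBOUND @start GRAPH 'insert_my_graph_name'
  FILTER LAST(p.vertices)._id == @target
  // Return only the ids, which is usually enough to draw the graph.
  RETURN {
    vertices: p.vertices[*]._id,
    edges: p.edges[*]._id
  }
```

Each result row is one path between the two vertices; an empty result means no path was found within 10 hops.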

Play with it and practice some. Once you see that a graph traversal query only provides you with these three key variables, v, e, and p, and that they aren't anything special (v and e are plain documents, and p is a document holding an array of vertices and an array of edges), you can do some pretty powerful filtering.

You could put filters on properties of any of the vertices, edges, or path positions to do some pretty flexible filtering and aggregation of the results it sends through.
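For instance, a sketch that combines an edge filter with a sort on path length to pick the longest matching path, which is what the question asks for. It assumes every edge on the path should have type has_contacted, and "longest" here only means longest within the 1..10 depth limit of the query.

```aql
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  // Keep the path only if every edge on it has the wanted type.
  FILTER p.edges[*].type ALL == 'has_contacted'
  // Keep the path only if it ends at the target vertex.
  FILTER LAST(p.vertices).obj_id == '45a92a7344ee4f758841b5466c01111'
  // Longest path first, then take just that one.
  SORT LENGTH(p.edges) DESC
  LIMIT 1
  RETURN p
```

On a graph with 127k edges this can enumerate a very large number of paths before sorting, so it is worth starting with a small maximum depth and raising it gradually.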

Also have a look at the traversal options, they can be useful.
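As one hedged example of those options, the sketch below asks the traversal not to revisit a vertex or an edge within a single path, which keeps cycles in the data from producing endlessly repeating paths. Check the traversal documentation for your version to see the defaults and the other available options.

```aql
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' GRAPH 'insert_my_graph_name'
  // 'path' means: unique within one path, not across the whole result.
  OPTIONS { uniqueVertices: 'path', uniqueEdges: 'path' }
  FILTER LAST(p.vertices).obj_id == '45a92a7344ee4f758841b5466c01111'
  RETURN p
```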

To get started, just make sure you have your documents and edges loaded, and that you've created a graph with those document and edge collections in it.
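If you have not created a named graph yet, AQL traversals can also run directly over one or more edge collections. A sketch using the object_routing edge collection from the question in place of GRAPH '...':

```aql
// Same traversal, but naming the edge collection instead of a graph.
FOR v, e, p IN 1..10 OUTBOUND 'object/45a92a7344ee4f758841b5466c010ed9' object_routing
  FILTER LAST(p.vertices).obj_id == '45a92a7344ee4f758841b5466c01111'
  RETURN p
```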

And yes.. you can have many document and edge collections in a single graph, even sharing document/edge collections over multiple graphs if that suits your use cases.

Have fun!
