1
votes

Data model is:

books - documents

pages - documents. A page may contain only one reference to another book

books_pages - edges. From book to page and from page to book

Example:

book1 -> (edge) -> page1 -> (edge) -> book2
book1 -> (edge) -> page2 -> (edge) -> book2
book1 -> (edge) -> page3 -> (edge) -> book2
book1 -> (edge) -> page4 -> (edge) -> book3
book2 -> (edge) -> page5 -> (edge) -> book4
book2 -> (edge) -> page6 -> (edge) -> book4
book2 -> (edge) -> page7 -> (edge) -> book4
book2 -> (edge) -> page6 -> (edge) -> book4
...

The goal is to build edges between books while avoiding duplicates. book1 contains several pages that mention book2, but I need only one edge. It doesn't matter how many times book2 is referenced in book1.

AQL:

FOR b1 IN books
    FOR v IN 1..1 OUTBOUND b1 books_pages
        FOR b2 IN 1..1 OUTBOUND v books_pages
            COLLECT from = b1._id, to = b2._id
            RETURN {'from': from, 'to': to}

When the number of documents in the database is large, ArangoDB crashes. Is something wrong with this query, or is this a bug on the ArangoDB side?

2 Answers

0
votes

I cannot comment on the crash, not least because you give no information about it or how it manifests itself. If the cause is an out-of-memory kill or restart, you should mention that (check the system logs if the ArangoDB log is not helpful).

But concerning your problem: aren't you interested in all unique paths of length 3 in terms of vertices (2 in terms of edges)? Doesn't that condense to

FOR b IN books
   FOR v,e,p IN 2..2 OUTBOUND b GRAPH 'books'
      RETURN DISTINCT {"from": p.vertices[0]._id, "to": p.vertices[2]._id}

It works for a very small sample set. Maybe this is a bit lighter on the query planner and executor?

0
votes

Adding the following traversal options to the AQL query solved the issue.

OPTIONS {uniqueEdges: 'path', uniqueVertices: 'global', bfs: true}
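
The answer does not say where in the query the options were placed. As a sketch only, and assuming they were attached to a single two-step traversal over the books_pages edge collection from the question, the full query might look like the one below. The variable names b and v are illustrative, and this has not been checked against the asker's data.

```aql
// Assumption: one 2..2 traversal replaces the two nested 1..1 traversals.
FOR b IN books
    // Depth 1 reaches pages, depth 2 reaches the referenced books.
    FOR v IN 2..2 OUTBOUND b books_pages
        // uniqueVertices: 'global' requires breadth-first traversal (bfs: true).
        OPTIONS {uniqueEdges: 'path', uniqueVertices: 'global', bfs: true}
        RETURN {'from': b._id, 'to': v._id}
```

With uniqueVertices set to 'global', each vertex is visited at most once per traversal, so each start book should yield a given target book only once, without an extra COLLECT or DISTINCT step. That would also mean the server has fewer paths to enumerate, which may be why the crash went away.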