ArangoDB multiple edge collection performance

Question

I'm building an ArangoDB edge collection that contains many "types". By type, think of an animal species taxonomy.

I will be building a graph that connects all of these. Example: the parent/child relationships of ancient Homo species: Homo habilis -> Homo floresiensis -> Homo erectus -> Homo sapiens.

Putting the different types in different collections would only be for superficial organizational reasons, although there's a small chance it could be useful later for features I haven't thought of yet.

My specific question is: does building graphs in ArangoDB that use multiple collections take a performance hit? Would using one large collection be more efficient for graphs?

Answering the first comment: if I break this out into different edge collections, it would be 4 collections with about 300,000 rows each. A type can have multiple parents and children. The queries would be shortest path and general connectedness between any two types, a "six degrees of Kevin Bacon" kind of thing.
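To make the query shape concrete, here is a minimal in-memory sketch of the "shortest path / any connectedness" lookup described above: a breadth-first search that ignores edge direction. It is only a toy model of the question, not how ArangoDB implements shortest path internally. The vertex collection names (`species`, `genus`) and the edge documents are hypothetical; they just mimic ArangoDB's `_from`/`_to` fields and its `collection/key` style of document `_id`, with one edge collection connecting vertices from more than one vertex collection.

```python
from collections import deque

# Hypothetical edges in ArangoDB-style "_from"/"_to" form. The vertex
# collections "species" and "genus" are made up for illustration.
EDGES = [
    {"_from": "species/homo_habilis", "_to": "species/homo_floresiensis"},
    {"_from": "species/homo_floresiensis", "_to": "species/homo_erectus"},
    {"_from": "species/homo_erectus", "_to": "species/homo_sapiens"},
    {"_from": "genus/homo", "_to": "species/homo_habilis"},
]


def build_index(edges):
    """Map each vertex _id to its neighbours, ignoring edge direction."""
    index = {}
    for edge in edges:
        index.setdefault(edge["_from"], []).append(edge["_to"])
        index.setdefault(edge["_to"], []).append(edge["_from"])
    return index


def shortest_path(index, start, target):
    """Breadth-first search: return one shortest vertex path, or None."""
    if start == target:
        return [start]
    previous = {start: None}  # predecessor of each visited vertex
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in index.get(vertex, []):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = vertex
            if neighbour == target:
                # Walk the predecessor chain back to the start vertex.
                path = [neighbour]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None  # target not reachable


index = build_index(EDGES)
print(shortest_path(index, "genus/homo", "species/homo_sapiens"))
```

Each BFS level is one hop, so a 5-7 vertex deep query means repeating the neighbour lookup for every vertex on up to that many levels; that per-vertex lookup is the operation whose cost the collection layout affects.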

EDIT: Please see the comments for some questions and answers. Almost every single query will span multiple types, and many queries will be 5-7 vertices deep. This project will be almost exclusively READING; I'm not worried about write speed at all.

EDIT 2: Will I be using a single instance or a distributed cluster? Honestly, either! Whatever will speed up reads. You tell me.

The answer will probably depend on the types of queries you will be running. Could you be more specific about that, and also tell us how many different types of edge collections you envision? You only gave one example (parent/child). It might also be helpful to know how many node collections you expect, and roughly how many nodes? — peak
Will single queries typically span multiple edge collections? Could you give an example of a second edge collection, as well as an example of a query that DOES span multiple edge collections? — peak
Almost every single query would span multiple data collections but only a single edge collection. — Chemdream

Simon Grätzer · Accepted Answer · 2018-04-09T12:58:17

In a single-server setup, using multiple collections has no penalty. Especially if your query does not span all edge collections, lookups on the smaller collections will be faster.
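The point about queries that do not span all edge collections can be illustrated with a toy model. ArangoDB keeps an edge index on `_from`/`_to` for each edge collection, and an AQL traversal names the edge collections it may follow (for example `FOR v IN ANY SHORTEST_PATH @start TO @target parent_of, hybridized_with RETURN v`). The sketch below is not ArangoDB's traversal code. It only shows that a neighbour lookup consults the per-collection indexes that are named and never touches the others. The collection names `parent_of` and `hybridized_with` are hypothetical.

```python
# Toy model: each edge collection has its own index keyed by vertex _id.
# The collection names "parent_of" and "hybridized_with" are hypothetical.
EDGE_COLLECTIONS = {
    "parent_of": {
        "species/homo_erectus": ["species/homo_sapiens"],
    },
    "hybridized_with": {
        "species/homo_sapiens": ["species/homo_neanderthalensis"],
    },
}


def neighbours(vertex, collection_names):
    """Collect outbound neighbours from the named edge collections only.

    Returns (neighbour list, number of per-collection index lookups).
    Collections that are not named are never consulted.
    """
    result = []
    lookups = 0
    for name in collection_names:
        index = EDGE_COLLECTIONS[name]
        lookups += 1  # one index lookup per edge collection consulted
        result.extend(index.get(vertex, []))
    return result, lookups


# Restricting the lookup to one collection skips the other edge type entirely.
print(neighbours("species/homo_sapiens", ["parent_of"]))
print(neighbours("species/homo_sapiens", ["parent_of", "hybridized_with"]))
```

In this model, a query restricted to `parent_of` searches only that (smaller) index, which is the intuition behind the answer's remark. Real costs depend on the storage engine and deployment, as the next paragraph notes.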

How much faster or slower this will be depends on the storage engine (RocksDB or MMFiles). Given that you want maximum read performance, MMFiles will likely be faster.
