3
votes

This is for a project that will map metadata. There are many more nodes but this particular one became a debate in the team.

Which model would yield the best query performance? Or it does not matter?

Option 1

Permission metadata is explicit as edges between nodes.

Option 2

Permission metadata is inside the properties of the edge.

Option 3

???

Graph Modeling of Metadata: Which one is better?

2
did one of the answers answer your question? do you mind marking it as solved if? if not, whats missing?dothebart
I marked Max´s as the answer. We did benchmark and as far as I can tell, it all depends on the model and questions you try to answer. We ended up using both types, whichever proved easier to write. We´re showing a small proof of concept tomorrow. :-)Ricardo

2 Answers

6
votes

Let me comment for ArangoDB here, being one of its developers.

There is a third possibility, namely to have a single vertex collections and multiple edge collections for the different access methods. You would then "officially" have 3 graphs that share the same vertex set.

I would expect that this is better in performance, because each access type would only have to deal with a single type of edge and access would be fast.

Obviously it all depends on your queries. My statement holds for queries like "what are all the Entities a Person can update?" or "who can select this Entity?".

I could imagine that your standard query is more "Can this person delete that Entity?" or "Which access rights does this person have for that Entity?".

These two questions are probably not efficient with any of the approaches suggested, because as far as I see, all of them would then require a search, either in the outgoing edges of the Person or in the incoming edges of the Entity.

What would be needed here are a kind of "vertex centric indices", that is an index that can be used for the set of outgoing or incoming edges of a given vertex. If you, for example would use your option 2 (or indeed 1, this does not matter so much), and have a sorted index on all edges that is sorted first by Person and then by Entity. Then it is a lookup with time complexity O(log(#edges)) to find the (probably singleton) set of edges from a given Person to a given Entity.

We at ArangoDB are currently busy to add this feature, which will appear in one of the next two releases.

1
votes

I can only speak for Neo4j here:

I don't know that it would matter much, but definitely benchmark! Both relationships and properties are stored as linked lists, so it will still need to traverse them. But if you have more relationships between Person and Entity nodes then putting them in properties starts to become more attractive.

I recommend checking out the free O'Reilly book Graph Databases to learn more about the internals of Neo4j. But benchmarks will always be the gold standard.