
Are there established best practices for modeling data in a graph database? (I am considering ArangoDB right now, but the question applies to other platforms too.) Here is a practical case to illustrate my question:

Assume we are creating a centralised contact list for users. Each user has contacts, but some contacts may be shared between users: e.g. John knows Mary, and Marc knows Mary. I would thus have 3 nodes (John, Mary and Marc), but John should only see his own relationship to Mary, not Marc's relationship to Mary.

So how should the full graph be designed so that each user can access only their own information?

Option 1: Create one graph per user. That way I know exactly who can see what (I could, for example, prefix all my collections with the user id). That would be simple, but it would duplicate a lot of data (e.g. if I put my whole family in the db and my brother does too, the same data is stored twice, in different graphs).

Option 2: Create one general graph with Contact nodes plus User nodes. The Contact nodes John, Mary and Marc would be connected to each other, but the User node representing John would be linked only to the Contact nodes John and Mary. That way I would know to fetch only the Contact nodes connected to the User node I am focusing on. The problem is that edges cannot be linked to the User node (I cannot have an edge going from a node to an edge... can I?). So I would have to add a user_id attribute to all the edges in order to fetch only the ones relevant to the current user. This is slightly nicer, as I do not have to duplicate nodes, but I would still have to duplicate edges, as they would be user-specific.
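A minimal in-memory sketch of Option 2, to make the trade-off concrete. It assumes each edge carries a `user_id` attribute as described above; the arrays only stand in for collections and nothing here talks to ArangoDB. For brevity this variant derives a user's visible contacts from their edges instead of from separate User-to-Contact links.

```javascript
// Option 2 sketch: one shared set of Contact nodes, plus edges that carry the
// id of the user who owns them. Plain arrays stand in for ArangoDB collections.
const contacts = ["Contact/John", "Contact/Mary", "Contact/Marc"];

const knows = [
  { _from: "Contact/John", _to: "Contact/Mary", user_id: "User/John" },
  { _from: "Contact/Marc", _to: "Contact/Mary", user_id: "User/Marc" },
];

// Edges a given user may see: filter on the user_id attribute.
function visibleEdges(userId) {
  return knows.filter((e) => e.user_id === userId);
}

// Contacts a given user may see: every node touched by one of their edges.
function visibleContacts(userId) {
  const ids = new Set();
  for (const e of visibleEdges(userId)) {
    ids.add(e._from);
    ids.add(e._to);
  }
  return contacts.filter((id) => ids.has(id));
}

console.log(visibleContacts("User/John")); // [ 'Contact/John', 'Contact/Mary' ]
```

The nodes are stored once, but if John and Marc both recorded the same relationship it would need two edges, one per `user_id`, which is the edge duplication the question mentions.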

Option 3: Do it SQL-style with a Rights table that maintains a list of Contact ids along with which user can see which node and which edge (heavy on joins).
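For comparison, Option 3 can be sketched as a separate rights list that every read is filtered against. `rights`, `kind`, `allowed` and `visibleGraph` are hypothetical names, and the in-memory filtering stands in for the SQL-style join.

```javascript
// Option 3 sketch: a separate "rights" list records which user may see which
// node or edge; every read is joined (here: filtered) against it.
const nodes = ["Contact/John", "Contact/Mary", "Contact/Marc"];
const edges = [
  { id: "e1", from: "Contact/John", to: "Contact/Mary" },
  { id: "e2", from: "Contact/Marc", to: "Contact/Mary" },
];
const rights = [
  { user: "John", kind: "node", id: "Contact/John" },
  { user: "John", kind: "node", id: "Contact/Mary" },
  { user: "John", kind: "edge", id: "e1" },
];

// Ids of one kind ("node" or "edge") that the user has a rights row for.
function allowed(user, kind) {
  return new Set(
    rights.filter((r) => r.user === user && r.kind === kind).map((r) => r.id)
  );
}

// The "join": keep only nodes and edges with a matching rights row.
function visibleGraph(user) {
  const okNodes = allowed(user, "node");
  const okEdges = allowed(user, "edge");
  return {
    nodes: nodes.filter((n) => okNodes.has(n)),
    edges: edges.filter((e) => okEdges.has(e.id)),
  };
}

console.log(visibleGraph("John").edges.map((e) => e.id)); // [ 'e1' ]
```

Every query touches the rights list in addition to the graph, which is where the join cost comes from.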

Option 4: ???

As with everything, there are many ways to reach a solution, but I was wondering what is considered best practice for balancing a clean design against insertion/deletion performance, knowing that performance may be platform-dependent.


Answer


I would suggest an Option 4:

First, I would not distinguish between User and Contact nodes; all of them should be Contact nodes. When you create a new user, you basically create a new Contact for them (or reuse an existing one) and connect your application's authentication to that specific Contact.
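A minimal sketch of "a user account is just a Contact", assuming the application keeps a mapping from login to Contact key. `contacts`, `accounts` and `signUp` are hypothetical names; no real authentication library or ArangoDB call is involved.

```javascript
// Sketch: a user account is only a pointer from a login to a Contact key.
const contacts = new Map([["Mary", { mail: "mary@example.com" }]]);
const accounts = new Map(); // login -> Contact key

function signUp(login, contactKey, data) {
  if (!contacts.has(contactKey)) {
    contacts.set(contactKey, data); // no Contact yet: create one
  }
  // otherwise reuse the existing Contact (Mary may already be in someone's list)
  accounts.set(login, contactKey);
  return contacts.get(contactKey);
}

signUp("mary", "Mary", { mail: "mary@example.com" }); // reuses the existing Contact
signUp("john", "John", { mail: "john@example.com" }); // creates a new Contact
console.log(contacts.size); // 2
```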

Then you can use directed edges to build the contact list for a user. Say you have two users, John and Mary: John can add Mary to his contact list, but that does not put John on Mary's list. If she wants to add John, you add a second edge. If you want symmetrical contacts only (if John adds Mary to his list, he automatically appears in hers), you simply ignore the edge direction in your queries.
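The difference between one-way and symmetrical contact lists can be sketched in memory. The `{from, to}` edge shape and the two helper functions are illustrative only; in ArangoDB the same choice is made by the traversal direction.

```javascript
// Sketch of directed "knows" edges (no database involved).
// An edge {from, to} means "to" is in the contact list of "from".
const edges = [{ from: "John", to: "Mary" }]; // John added Mary; Mary did not add John

// One-way contact list: follow only outgoing edges.
function contactsOutbound(person) {
  return edges.filter((e) => e.from === person).map((e) => e.to);
}

// Symmetrical contact list: ignore direction, so one edge serves both people.
function contactsAnyDirection(person) {
  const result = new Set();
  for (const e of edges) {
    if (e.from === person) result.add(e.to);
    if (e.to === person) result.add(e.from);
  }
  return [...result];
}

console.log(contactsOutbound("Mary"));     // []
console.log(contactsAnyDirection("Mary")); // [ 'John' ]

// If Mary also adds John, that is a second, separate edge.
edges.push({ from: "Mary", to: "John" });
console.log(contactsOutbound("Mary"));     // [ 'John' ]
```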

To get John's contacts, you then select the neighbors of John (the Contact nodes his outgoing edges point to).

In ArangoDB this can be realized with two collections, say Contact (a document collection for the nodes) and Knows (an edge collection holding the edges).

The following code, pasted into arangosh, creates the situation described in your question:

```javascript
db._create("Contact");              // document collection for people
db._createEdgeCollection("Knows");  // edge collection for "X knows Y"
db.Contact.save({_key: "John", mail: "john@example.com"});
db.Contact.save({_key: "Mary", mail: "mary@example.com"});
db.Contact.save({_key: "Marc", mail: "marc@example.com"});

// directed edges: Mary is in John's list and in Marc's list
db.Knows.save("Contact/John", "Contact/Mary", {});
db.Knows.save("Contact/Marc", "Contact/Mary", {});
```

To query the contact list for user John:

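```javascript
db._query('RETURN NEIGHBORS(Contact, Knows, "John", "outbound")').toArray() // ArangoDB 2.x
db._query('FOR v IN 1..1 OUTBOUND "Contact/John" Knows RETURN v').toArray() // ArangoDB 3.0+, where NEIGHBORS() was removed
```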

This should return Mary as the only result, with no information about Marc.
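To see why this query is private to John, here is an in-memory mirror of the data above (plain objects, not ArangoDB; `neighbors` is a hypothetical helper that roughly mimics a one-step traversal). Starting outbound from John reaches only Mary. Starting inbound from Mary would reach both John and Marc, so in this design the application should always start the traversal from the authenticated user's own node.

```javascript
// In-memory mirror of the Contact/Knows setup (plain objects, no ArangoDB).
const Knows = [
  { _from: "Contact/John", _to: "Contact/Mary" },
  { _from: "Contact/Marc", _to: "Contact/Mary" },
];

// Rough analogue of a one-step traversal: "outbound" follows _from -> _to,
// "inbound" follows _to -> _from.
function neighbors(startId, direction) {
  if (direction === "outbound") {
    return Knows.filter((e) => e._from === startId).map((e) => e._to);
  }
  if (direction === "inbound") {
    return Knows.filter((e) => e._to === startId).map((e) => e._from);
  }
  throw new Error(`unknown direction: ${direction}`);
}

console.log(neighbors("Contact/John", "outbound")); // [ 'Contact/Mary' ]
console.log(neighbors("Contact/Mary", "inbound"));  // [ 'Contact/John', 'Contact/Marc' ]
```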

If you do not want to merge contacts and user accounts as I suggested, you can also keep them in separate collections. In that case you have to slightly modify the edges and the query:

```javascript
// assumes a separate "User" document collection containing John and Marc
db.Knows.save("User/John", "Contact/Mary", {});
db.Knows.save("User/Marc", "Contact/Mary", {});
```

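```javascript
db._query('RETURN NEIGHBORS(User, Knows, "John", "outbound")').toArray() // ArangoDB 2.x
db._query('FOR v IN 1..1 OUTBOUND "User/John" Knows RETURN v').toArray() // ArangoDB 3.0+
```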

This should give the same result.
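A similar in-memory sketch for the variant with separate collections, assuming ArangoDB's `<collection>/<key>` id format, under which "User/John" and "Contact/John" are different documents. `collections`, `lookup` and `contactsOf` are illustrative names, not ArangoDB APIs.

```javascript
// Sketch: users and contacts in separate (in-memory) collections.
const collections = {
  User: { John: { name: "John" }, Marc: { name: "Marc" } },
  Contact: { Mary: { name: "Mary" } },
};
const Knows = [
  { _from: "User/John", _to: "Contact/Mary" },
  { _from: "User/Marc", _to: "Contact/Mary" },
];

// Resolve an id like "Contact/Mary" to its document.
function lookup(id) {
  const [collection, key] = id.split("/");
  return collections[collection][key];
}

// Contacts of a user: follow that user's outgoing edges only.
function contactsOf(userKey) {
  const start = `User/${userKey}`;
  return Knows.filter((e) => e._from === start).map((e) => lookup(e._to));
}

console.log(contactsOf("John")); // [ { name: 'Mary' } ]
```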

Edit: Regarding your question in Option 2: in ArangoDB it is actually possible to point edges to other edges. However, the built-in graph functionality will then treat the edges pointed to as if they were nodes, which means it does not automatically follow their direction. You can, however, use the resulting edges in further AQL statements and continue the search with AQL features.
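The two-step pattern described in the edit can be sketched in memory. `CanSee` is a hypothetical edge collection whose edges go from a user to a Knows edge document. The first step treats the Knows edges as plain vertices, as a generic traversal would, and the second step continues by reading their `_from`/`_to` fields, standing in for the follow-up AQL.

```javascript
// Sketch of Option 2 with edges that point at other edges (in memory only).
const docs = {
  "Knows/k1": { _id: "Knows/k1", _from: "Contact/John", _to: "Contact/Mary" },
  "Knows/k2": { _id: "Knows/k2", _from: "Contact/Marc", _to: "Contact/Mary" },
};
// Hypothetical CanSee edges: user -> Knows edge document.
const CanSee = [
  { _from: "User/John", _to: "Knows/k1" },
  { _from: "User/Marc", _to: "Knows/k2" },
];

// Step 1: a one-step "traversal" over CanSee; the targets are Knows edges,
// but here they are just returned as documents (treated like vertices).
function step(startId) {
  return CanSee.filter((e) => e._from === startId).map((e) => docs[e._to]);
}

// Step 2: continue by hand using the _from/_to of the returned edge documents.
function visibleRelations(userId) {
  return step(userId).map((k) => `${k._from} -> ${k._to}`);
}

console.log(visibleRelations("User/John")); // [ 'Contact/John -> Contact/Mary' ]
```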