19
votes

I have a general question about modeling in a graph database that I just can't seem to wrap my head around.

How do you model this type of relationship: "Newton invented Calculus"?

In a simple graph, you could model it like this:

Newton (node) -> invented (relationship) -> Calculus (node)

...so you'd have a bunch of "invented" graph relationships as you added more people and inventions.

The problem is, you start needing to add a bunch of properties to the relationship:

  • invention_date
  • influential_concepts
  • influential_people
  • books_inventor_wrote

...and you'll want to start creating relationships between those properties and other nodes, such as:

  • influential_people: relationship to person nodes
  • books_inventor_wrote: relationship to book nodes

So now it seems like the "real-world relationships" ("invented") should actually be a node in the graph, and the graph should look like this:

Newton (node) -> (relationship) -> Invention of Calculus (node) -> (relationship) -> Calculus (node)

And to complicate things more, other people are also participated in the invention of Calculus, so the graph now becomes something like:

Newton (node) -> 
  (relationship) -> 
    Newton's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)
Leibniz (node) -> 
  (relationship) -> 
    Leibniz's Calculus Invention (node) -> 
      (relationship) -> 
        Invention of Calculus (node) -> 
          (relationship) -> 
            Calculus (node)

So I ask the question because it seems like you don't want to set properties on the actual graph database "relationship" objects, because you may want to at some point treat them as nodes in the graph.

Is this correct?

I have been studying the Freebase Metaweb Architecture, and they seem to be treating everything as a node. For example, Freebase has the idea of a Mediator/CVT, where you can create a "Performance" node that links an "Actor" node to a "Film" node, like here: http://www.freebase.com/edit/topic/en/the_last_samurai. Not quite sure if this is the same issue though.

What are some guiding principles you use to figure out if the "real-world relationship" should actually be a graph node rather than a graph relationship?

If there are any good books on this topic I would love to know. Thanks!

1

1 Answers

19
votes

Some of these things, such as invention_date, can be stored as properties on the edges as in most graph databases edges can have properties in the same way that vertexes can have properties. For example you could do something like this (code follows TinkerPop's Blueprints):

Graph graph = new Neo4jGraph("/tmp/my_graph");
Vertex newton = graph.addVertex(null);
newton.setProperty("given_name", "Isaac");
newton.setProperty("surname", "Newton");
newton.setProperty("birth_year", 1643); // use Gregorian dates...
newton.setProperty("type", "PERSON");

Vertex calculus = graph.addVertex(null);
calculus.setProperty("type", "KNOWLEDGE");

Edge newton_calculus = graph.addEdge(null, newton, calculus, "DISCOVERED");
newton_calculus.setProperty("year", 1666);   

Now, lets expand it a little bit and add in Liebniz:

Vertex liebniz = graph.addVertex(null);
liebniz.setProperty("given_name", "Gottfried");
liebniz.setProperty("surnam", "Liebniz");
liebniz.setProperty("birth_year", "1646");
liebniz.setProperty("type", "PERSON");

Edge liebniz_calculus = graph.addEdge(null, liebniz, calculus, "DISCOVERED");
liebniz_calculus.setProperty("year", 1674);

Adding in the books:

Vertex principia = graph.addVertex(null);
principia.setProperty("title", "PhilosophiƦ Naturalis Principia Mathematica");
principia.setProperty("year_first_published", 1687);
Edge newton_principia = graph.addEdge(null, newton, principia, "AUTHOR");
Edge principia_calculus = graph.addEdge(null, principia, calculus, "SUBJECT");

To find out all of the books that Newton wrote on things he discovered we can construct a graph traversal. We start with Newton, follow the out links from him to things he discovered, then traverse links in reverse to get books on that subject and again go reverse on a link to get the author. If the author is Newton then go back to the book and return the result. This query is written in Gremlin, a Groovy based domain specific language for graph traversals:

newton.out("DISCOVERED").in("SUBJECT").as("book").in("AUTHOR").filter{it == newton}.back("book").title.unique()

Thus, I hope I've shown a little how a clever traversal can be used to avoid issues with creating intermediate nodes to represent edges. In a small database it won't matter much, but in a large database you're going to suffer large performance hits doing that.

Yes, it is sad that you can't associate edges with other edges in a graph, but that's a limitation of the data structures of these databases. Sometimes it makes sense to make everything a node, for example, in Mediator/CVT a performance has a bit more concreteness too it. Individuals may wish address only Tom Cruise's performance in "The Last Samurai" in a review. However, for most graph databases I've found that application of some graph traversals can get me what I want out of the database.