3
votes

I can't seem to find any discussion on this. I had been imagining a database that was schemaless, node based and hierarchical, and one day I decided it was too common sense to not exist, so I started searching around, and Neo4j is about 95% of what I imagined.

What I didn't imagine was the concept of relationships. I don't understand why they are necessary. They seem to add a ton of complexity to all topics centered around graph databases, but I don't quite understand what the benefit is. Relationships seem to be almost exactly like nodes, except more limited.

To explain what I'm thinking, I was imagining starting a company, so I create myself as my first nodes:

create (u:User { name:"mindreader"});
create (c:Company { name:"mindreader Corp"});

One day I get a customer, so I put his company into my db.

create (c:Company { name:"Customer Company"});
create (u:User { name:"Customer Employee1"});
create (u:User { name:"Customer Employee2"});

I decide to link users to their companies:

match (u:User) where u.name =~ "Customer.*"
match (c:Company) where c.name =~ "Customer.*"
create (u)-[:Employee]->(c);

match (u:User) where u.name = "mindreader"
match (c:Company) where c.name =~ "mindreader.*"
create (u)-[:Employee]->(c);

Then I hire some people:

match (c:Company) where c.name =~ "mindreader.*"
create (u1:User { name:"Employee1"})-[:Employee]->(c)
create (u2:User { name:"Employee2"})-[:Employee]->(c);

One day HR says they need to know when I hired employees. Okay:

match (c:Company)<-[r:Employee]-(u:User)
where c.name =~ "mindreader.*" and u.name =~ "Employee.*"
set r.hiredate = '2013-01-01';

Then HR comes back and says hey, we need to know which person in the company recruited a new employee so that they can get a cash reward for it.

Well now what I need is for a relationship to point to a user but that isn't allowed (:Hired_By relationship between :Employee relationship and a User). We could have an extra relationship :Hired_By, but if the :Employee relationship is ever deleted, the hired_by will remain unless someone remembers to delete it.

What I could have done in neo4j was just have a

(u:User)-[:hiring_info]->(hire_info:HiringInfo)-[:hired_by]->(recruiter:User)

In which case the relationships only confer minimal information, the name.
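The intermediate-node pattern above can be sketched with a toy in-memory graph. This is only an illustration in Python, not the Neo4j API: the `Graph` class and its `add_node`, `add_rel` and `detach_delete` methods are hypothetical names made up for this example. It shows that once the hiring details live in their own node, removing that node takes the hired_by link with it, so nothing is left dangling.

```python
from itertools import count


class Graph:
    """Toy in-memory property graph (hypothetical, not the Neo4j API)."""

    def __init__(self):
        self._ids = count(1)
        self.nodes = {}  # node id -> (label, properties)
        self.rels = []   # (start id, relationship type, end id)

    def add_node(self, label, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = (label, props)
        return node_id

    def add_rel(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))

    def detach_delete(self, node_id):
        """Remove a node together with every relationship touching it."""
        del self.nodes[node_id]
        self.rels = [r for r in self.rels if node_id not in (r[0], r[2])]


g = Graph()
recruiter = g.add_node("User", name="mindreader")
employee = g.add_node("User", name="Employee1")
hire = g.add_node("HiringInfo", hiredate="2013-01-01")
g.add_rel(employee, "hiring_info", hire)
g.add_rel(hire, "hired_by", recruiter)

# Dropping the hiring record also drops the hired_by link hanging off it.
g.detach_delete(hire)
print(g.rels)  # []
```

Both user nodes survive the delete; only the hiring record and its two links are gone.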

What I originally envisioned was that there would be nodes, and then each property of a node could be a datatype or it could be a pointer to another node. In my case, a user record would end up looking like:

User {
  name: "Employee1"
  hiring_info: {
    hire_date: "2013-01-01"
    hired_by: u:User # -> would point to a user
  }
}

Essentially it is still a graph. Nodes point to each other. The name of the relationship is just a field in the origin node. To query it you would just go

match (u:User) where ... return u.name, u.hiring_info.hire_date, u.hiring_info.hired_by.name

If you needed a one-to-many relationship of the same type, you would just have a collection of pointers to nodes. If you referenced a collection in a return clause, you'd essentially get a join. If you deleted hiring_info, it would delete the pointer along with it. References to other nodes would not have to be a disorganized list at the top level of a node. Furthermore, when I query a user I would get all of the info about that user in one go, without querying for the user itself and then for all of its relationships. I would know his name and the fact that he hired someone in the same query. On the database backend, I'm not sure much would change.
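A minimal Python sketch of the pointer model described above, under the assumption that a node is just a label plus a property map whose values may be plain data, nested records, or references to other nodes. The names `Node`, `resolve` and `recruited_by` are hypothetical. The last helper shows one thing the sketch makes visible: with one-way pointers, the reverse question ("whom did this user recruit?") needs a scan or a separate index.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    """Hypothetical node: properties are values, nested dicts, or other Nodes."""
    label: str
    props: dict[str, Any] = field(default_factory=dict)


def resolve(node: Node, path: str) -> Any:
    """Follow a dotted path such as 'hiring_info.hired_by.name'."""
    current: Any = node
    for key in path.split("."):
        # A Node keeps its fields in .props; a nested record is a plain dict.
        container = current.props if isinstance(current, Node) else current
        current = container[key]
    return current


def recruited_by(all_nodes: list[Node], recruiter: Node) -> list[Node]:
    """Reverse lookup: with one-way pointers this scans every node."""
    return [
        n for n in all_nodes
        if n.props.get("hiring_info", {}).get("hired_by") is recruiter
    ]


founder = Node("User", {"name": "mindreader"})
employee = Node("User", {
    "name": "Employee1",
    "hiring_info": {"hire_date": "2013-01-01", "hired_by": founder},
})

print(resolve(employee, "hiring_info.hired_by.name"))  # mindreader
print([n.props["name"] for n in recruited_by([founder, employee], founder)])  # ['Employee1']
```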

I see quite a few questions from people asking whether they should use nodes or relationships to model this or that, and occasionally people asking for a relationship between relationships. It feels like the XML problem where you are wondering if a piece of information should be its own tag or just a property of its parent tag.

The query engine goes to great pains to handle relationships, so there must be some huge advantage to having them, but I can't quite see it.

4
Excellent post! The issue appears to be that Neo4j treats relationships as first-class citizens (and by extension, nodes appear to be second-class). I really love the concept of graph databases and how Neo4j implements it, but you make very valid points. A pointer system would have been better for some use cases like the one you demonstrate. In some way, you may be able to emulate what you want by having relationships with no properties whatsoever, and storing as much as you can within the nodes themselves. I don't know how far you will get with that, as it goes against the Neo4j philosophy. - ADTC

4 Answers

3
votes

Different databases are for different things. You seem to be looking for a noSQL database.

This is an extremely wide topic area that you've reached into, so I'll give you the short of it. There's a spectrum of database models, each of which has different use cases.

  • NoSQL aka Non-relational Databases:

    Every object is a single document. You can have references to other documents, but any additional traversal means you're making another query. Use these when your data rarely has relationships, and you usually just want to query once and get back a large amount of flexibly stored data as the returned document. (Note: These are not "nodes". "Node" has a very specific definition and implies that there are edges.)

  • SQL aka Relational Databases:

    This is table land, this is where foreign keys and one-to-many relationships come into play. Here you have strict schemas and very fast queries. This is honestly what you should use for your user example. Small amounts of data where the relationships between things are shallow (You don't have to follow a relationship more than 1-2 times to get to the relevant entry) are where these excel.

  • Graph Database:

    Use this when relationships are key to what you're trying to do. The most common example of a graph is something like a social graph where you're connecting different users together and need to follow relationships for many steps. (Figure out if two people are connected within a depth of 4, for instance.)
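As a rough illustration of the "connected within a depth of 4" question, here is a breadth-first search over an adjacency map in Python. It is only a sketch of the kind of traversal involved, not how Neo4j executes it; the function name `connected_within` and the sample `friends` data are made up for the example.

```python
from collections import deque


def connected_within(friends, start, target, max_depth):
    """Breadth-first search: is target reachable from start in <= max_depth hops?"""
    if start == target:
        return True
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in friends.get(person, ()):
            if neighbour == target:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return False


# Hypothetical friendship chain: a - b - c - d - e - f
friends = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
print(connected_within(friends, "a", "e", 4))  # True  (4 hops)
print(connected_within(friends, "a", "f", 4))  # False (5 hops)
```

In a relational schema each extra hop is typically another self-join on a friendship table, which is why deep traversals are the usual argument for a graph store.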

Relationships exist in graph databases because that is the entire concept of a graph database. It doesn't really fit your application, but to be fair you could just keep more in the node part of your database. In general the whole idea of a database is something that lets you query a LOT of data very quickly. Depending on the intrinsic structure of your data there are different ways that that makes sense. Hence the different kinds of databases.

In strongly connected graphs, Neo4j is 1000x faster on 1000x the data than a SQL database. NoSQL would probably never be able to perform in a strongly connected graph scenario.

2
votes

Take a look at what we're building right now: http://vimeo.com/81206025

Update: In reaction to mindreader's comment, we added the related properties to the picture: [image]

1
votes

Relational database systems are tabular and put more information in the tables than in the relationships. Graph databases put more information in relationships. In the end, you can accomplish much the same goals.

However, putting more information in relationships can make queries smaller and faster.

Here's an example:

[image: SQL versus Cypher queries]

Graph databases are also good at storing human-readable knowledge representations, being edge (relationship) centric. RDF takes it one step further, where all information is stored as edges rather than nodes. This is ideal for working with predicate logic, propositional calculus, and triples.

1
votes

Maybe the right answer is an object database.

Objectivity/DB, which now supports a full suite of graph database capabilities, allows you to design complex schema with one-to-one, one-to-many, many-to-one, and many-to-many reference attributes. It has the semantics to view objects as graph nodes and edges. An edge can be just the reference attribute from one node to another or an edge can exist as an edge object that sits between two nodes.

[image]

An edge object can have any number of attributes and can have references off to other objects, as shown in the diagram below.

[image]

Being able to "hang" complex objects off of an edge allows Objectivity/DB to support weighted queries where the edge weight can be calculated using a user-defined weight calculator operator. The weight calculator operator can build the weight from a static attribute on the edge or build the weight by digging down through the objects connected to the edge. In the picture above, we could create an edge-weight calculator that computes the sum of the CallDetail lengths connected to the Call edge.
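The edge-weight idea can be sketched in plain Python. This is not the Objectivity/DB API: the classes `CallDetail` and `CallEdge`, the attribute `length_seconds`, and the function `call_weight` are hypothetical stand-ins for an edge object that references other objects and a calculator that sums over them.

```python
from dataclasses import dataclass, field


@dataclass
class CallDetail:
    length_seconds: int  # hypothetical attribute name


@dataclass
class CallEdge:
    """Hypothetical edge object sitting between two person nodes."""
    caller: str
    callee: str
    details: list[CallDetail] = field(default_factory=list)


def call_weight(edge: CallEdge) -> int:
    """Weight calculator: total length of all CallDetail objects on the edge."""
    return sum(d.length_seconds for d in edge.details)


edge = CallEdge("alice", "bob", [CallDetail(60), CallDetail(125)])
print(call_weight(edge))  # 185
```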