I have a site written in PHP. It currently uses MySQL for all of its database needs (I'm open to additional DB technologies).
The system's content is interrelated. These relationships can be represented as a graph where vertices are pieces of content and edges are the relationships. I need to be able to traverse that graph. In particular I need to be able to:
- Get the child count at a given depth (e.g. how many grandchildrean does an item have)
- Get the cumulative child count at a given depth (e.g. how many children and grandchildren does an item have)
- Get the maximum depth for a given root (e.g. what is the longest path from this item)
- Get the children at a given depth (e.g. who are the grandchildren of this item)
- Get the parents at a given depth (e.g. who are the grandparents of this item)
- Look up which statuses (such as "hidden" or "locked") have been inherited from parents.
Because it is a graph on a dynamic system and not a tree or traditional hierarchy, there are some intricacies that I think rule out the usual SQL-based tricks (e.g. Adjacency List and Path Enumeration).
The Main Intricacies:
Content can have more than one child.
Content can have more than one parent.
An item's relationship graph might look different for each user. For instance, certain content might be hidden for one person but not another.
Items can appear more than once on a graph tree, and can appear at different path lengths (e.g. item 50 could be an immediate child while also being a 3rd generation child).
Graphs can contain hundreds of thousands of items.
Some Additional Intricacies:
Different types of content can be related (as in, a poll could be related to a forum post, or a user could be related to a community)
There are several different types of relationship (as in, parent/child relationship, ownership relationship, peer relationship)
Depending on the type of relationship, permissions and restrictions may or may not be passed from parent to child (e.g. if a parent is hidden, the child will be hidden as well, but if a peer item is hidden that status isn't passed along)
My Naive (slow) "Solutions"
Currently I am taking the naive approach using SQL. I have a single "Relationships" table with these columns:
item1ID (int)
item1TypeID (int)
item2ID (int)
item2TypeID (int)
relationshipTypeID (int)
In PHP, I dynamically generate queries full of inner self joins to look up the maximum depth, and then once that is figured out I generate a single query which traverses the hierarchy and retrieves whatever information I need. This is already too slow, even with proper indexing.
My second naive approach was going to be moving that traversal and depth lookup into stored procedures. I have no idea if this would actually create a significant speed improvement. I was also thinking of incorporating some sort of caching mechanism so I could avoid looking up maximum depths as often, but that seems like it simply avoiding the real problem.
My Question
There has to be a better way. What is it? I know there are a lot of questions and answers already on StackOverflow regarding the issue of hierarchical information in SQL, but this is not quite hierarchy -- it is a full blown graph.
Since I have strong models I can mix in another DB technology to handle the relationship side of things without ruining the existing code base. I have been looking into NoSQL solutions but I know practically nothing about them. I have also heard of "Graph Databases" (such as Neo4J) which, based on the name and the various powerpoint slides I've seen, sounds like exactly what I need. However, I don't know which ones are actually robust enough to stick around or which ones would play well with PHP.
Help me StackOverflow, you're my only hope.