I am evaluating options for storing hierarchical data (parent-child relationships).
Since a tree is a graph, and a forest of trees is technically a graph as well, a graph database seems to fit the bill much better than an RDBMS, especially since I want to optimize both read and write operations:
- Optimizing writes means that a change in the hierarchy requires a minimal number of writes.
- Optimizing reads means that materializing the full path to a particular node consumes a minimal number of read operations.
My use case is:
- A tree per user. Should I store and use one graph across the user space or one graph per user?
- Path queries that start at any node and walk back to the root of that user's tree.
- Child nodes store links to their parent nodes.
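To make the model concrete, here is a minimal in-memory sketch of the layout I have in mind. It is plain Python, not Titan code: the `ParentMap` type, the `path_to_root` and `move_subtree` helpers, and the sample node IDs are all made up for illustration. It only shows the trade-off I am after: moving a subtree is a single write, while a path query costs one lookup per level of the tree.

```python
from typing import Dict, List, Optional

# Hypothetical model: each node ID maps to its parent's ID; a root maps to None.
ParentMap = Dict[str, Optional[str]]


def path_to_root(parents: ParentMap, node_id: str) -> List[str]:
    """Return the node IDs from node_id up to the root, inclusive.

    Performs one lookup per level, so the read cost grows with tree depth.
    """
    path: List[str] = []
    seen = set()
    current: Optional[str] = node_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
        current = parents[current]
    return path


def move_subtree(parents: ParentMap, node_id: str, new_parent_id: str) -> None:
    """Re-parent node_id. Only one entry changes, however large the subtree is."""
    if node_id in path_to_root(parents, new_parent_id):
        raise ValueError("cannot move a node underneath its own descendant")
    parents[node_id] = new_parent_id


# One tree for one (hypothetical) user.
tree: ParentMap = {"root": None, "a": "root", "b": "a", "c": "b", "d": "root"}

before = path_to_root(tree, "c")
move_subtree(tree, "b", "d")
after = path_to_root(tree, "c")
```

In a graph database I would expect the dictionary lookups to become traversals along parent edges, but whether that maps onto a small number of backend reads is exactly what I am unsure about.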
Since all of my resources are in AWS, being able to use the Titan DynamoDB backend seems ideal.
My real problem, though, is understanding how to scale and manage Titan.
Do I need a Gremlin Server instance? In other words, do I need to stand up an EC2 instance running Gremlin Server in order to do anything with Titan, or can I use the Titan Java API to work with graph data directly?
Do I need to shard the data explicitly? In other words, do I need to stand up more Gremlin Servers as the amount of data and the number of operations grow? When the number of servers scales out, do I need to use consistent hashing across those servers from the client in order to perform operations?
Do I need to set up an Elasticsearch cluster to be able to start traversals from any node, or is it sufficient to use vertices to represent objects and edges to represent parent relationships? I can guarantee that vertex IDs are unique across the user space, and I can also decorate each vertex with the unique user ID. In that case, do I still need Elasticsearch? My hope is that Elasticsearch is only needed for free-form or more complex search queries, not for exact-match lookups.
As the number of front-ends increases, can each front-end open the graph (assuming a single graph across the user space)? With a graph per user, the front-ends have no affinity, so the same user's graph may be opened by several front-ends at once; is that OK?
I wasn't able to find much documentation on any of this. Thank you!