3
votes

I've been researching the Tinkerpop stack for quite a while. I think I have a good idea of what it can do and what databases it works well with. I've got a couple of different databases I'm thinking about right now, but haven't decided on a definite. So I've decided to write my code purely to the interfaces, and not take into account any implementation right now. Out of the databases I'm looking at, they implement TransactionalGraph and KeyIndexableGraph. I think that's good enough for what I need, but I have just one question.

I have different 'classes' of vertices. Using Blueprints, I believe that's best representable by having a field in each vertex containing the class name. Doing that, I can do something like graph.getVertices("classname", "User") and it would give me all of the user vertices. And since the getVertices function specifies that an implementation should make use of indexes, I'm guaranteed to get a fast lookup (if I index that field).

But let's say that I wanted to retrieve a vertex based on two properties. The vertex must have className=Users and username=admin. What's the best way to go about finding that single vertex? And is it possible to index over both of those properties, even though not all vertices will have a username field?

FYI - The databases I'm currently thinking of are OrientDB, Neo4j and Titan, but I haven't decided for sure yet. I'm also currently planning to use Gremlin if that helps at all.

2

2 Answers

4
votes

Using a "class" or a "type" for vertices is a good way to segment them. Doing:

graph.createKeyIndex("classname",Vertex.class);
graph.getVertices("classname", "User");

is a pretty common pattern and should generally yield a fast lookup, though iterating an index of tens of millions of users might not be so great (if you intend to grow a particular classname to very big size). I think that leads to the second part of your question, in regards to doing a two property lookup.

Taking your example on the surface, the two element lookup would be something like (using Gremlin):

g.V('classname',"User").has('username','admin')

So, you narrow the vertices to just "User" vertices with a key index and then filter those for "admin". But, I'd model this differently. It would be even less expensive to simply do:

graph.createKeyIndex("username",Vertex.class);
graph.getVertices("username", "admin");

or in Gremlin:

g.V('username','admin')

If you know the username you want, there's no better/faster way to model this. You really only need the classname if you want to iterate over all "User" vertices. If you just want to find one (or a set of vertices with that username) then key indexing on that property is the better way.

Even if I don't create a key index on it, I still include a type or classname property on all vertices. I find it helpful in global operations where I may or may not care about speed, but just need an answer.

3
votes
  1. graph.getVertices() will iterate through all vertexes and look for ones with that property if you do not have the auto-index turned on in your graph implementation. If you already have data and cannot just turn on the auto-indexer, you should use is index = indexableGraph.getIndex() and then index.get('classname', 'User')

  2. It's possible to perform a query over multiple objects, but without specifics, it's hard to say. For Neo4j they use Lucene, which means that query() will take a lucene query, such as className:Users AND username:admin, but I cannot speak for the others.

Yeah of those DB is good for playing with, I personally found neo4j to be the easiest, and as long as you understand their licensing structure, you shouldn't have any problems using them.