2
votes

I'm using Titan graph DB with the TinkerPop plugin. What is the best way to retrieve a vertex using the has step?

Assume employeeId is a unique attribute that has a unique vertex-centric index defined.

Is it through the label, i.e. g.V().has(label,'employee').has('employeeId','emp123') or g.V().has('employee','employeeId','emp123')?

Or is it better to retrieve the vertex directly by its unique property, i.e. g.V().has('employeeId','emp123')?

Which of the two is the quicker and better way?


2 Answers

3
votes

First, you have two options for creating the index:

  1. mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).buildCompositeIndex()
  2. mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).indexOnly(employee).buildCompositeIndex()

For option 1, it doesn't really matter which query you use. For option 2, it's mandatory to use g.V().has('employee','employeeId','emp123').

Note that g.V().hasLabel('employee').has('employeeId','emp123') will NOT select all employees first. Titan is smart enough to first apply those filter conditions that can leverage an index.
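One way to check this yourself in the Gremlin console is to open a throwaway graph with Titan's query.force-index option enabled, so that a graph query which cannot be answered from an index is rejected instead of silently scanning. This is only a minimal sketch, assuming Titan 1.0 with the in-memory backend; the String data type and the sample vertex are assumptions made for illustration.

```groovy
// Assumption: Titan 1.0 Gremlin console; the in-memory backend is used only for experimenting.
graph = TitanFactory.build().
        set('storage.backend', 'inmemory').
        set('query.force-index', true).   // reject graph queries that cannot use an index
        open()

// Option 1: a global composite index on employeeId.
mgmt = graph.openManagement()
employeeId = mgmt.makePropertyKey('employeeId').dataType(String.class).make()  // data type is assumed
employee = mgmt.makeVertexLabel('employee').make()
mgmt.buildIndex('byEmployeeId', Vertex.class).addKey(employeeId).buildCompositeIndex()
mgmt.commit()

// Illustrative sample data.
graph.addVertex(T.label, 'employee', 'employeeId', 'emp123')
graph.tx().commit()

g = graph.traversal()
// Both forms should be answered from the composite index rather than a full scan.
g.V().has('employeeId', 'emp123')
g.V().hasLabel('employee').has('employeeId', 'emp123')
```

If the index is instead built with indexOnly(employee), the label-less traversal should be the one that gets rejected under this setting, which matches the rule for option 2.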

One more thing I want to point out: the whole point of indexOnly() is to allow properties to be shared between different types of vertices. So instead of calling the property employeeId, you could call it uuid and also use it for employers, companies, etc.:

mgmt.buildIndex('employeeById', Vertex.class).addKey(uuid).indexOnly(employee).buildCompositeIndex()
mgmt.buildIndex('employerById', Vertex.class).addKey(uuid).indexOnly(employer).buildCompositeIndex()
mgmt.buildIndex('companyById', Vertex.class).addKey(uuid).indexOnly(company).buildCompositeIndex()

Your queries will then always have this pattern: g.V().has('<label>','<prop-key>','<prop-value>'). This is in fact the only way to go in DSE Graph, since we completely got rid of global indexes that span all types of vertices. At first I really didn't like this decision, but by now I have to agree that it is so much cleaner.
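The index definitions above use uuid, employee, employer and company as variables that were already obtained from the management API. As a rough sketch of where they might come from in Titan 1.0, assuming an open graph in the Gremlin console and a String-typed key (the data type and the sample value are assumptions):

```groovy
// Assumption: `graph` is an open TitanGraph (Titan 1.0) and none of these schema elements exist yet.
mgmt = graph.openManagement()

// One property key shared by all three vertex labels.
uuid = mgmt.makePropertyKey('uuid').dataType(String.class).make()

// One label-restricted composite index per vertex label.
['employee', 'employer', 'company'].each { name ->
    def vertexLabel = mgmt.makeVertexLabel(name).make()
    mgmt.buildIndex(name + 'ById', Vertex.class).addKey(uuid).indexOnly(vertexLabel).buildCompositeIndex()
}
mgmt.commit()

// Queries then always name the label together with the key and value.
g = graph.traversal()
g.V().has('employee', 'uuid', 'emp123')   // 'emp123' is an illustrative value
```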

1
votes

The second option, g.V().has('employeeId','emp123'), is better, as long as the property employeeId has been indexed.

This is because each step in a Gremlin traversal acts as a filter. So when you say:

g.V().has(label,'employee').has('employeeId','emp123')

You first go to all the vertices with the label employee and then from the employee vertices you find emp123.

With g.V().has('employeeId','emp123') a composite index allows you to go directly to the correct vertex.

Edit:

As Daniel has pointed out in his answer, Titan is actually smart enough not to visit all employees and leverages the index immediately. So in this case it appears there is little difference between the traversals. I personally favour using direct global indices without labels (i.e. g.V().has('employeeId','emp123')), but that is just a preference when using Titan: I like to keep steps and filters to a minimum.