In our large Titan Graph database, I notice the following behaviour:

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
14:16:35 WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
gremlin> g = TitanFactory.open('/home/willem/workspace/ovc/src/main/resources/titan-cassandra-es.properties')
14:16:44 WARN  com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration  - Local setting cache.db-cache-time=0 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (180000).  Use the ManagementSystem interface instead of the local configuration to control this setting.
==>titangraph[com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager:[10.1.0.200]]
gremlin> g.indexQuery("mediaSerialNBStringIdx","v.mediaSerialNB:EB*").vertices().count()
==>937
gremlin> g.V().has("mediaSerialNB",PREFIX,"EB").count()
14:17:17 WARN  com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx  - Query requires iterating over all vertices [(mediaSerialNB PREFIX EB)]. For better performance, use indexes

So querying the index directly with indexQuery(...) takes advantage of the index, but when the same lookup is left to the query optimizer, it does not pick up the MixedIndex on that field and falls back to iterating over all vertices.

This is Titan 0.5.3 running with Elasticsearch 1.2.2.

These are the index specifics:

gremlin> m = g.getManagementSystem()
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@6a26cb53
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").isMixedIndex()
==>true
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getFieldKeys()
==>mediaSerialNB
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getBackingIndex()
==>search
gremlin> k = m.getPropertyKey("mediaSerialNB")
==>mediaSerialNB
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getIndexStatus(k)
==>INSTALLED

Is the fact that the index status is "INSTALLED" rather than "ENABLED" the clue? If so, how can I get Elasticsearch to enable it?

Reading up on reindexing, I found the following:

mgmt.updateIndex(rindex, SchemaAction.ENABLE_INDEX);

But this is what our database tells us:

gremlin> mediaSerialNBKey = g.getPropertyKey("mediaSerialNB")
==>mediaSerialNB
gremlin> mediaSerialNBStringIdx = m.getGraphIndex("mediaSerialNBStringIdx")
==>com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@7c54dcff
gremlin> mediaSerialNBStringIdx.getParametersFor(mediaSerialNBKey)
==>mapping->STRING
==>mapped-name->4h6t
==>status->INSTALLED
gremlin> m.updateIndex(mediaSerialNBStringIdx, SchemaAction.ENABLE_INDEX)
Update action [ENABLE_INDEX] does not apply to any fields for index [com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@7c54dcff]

1 Answer


Yes, the index needs to be enabled. For that, it must first be in state REGISTERED, not INSTALLED as it is in your case. Normally this transition happens automatically once all Titan instances using the same storage backend have acknowledged the index change.
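To make the ordering concrete, here is a minimal sketch in plain Groovy of the lifecycle as described above. It is a simplified model, not Titan's implementation: the Status enum, the transitions map and the applyAction closure are hypothetical names, and only the two transitions mentioned in this answer are modelled.

```groovy
// Simplified model of the index lifecycle described above; NOT Titan's code.
enum Status { INSTALLED, REGISTERED, ENABLED }

// For each action: the status a field must be in, and the status it moves to.
def transitions = [
    REGISTER_INDEX: [from: Status.INSTALLED,  to: Status.REGISTERED],
    ENABLE_INDEX  : [from: Status.REGISTERED, to: Status.ENABLED],
]

// Hypothetical helper: returns the new status, or fails if the action
// does not apply to the current status.
def applyAction = { Status current, String action ->
    def t = transitions[action]
    if (t == null || t.from != current) {
        throw new IllegalStateException(
            "Update action [${action}] does not apply to status ${current}")
    }
    return t.to
}

def s = applyAction(Status.INSTALLED, 'REGISTER_INDEX')   // REGISTERED
s = applyAction(s, 'ENABLE_INDEX')                        // ENABLED
```

In this model, applying ENABLE_INDEX to a field that is still INSTALLED fails, which mirrors the "does not apply to any fields" message in the question.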

It is possible, however, that some instances are no longer active. You can list all open instances in the Gremlin console:

m=g.getManagementSystem()
m.getOpenInstances()

If there are any dead instances, you should remove them manually:

m.forceCloseInstance("dead-instance-id")
m.commit()

You can find more in the documentation, section 27.2.

In my experience, it is best to shut down all instances except the Gremlin session before performing index maintenance.

Now you can manually register the index (see section 28.7.1):

m = g.getManagementSystem()
mediaSerialNBStringIdx = m.getGraphIndex("mediaSerialNBStringIdx")
m.updateIndex(mediaSerialNBStringIdx, SchemaAction.REGISTER_INDEX)
m.commit()

To check:

m = g.getManagementSystem()
k = m.getPropertyKey("mediaSerialNB")
m.getGraphIndex("mediaSerialNBStringIdx").getIndexStatus(k)
// should return REGISTERED
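If the status does not change right away, one option is to repeat the check a few times instead of retyping it. The following is only a sketch in plain Groovy: awaitStatus is a hypothetical helper, not part of Titan, and fakeStatus is a stand-in for the real lookup so that the snippet runs without a graph.

```groovy
// Hypothetical helper (not part of Titan): calls readStatus up to `attempts`
// times, pausing `sleepMs` milliseconds between tries, until it reports `wanted`.
def awaitStatus = { Closure readStatus, String wanted, int attempts, long sleepMs ->
    for (int i = 0; i < attempts; i++) {
        if (readStatus().toString() == wanted) return true
        if (i < attempts - 1) Thread.sleep(sleepMs)
    }
    return false
}

// Stand-in for a real status lookup: reports INSTALLED twice, then REGISTERED.
def calls = 0
def fakeStatus = { ++calls < 3 ? 'INSTALLED' : 'REGISTERED' }

// True if REGISTERED was seen within 5 attempts, false otherwise.
def reached = awaitStatus(fakeStatus, 'REGISTERED', 5, 10L)
```

Against a real graph, readStatus would be a closure that opens a fresh management system, performs the getIndexStatus(k) call shown above, commits, and returns the status.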

Now you can successfully enable your index:

m = g.getManagementSystem()
mediaSerialNBStringIdx = m.getGraphIndex("mediaSerialNBStringIdx")
m.updateIndex(mediaSerialNBStringIdx, SchemaAction.ENABLE_INDEX)
m.commit()