In our large Titan Graph database, I notice the following behaviour:
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
14:16:35 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
gremlin> g = TitanFactory.open('/home/willem/workspace/ovc/src/main/resources/titan-cassandra-es.properties')
14:16:44 WARN com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration - Local setting cache.db-cache-time=0 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (180000). Use the ManagementSystem interface instead of the local configuration to control this setting.
==>titangraph[com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager:[10.1.0.200]]
gremlin> g.indexQuery("mediaSerialNBStringIdx","v.mediaSerialNB:EB*").vertices().count()
==>937
gremlin> g.V().has("mediaSerialNB",PREFIX,"EB").count()
14:17:17 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [(mediaSerialNB PREFIX EB)]. For better performance, use indexes
So addressing the index directly with indexQuery(...) uses the index, but when the query is left to the query optimizer, it does not pick up the MixedIndex on that field and falls back to iterating over all vertices.
This is Titan 0.5.3 running with Elasticsearch 1.2.2.
These are the index specifics:
gremlin> m = g.getManagementSystem()
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@6a26cb53
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").isMixedIndex()
==>true
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getFieldKeys()
==>mediaSerialNB
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getBackingIndex()
==>search
gremlin> k = m.getPropertyKey("mediaSerialNB")
==>mediaSerialNB
gremlin> m.getGraphIndex("mediaSerialNBStringIdx").getIndexStatus(k)
==>INSTALLED
Is the index status being "INSTALLED" rather than "ENABLED" the reason for this? If so, how can I get Elasticsearch to enable it?
Reading up on reindexing, I found the following:
mgmt.updateIndex(rindex, SchemaAction.ENABLE_INDEX);
But this is what our database tells us:
gremlin> mediaSerialNBKey = g.getPropertyKey("mediaSerialNB")
==>mediaSerialNB
gremlin> mediaSerialNBStringIdx = m.getGraphIndex("mediaSerialNBStringIdx")
==>com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@7c54dcff
gremlin> mediaSerialNBStringIdx.getParametersFor(mediaSerialNBKey)
==>mapping->STRING
==>mapped-name->4h6t
==>status->INSTALLED
gremlin> m.updateIndex(mediaSerialNBStringIdx, SchemaAction.ENABLE_INDEX)
Update action [ENABLE_INDEX] does not apply to any fields for index [com.thinkaurelius.titan.graphdb.database.management.TitanGraphIndexWrapper@7c54dcff]