I have a small Titan 0.5.0 cluster with 8 nodes. Every node runs Titan in Rexster 2.5.0 and Cassandra. They are all configured identically. Unfortunately, nearly every time I start the cluster, one of them fails to start.
In most cases this is one of the seed nodes.
Using cassandra as the storage backend, I get the following in the Rexster/Titan log:
WARN com.tinkerpop.rexster.config.GraphConfigurationContainer - Could
not open global configuration com.thinkaurelius.titan.core.TitanException:
Could not open global configuration
at com.thinkaurelius.titan.diskstorage.Backend.
getStandaloneGlobalConfiguration(Backend.java: 405)
...
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException:
Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.
AstyanaxStoreManager.ensureColumnFamilyExists(AstyanaxStoreManager.java:446)
...
Caused by: com.netflix.astyanax.connectionpool.exceptions.BadRequestException:
BadRequestException: [host=192.168.0.10(192.168.0.10):9160, latency=496(496),
attempts=1] InvalidRequestException(why:Cannot add already existing
column family "system_properties" to keyspace "titan")
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(
ThriftConverter.java:159)
Rexster fails to start and therefore does not load the graph.
However, the Cassandra node that Rexster failed to connect to seems to be fine: nodetool lists the node as part of the ring. If I fire requests against the remaining Rexster instances, everything seems to work.
I wiped all data before starting the nodes.
I switched to cassandrathrift, which results in a similar exception (the same TitanException, caused by a PermanentBackendException, caused by a TimeoutException). The storage timeout in Rexster is 30s. This may be too low since I currently start all nodes simultaneously, but it does not explain the issues with cassandra.
What is going wrong here?
edit:
I was misusing Titan. To avoid having to deal with index creation on startup - which happens quite often in my case - I created the index in the Rexster extension. I think this code was invoked multiple times: when I started multiple nodes simultaneously, it seems some of them tried to create the index.
Question: Is there any way the extension can create the indices safely? I created a separate thread for this: What are the methods to create indices?
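The suspected race (several nodes checking for the index at the same moment and then all trying to create it) can be sketched with a small simulation. This is only an illustration under stated assumptions: `FakeSchemaStore`, `AlreadyExists`, `ensure_exists`, and the index name `byName` are hypothetical stand-ins and are not part of the Titan, Rexster, or Cassandra APIs. The sketch shows one common way to make "create if absent" tolerant of concurrent callers: treat an "already exists" error as success rather than as a failure.

```python
import threading


class AlreadyExists(Exception):
    """Stand-in for a backend's 'already existing' error (hypothetical)."""


class FakeSchemaStore:
    """Hypothetical in-memory schema store; NOT Titan or Cassandra."""

    def __init__(self):
        self._lock = threading.Lock()
        self._names = set()

    def exists(self, name):
        with self._lock:
            return name in self._names

    def create(self, name):
        # The create itself is atomic, but callers may race between
        # exists() and create().
        with self._lock:
            if name in self._names:
                raise AlreadyExists(name)
            self._names.add(name)


def ensure_exists(store, name):
    """Create `name` unless present. Returns True only for the caller
    that actually created it; losing the race counts as success."""
    if store.exists(name):
        return False
    try:
        store.create(name)
        return True
    except AlreadyExists:
        # Another node created it between our check and our create.
        return False


def simulate(workers=8, name="byName"):
    """Start `workers` threads at the same moment, one per simulated node."""
    store = FakeSchemaStore()
    barrier = threading.Barrier(workers)
    results = []
    results_lock = threading.Lock()

    def node():
        barrier.wait()  # release all nodes simultaneously
        created = ensure_exists(store, name)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=node) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store, results
```

In this simulation exactly one simulated node creates the entry and the others return without an error, whichever interleaving occurs. Whether the same pattern is sufficient against a real storage backend depends on how that backend propagates schema changes between nodes.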
I increased the storage timeout to 60s and retried the procedure after removing the index creation from the code. I still start up all nodes simultaneously. Again, one Rexstitan node (seed node #2) fails to start.
The Cassandra log indeed contains an exception:
java.lang.IllegalArgumentException: Unknown keyspace/cf pair (titan.txlog)
at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:166)
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:326)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
which I can see on both seed nodes. While Rexster on one seed node does not seem to care, the other Rexster instance fails to start with
Caused by: com.netflix.astyanax.connectionpool.exceptions.BadRequestException: BadRequestException: [host=192.168.0.10(192.168.0.10):9160, latency=66(66), attempts=1]InvalidRequestException(why:Cannot add already existing column family "graphindex_lock_" to keyspace "titan")
at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:159)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateColumnFamily(ThriftClusterImpl.java:240)
in rexstitan.log. This looks quite similar to the exceptions raised before.
Just to clarify: by "fail" I mean that Rexster starts and can be queried, but it failed to load the Titan graph "graph".
Maybe I have to reduce the cluster to a minimum size to check whether this is related to cluster size.
edit #2:
It is not related to cluster size. And it's getting really annoying.
Sometimes it is the BadRequestException above, sometimes it's a BadRequestException because there already is a keyspace "titan". Or it is an IllegalArgumentException:
2646 [main] WARN com.tinkerpop.rexster.config.GraphConfigurationContainer -
Database has already been initialized but not frozen
java.lang.IllegalArgumentException: Database has already been initialized but not frozen
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1294)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:73)
at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:33)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:124)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.<init>(Application.java:97)
at com.tinkerpop.rexster.Application.main(Application.java:189)
Is it not possible to start multiple nodes at once? Do they conflict with each other? This is the only explanation I can think of, because I get a different one of these exceptions each time, and sometimes it works fine.
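If simultaneous startup really is the cause (an assumption, not something confirmed here), one possible workaround is to retry opening the graph with an increasing delay, so that a node that loses the initialization race tries again once the other node has finished. The following is a minimal sketch of that idea only: `open_with_retry` and the `open_graph` callable are hypothetical names, and nothing in it calls the real Titan or Rexster API.

```python
import time


def open_with_retry(open_graph, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `open_graph()` until it succeeds or `attempts` is exhausted.

    `open_graph` is a hypothetical zero-argument callable that opens the
    graph and raises on failure. The delay doubles after each failed try
    (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return open_graph()
        except Exception as exc:  # a real caller would catch a narrower type
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests. A staggered start (launching the nodes one after another instead of all at once) would address the same suspected cause without any retries.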