We're using CloudFormation to automate the setup and teardown of several Cassandra clusters that we use for load testing. During these load tests, we use OpsCenter to monitor our throughput. What I've found is that storing the OpsCenter data in the test's target cluster skews the nodes' data ownership information. As a result, I'd like to move OpsCenter and the agent data to their own node. I have a single c3.4xlarge set up with a single Cassandra instance and OpsCenter. I have the following configuration files.
OpsCenter server, /etc/opscenter/clusters/usergrid.conf:
[cassandra]
seed_hosts = ec2-23-22-188-56.compute-1.amazonaws.com,ec2-54-163-164-41.compute-1.amazonaws.com,ec2-54-166-10-160.compute-1.amazonaws.com,ec2-54-166-219-212.compute-1.amazonaws.com,ec2-54-211-181-126.compute-1.amazonaws.com,ec2-54-82-161-157.compute-1.amazonaws.com,ec2-54-82-30-122.compute-1.amazonaws.com,ec2-54-83-98-182.compute-1.amazonaws.com,ec2-54-91-209-251.compute-1.amazonaws.com
[storage_cassandra]
seed_hosts = ec2-54-204-237-40.compute-1.amazonaws.com
api_port = 9160
datastax-agent, /var/lib/datastax-agent/conf/address.yaml:
stomp_interface: ec2-54-204-237-40.compute-1.amazonaws.com
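To double-check which storage host the cluster file actually points at, the INI-style file can be read programmatically. This is a minimal sketch, assuming the file parses as standard INI; the helper name `storage_settings` and the shortened `SAMPLE` text are hypothetical, and the 9160 fallback is an assumption for when `api_port` is absent.

```python
import configparser


def storage_settings(conf_text):
    """Return (seed_hosts, api_port) from the [storage_cassandra] section."""
    # Interpolation is disabled so a literal '%' in a value cannot raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["storage_cassandra"]
    raw_hosts = section.get("seed_hosts", fallback="")
    hosts = [h.strip() for h in raw_hosts.split(",") if h.strip()]
    # Assumption: fall back to the Thrift port 9160 if api_port is not set.
    port = section.getint("api_port", fallback=9160)
    return hosts, port


# Shortened copy of the cluster file shown above (hypothetical sample text).
SAMPLE = """\
[cassandra]
seed_hosts = ec2-23-22-188-56.compute-1.amazonaws.com,ec2-54-163-164-41.compute-1.amazonaws.com

[storage_cassandra]
seed_hosts = ec2-54-204-237-40.compute-1.amazonaws.com
api_port = 9160
"""

print(storage_settings(SAMPLE))
```

On the OpsCenter server, the same function could be given the contents of /etc/opscenter/clusters/usergrid.conf instead of the sample string.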
However, on the agents I see the following in /var/log/datastax-agent/agent.log:
INFO [thrift-init] 2014-11-03 14:33:41,069 Connected to Cassandra cluster: usergrid
INFO [thrift-init] 2014-11-03 14:33:41,071 in execute with client org.apache.cassandra.thrift.Cassandra$Client@6deebf54
INFO [thrift-init] 2014-11-03 14:33:41,072 Using partitioner: org.apache.cassandra.dht.Murmur3Partitioner
INFO [pdp-loader] 2014-11-03 14:33:41,072 Attempting to load stored metric values.
ERROR [pdp-loader] 2014-11-03 14:33:41,092 There was an error when attempting to load stored rollups.
me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:112)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:251)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:290)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
at clj_hector.core$execute_query.doInvoke(core.clj:201)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at clj_hector.core$get_column_range.doInvoke(core.clj:298)
at clojure.lang.RestFn.invoke(RestFn.java:587)
at opsagent.cassandra$scan_pdps$fn__1051.invoke(cassandra.clj:182)
at opsagent.cassandra$scan_pdps.invoke(cassandra.clj:181)
at opsagent.cassandra$process_pdp_row$fn__1060.invoke(cassandra.clj:199)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:197)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:195)
at opsagent.cassandra$load_pdps_with_retry$fn__1066.invoke(cassandra.clj:213)
at opsagent.cassandra$load_pdps_with_retry.invoke(cassandra.clj:210)
at opsagent.cassandra$setup_cassandra$f__388__auto____1094$fn__1095$f__388__auto____1102.invoke(cassandra.clj:357)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
Caused by: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at org.apache.cassandra.thrift.Cassandra$set_keyspace_result.read(Cassandra.java:5452)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_keyspace(Cassandra.java:531)
at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:518)
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:110)
... 22 more
Generally this would indicate that the client cannot connect to the storage Cassandra node. However, from the agent node, I can execute the following command.
cassandra-cli -h ec2-54-204-237-40.compute-1.amazonaws.com
From there I can describe the keyspace, which works:
[default@unknown] describe OpsCenter;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Keyspace: OpsCenter:
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [us-east:1]
Column Families:
ColumnFamily: bestpractice_results
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.IntegerType)
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: events
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Column Metadata:
Column Name: success
Validation Class: org.apache.cassandra.db.marshal.BooleanType
Column Name: action
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: level
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: time
Validation Class: org.apache.cassandra.db.marshal.LongType
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: events_timeline
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.LongType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: pdps
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups300
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups60
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups7200
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups86400
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: settings
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.BytesType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
[default@unknown]
This tells me that the target Cassandra node is up and running and has the keyspace and column families. It also indicates that there are no network or firewall issues between the agent and Cassandra. I'm at a loss to explain why I'm receiving this error message. Am I still missing something in my configuration, or is this a bug?
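The reachability part of that reasoning can also be scripted from the agent node. This is a minimal sketch, assuming the Thrift port 9160 from `api_port` above; the helper name `can_connect` is hypothetical. Note that a successful TCP connection only shows the port is open, not that the node on the other end holds the OpsCenter keyspace.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call from an agent node (not executed here):
# can_connect("ec2-54-204-237-40.compute-1.amazonaws.com", 9160)
```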
Cassandra: 1.2.19, OpsCenter: 5.0.1, DataStax Agent: 5.0.1
Any help would be greatly appreciated!
Thanks, Todd
UPDATE
Here is the agent log. Note that my IPs have changed since this is a new environment. The agent appears to be trying to connect to 10.81.168.96:9160, which is NOT the EC2 address configured in my settings, ec2-174-129-181-123.compute-1.amazonaws.com. I'm not sure where that address is coming from, but it's not what is set on the OpsCenter server.
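One check that might narrow this down (a possible diagnostic, not a confirmed diagnosis) is to see what the configured hostname resolves to from the agent node, and whether the address in the log is a private one. EC2 public DNS names generally resolve to the instance's private address when queried from inside EC2, so the 10.x address could be either the storage node or a node of the monitored cluster. The helper names `resolve` and `is_private` below are hypothetical.

```python
import ipaddress
import socket


def is_private(ip):
    """Return True if ip falls in a private (non-public) address range."""
    return ipaddress.ip_address(ip).is_private


def resolve(hostname):
    """Return the sorted IPv4 addresses hostname resolves to from this machine."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


# Address taken from the agent log above.
print(is_private("10.81.168.96"))

# Example call from an agent node (not executed here):
# resolve("ec2-174-129-181-123.compute-1.amazonaws.com")
```

If the hostname resolves to something other than 10.81.168.96 from the agent node, that would suggest the agent is getting its storage address from somewhere other than the `[storage_cassandra]` setting.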
agent.log
https://gist.github.com/tnine/f509c120465eb80ade92
Sorry for the gist, but I've exceeded the character limit.