Why is data corruption happen in Cassandra 1.2?

Question

I dropped a column in Cassandra 1.2 couple days ago by: 1. drop the whole table, 2. recreate the table, without the column, 3. insert insert statement (without the column).

The reason why I did that way is because Cassandra 1.2 doesn't support "drop column" operation.

Today I was notified by Ops team because of the data corruption issue. My questions:

What is the root cause?
How to fix it?

ERROR [ReadStage:79] 2014-11-04 11:29:55,021 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:79,5,main] org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/data/cassandra/data/xxx/yyy/zzz-Data.db, 1799885 bytes remaining) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:110) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:199) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:160) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84) at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291) at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1398) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1214) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1130) at org.apache.cassandra.db.Table.getRow(Table.java:344) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:44) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 (/data/cassandra/data/xxx/yyy/zzz-Data.db, 1799885 bytes remaining) at org.apache.cassandra.db.ColumnSerializer$CorruptColumnException.create(ColumnSerializer.java:148) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:86) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106) ... 24 more ERROR [ReadStage:89] 2014-11-04 11:29:58,076 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:89,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:376) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:108) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:199)

Might be related to issues.apache.org/jira/browse/CASSANDRA-5225 are you on the latest 1.2? — phact
Also sounds like issues.apache.org/jira/browse/CASSANDRA-5761 if you have a way to reproduce this, we may want to try to revive this ticket. — phact

phact phact · Accepted Answer · 2014-11-05T16:49:19

C* 1.2 supports column deletions for cql tables - http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_delete.html

However, I do not see anything wrong from the procedure you described to re-create a new table without your column. Here are some steps to go forward.

Assumptions -

The corruption you are seeing is in the new table not the old one (do they have the same name?)
You have a replication factor and number of nodes that are high enough for you to be able to take this node offline
Your client's load balancing policy is set up appropriately so that when the node goes down it will fail over to another node

Procedure -

1) Take your node offline

nodetool drain

This will flush memtables and make your node stop accepting requests.

2) Run nodetool scrub

nodetool scrub [keyspace][table]

If this completes successfully then you are done, bring your node back-up by restarting cassandra and run a nodetool repair keyspace table

3) If scrub errored out (probably with a corruption error), try the sstablescrub utility. ssh into your box and run:

sstablescrub <keyspace> <table>

Note, run this using the same os user you use to start cassandra.

If this completes successfully then you are done, bring your node back-up by restarting cassandra and run a nodetool repair keyspace table

4) If this doesn't work (again errors out with a corruption error) you will have to remove the SStable and rebuild it from your other replicas using repair:

mv the culprit sstable from your data directory to a backup directory
restart cassandra (delete it later once it's rebuilt)
nodetool repair keyspace cf -- This repair will take time.

Why is data corruption happen in Cassandra 1.2?

1 Answers

Assumptions -

Procedure -

Please let me know if you are able to reproduce this corruption.