0 votes

Up to this point all was well. We had a BSOD on one machine and now have corrupt SSTables, and we are trying to find the correct procedure to get this node back online. I would love to just wipe the data and repair the node, since we have replication factor 2, but I can't do that due to the amount of data on each node.

Attached is the error.

I tried to run nodetool scrub, but since DSE cannot start, I get the usual "cannot connect to 127.0.0.1" error.

Should I edit the config, change disk_failure_policy from stop to best_effort, and then start the node and run the command?

Thanks,


ERROR 20:58:34 Exiting forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
	at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.io.EOFException: null
	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.8.0_66]
	at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.8.0_66]
	at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.8.0_66]
	at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
	... 14 common frames omitted
INFO 20:58:34 DSE shutting down...
INFO 20:58:34 All plugins are stopped.

2
Yes, I started it but then quickly stopped because of the #1 rule: try scrub first. Should I try this one instead? - Kenneth Bean
Either that, or just delete the corrupted SSTable and run a repair. - Chris Lohfink
If you do delete the SSTable, run a repair once you're back up. BTW, is this Windows? - phact
Sorry, no. It's Ubuntu 14.04; I just copied the log from WinSCP (which I use to connect to the console). Will the repair need to go through all the data? Will the node come up and accept new writes while the repair runs? Before I say my next statement: yes, I know this isn't what normal people do. We have about 14 TB on this node, and I've already spoken with DataStax and had a meeting about what we are doing. Will repair have to transfer the 14 TB back, or just verify the data against the other nodes? Does it even need the other nodes to re-create the SSTables? - Kenneth Bean
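The delete-and-repair route discussed in the comments can be sketched roughly as below. This is a hedged sketch, not an official procedure: the data directory, keyspace/table names, and SSTable generation number are placeholders you would substitute from your own log and layout.

```shell
# Sketch of "delete the corrupted SSTable and repair" (all names are placeholders).

sudo service dse stop

# The corrupt file is named in the CorruptSSTableException message. All component
# files of one SSTable (-Data.db, -Index.db, -CompressionInfo.db, ...) share the
# same prefix, so move them out together. Move rather than delete, so nothing is
# lost until the repair has succeeded.
cd /var/lib/cassandra/data/my_keyspace/my_table
mkdir -p /backup/corrupt-sstables
mv my_keyspace-my_table-ka-1234-* /backup/corrupt-sstables/

sudo service dse start

# Repair only the affected table. With RF=2, repair compares Merkle trees with the
# replica and streams only the differing ranges, not the node's full data set.
nodetool repair my_keyspace my_table
```

The node serves reads and writes while the repair runs; the repair only needs the other replica to rebuild the removed data, and only the missing portion is streamed back.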

2 Answers

0 votes

Did you check whether a disk failure caused the SSTables to become corrupted? That is one of the main causes of SSTable corruption. If that is the case, repair (or replace) the disk and then run nodetool repair.
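A few basic disk-health checks on Ubuntu might look like the following; the device names are placeholders, and smartctl requires the smartmontools package.

```shell
# Sketch of disk-health checks before blaming Cassandra (device names are placeholders).
dmesg | grep -iE 'i/o error|ata|blk'   # kernel-level I/O errors since boot
sudo smartctl -H /dev/sda              # SMART overall health summary
sudo fsck -n /dev/sda1                 # read-only filesystem check (only safe on an unmounted filesystem)
```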

0 votes

Modify the disk failure policy in cassandra.yaml on the failed node:

1) Set disk_failure_policy to best_effort
2) Start DSE (or the Cassandra service)
3) Run nodetool scrub
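As a sketch, the three steps might look like this on a packaged DSE install; the cassandra.yaml path and the keyspace/table names are assumptions to adapt to your environment.

```shell
# 1) Relax the disk failure policy so the node can start despite the corrupt file
#    (path assumes a package install; locate your own cassandra.yaml first).
sudo sed -i 's/^disk_failure_policy: stop/disk_failure_policy: best_effort/' /etc/dse/cassandra/cassandra.yaml

# 2) Start DSE (or the plain Cassandra service).
sudo service dse start

# 3) Scrub the affected table; rows that cannot be read are skipped and logged.
nodetool scrub my_keyspace my_table   # keyspace/table are placeholders

# Once the node is healthy, consider reverting disk_failure_policy to "stop".
```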