
I am ingesting 20 GB of data with the most recent C* java driver. I have a 5 node C* cluster. The client application that is ingesting data is running on a local node that is not part of the C* cluster (but part of the same LAN). I am also using a cassandra-lucene-index index on the table I ingest to. [cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.4.0 | Native protocol v4]

My java client application works the following way:

  1. The JSON file is parsed.
  2. One PreparedStatement is built.
  3. One BoundStatement for every element in my file is sent to the C* cluster with executeAsync [note: there are a lot of rows being sent].

About halfway through, two of my nodes died, and /var/log/cassandra/system.log shows:

    ERROR [CompactionExecutor:1] JVMStabilityInspector.java:140 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:693) ~[na:1.8.0_91]
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_91]
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_91]
        at org.apache.cassandra.utils.memory.BufferPool.allocate(BufferPool.java:108) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.memory.BufferPool.access$1000(BufferPool.java:45) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.memory.BufferPool$LocalPool.allocate(BufferPool.java:387) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:314) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.memory.BufferPool.takeFromPool(BufferPool.java:120) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.memory.BufferPool.get(BufferPool.java:92) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.util.RandomAccessReader.allocateBuffer(RandomAccessReader.java:87) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.access$100(CompressedRandomAccessReader.java:38) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader$Builder.createBuffer(CompressedRandomAccessReader.java:275) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:74) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:59) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader$Builder.build(CompressedRandomAccessReader.java:283) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:145) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.util.SegmentedFile.createReader(SegmentedFile.java:133) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.sstable.format.SSTableReader.getFileDataInput(SSTableReader.java:1711) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:93) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:46) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:36) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:62) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:580) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:492) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at com.stratio.cassandra.lucene.IndexService.read(IndexService.java:618) ~[cassandra-lucene-index-plugin-3.0.8.0.jar:na]
        at com.stratio.cassandra.lucene.IndexWriterWide.finish(IndexWriterWide.java:89) ~[cassandra-lucene-index-plugin-3.0.8.0.jar:na]
        at org.apache.cassandra.index.SecondaryIndexManager$IndexGCTransaction.commit(SecondaryIndexManager.java:958) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.CompactionIterator$1$1.onMergedRows(CompactionIterator.java:197) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:484) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:446) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:428) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:288) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:128) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:111) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:149) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:263) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]

The same exception occurs when I try to restart the two nodes. I then adjusted MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh, but the nodes just consume all available memory and then die again. The two nodes that died have 82 GB and 48 GB of RAM. I tried 4G, 8G, and 24G for MAX_HEAP_SIZE and 800M and 2G for HEAP_NEWSIZE, but the nodes still can't restart.

The system.log file also shows the dump of the heap:

Heap dump file created

INFO  [CompactionExecutor:1] 2016-08-14 13:43:44,223 HeapUtils.java:136 -
 num     #instances         #bytes  class name
----------------------------------------------
   1:         20879     2883199896  [I
   2:       5228765      334640960  org.apache.cassandra.utils.btree.BTreeSearchIterator
   3:       1836788      196567272  [B
   4:       3625993      163640704  [Ljava.lang.Object;
   5:       1260125       60486000  java.nio.HeapByteBuffer
   6:       1686322       53962304  org.apache.cassandra.utils.MergeIterator$Candidate
   7:        412664       52984560  [C
   8:        965525       38621000  org.apache.cassandra.db.rows.BufferCell
   9:       1511370       36272880  java.util.ArrayList
  10:        768459       30738360  org.apache.cassandra.db.rows.BTreeRow$Builder
  11:        382981       30638480  org.apache.cassandra.io.compress.CompressedRandomAccessReader
  12:        848641       27156512  java.util.RandomAccessSubList
  13:        848467       27150944  java.util.AbstractList$ListItr
  14:        394468       25245952  java.nio.DirectByteBuffer
  15:        387495       24799680  java.nio.DirectByteBufferR
  16:        774559       24785888  org.apache.cassandra.utils.btree.BTree$Builder
  17:        382823       24500672  org.apache.cassandra.db.columniterator.SSTableIterator$ForwardIndexedReader
  18:        486500       23352000  org.apache.cassandra.db.columniterator.SSTableIterator
  19:        383067       21451752  org.apache.cassandra.db.ClusteringPrefix$Deserializer
  20:        667956       21374592  org.apache.cassandra.db.rows.BTreeRow
  21:        853186       20476464  java.util.Arrays$ArrayList
  22:        848462       20363088  java.util.SubList$1
  23:        486689       19467560  org.apache.cassandra.db.rows.SerializationHelper
  24:        486319       19452760  org.apache.cassandra.db.filter.ClusteringIndexNamesFilter$1
  25:        383067       18387216  org.apache.cassandra.db.UnfilteredDeserializer$CurrentDeserializer
  26:        404602       16184080  org.apache.cassandra.utils.MergeIterator$ManyToOne
  27:        487237       15591584  java.util.TreeMap$KeyIterator
  28:        382823       15312920  org.apache.cassandra.db.columniterator.AbstractSSTableIterator$IndexState
  29:        376176       15047040  sun.misc.Cleaner
  30:        469635       15028320  org.apache.cassandra.db.rows.EncodingStats
  31:        464757       14872224  com.google.common.collect.Iterators$11
  32:        404602       13837016  [Lorg.apache.cassandra.utils.MergeIterator$Candidate;
  33:        550947       13222728  java.lang.Long
  34:        404502       12598432  [Lorg.apache.cassandra.db.rows.Row;
  35:        768640       12298240  org.apache.cassandra.db.rows.BTreeRow$Builder$CellResolver
  36:        375337       12010784  java.nio.DirectByteBuffer$Deallocator
  37:        250019       10000760  org.apache.cassandra.db.rows.Row$Merger
  38:        250019       10000760  org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer
  39:        312161        9989152  com.google.common.collect.RegularImmutableList
  40:        154609        9894976  org.apache.cassandra.db.SinglePartitionReadCommand
  41:        383092        9194208  org.apache.cassandra.cache.KeyCacheKey
  42:        275673        8821536  java.util.ArrayList$Itr
  43:        155947        8733032  java.util.LinkedHashMap
  44:        344055        8257320  java.lang.Double
  45:        157837        7576176  java.util.TreeMap
  46:        314112        7538688  com.google.common.collect.Iterators$12
  47:        302346        7256304  com.google.common.collect.Collections2$TransformedCollection
  48:        125865        7048440  java.util.stream.ReferencePipeline$Head
  49:        125057        7003192  org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator
  50:        154483        6179320  com.stratio.cassandra.lucene.IndexWriterWide
  51:        383788        6140608  java.util.zip.CRC32
  52:        377473        6039568  java.lang.Integer
  53:        250019        6000456  org.apache.cassandra.db.rows.Row$Merger$CellReducer
  54:        236029        5844680  [Ljava.nio.ByteBuffer;
  55:        231113        5546712  java.lang.String
  56:        230392        5529408  org.apache.cassandra.db.Clustering
  57:        125057        5002280  org.apache.cassandra.db.rows.RangeTombstoneMarker$Merger
  58:        154949        4958368  org.apache.cassandra.db.filter.ColumnFilter
  59:        309592        4953472  com.google.common.collect.Iterables$3
  60:        154483        4943456  [Lorg.apache.cassandra.db.rows.ColumnData;
  61:        154483        4943456  [Lorg.apache.cassandra.db.rows.Row$Builder;
  62:        154483        4943456  org.apache.cassandra.db.rows.Rows$1
  63:        179483        4307592  com.google.common.collect.Iterators$5
  64:          8055        4213168  [J
  65:        125114        4003648  java.util.ArrayList$ArrayListSpliterator
  66:        125057        4001824  org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer
  67:        125060        3828904  [Lorg.apache.cassandra.db.DeletionTime;
  68:        125057        3828808  [Lorg.apache.cassandra.db.rows.RangeTombstoneMarker;
  69:        157055        3769320  com.google.common.collect.Iterables$2
  70:        155783        3738792  org.apache.cassandra.utils.Interval
  71:        154722        3713328  org.apache.cassandra.db.rows.ComplexColumnData
  72:        154677        3712248  org.apache.cassandra.db.filter.ClusteringIndexNamesFilter
  73:        154483        3707592  org.apache.cassandra.index.SecondaryIndexManager$IndexGCTransaction$1
  74:        150356        3608544  org.apache.cassandra.db.Columns
  75:         70425        3380400  [Lorg.apache.cassandra.db.ClusteringPrefix$Kind;
  76:        124989        2999736  org.apache.cassandra.db.rows.ComplexColumnData$Builder
  77:        124971        2999304  java.util.stream.MatchOps$$Lambda$105/1652649396
  78:        124971        2999304  java.util.stream.MatchOps$1MatchSink
  79:        124971        2999304  java.util.stream.MatchOps$MatchOp
  80:        173871        2781936  org.apache.cassandra.db.rows.CellPath$CollectionCellPath
  81:        155081        2481296  java.util.TreeMap$KeySet
  82:        154648        2474368  java.util.TreeSet
  83:        154482        2471712  com.stratio.cassandra.lucene.IndexWriterWide$$Lambda$146/22703726
  84:         16388        2228768  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$PaddedAtomicReference
  85:         54841        1754912  java.util.HashMap$Node
  86:         36123        1444920  sun.nio.cs.UTF_8$Decoder
  87:         35748        1429920  org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
  88:         34554        1382160  java.util.TreeMap$Entry
  89:         11385        1126968  [Ljava.util.HashMap$Node;
  90:         18124        1014944  sun.nio.cs.UTF_8$Encoder
  91:         41388         993312  org.apache.cassandra.io.compress.CompressionMetadata$Chunk
  92:         37722         905328  java.lang.StringBuilder
  93:          7306         814736  java.lang.Class
  94:          8627         753872  [S
  95:         15574         747552  java.util.HashMap
  96:         14365         603008  [Ljava.lang.String;
  97:         23530         564720  java.util.EnumMap$EntryIterator$Entry
  98:         17382         556224  java.util.concurrent.ConcurrentHashMap$Node
  99:         22728         545472  org.apache.cassandra.db.ColumnFamilyStore$ViewFragment
 100:          6057         533016  java.lang.reflect.Method
 101:          8908         498848  org.apache.cassandra.utils.memory.BufferPool$Chunk
 102:         12317         450160  [Ljavax.management.ObjectName$Property;
 103:         18124         434976  javax.management.ObjectName$Property
 104:         10779         431160  java.util.LinkedHashMap$Entry
 105:         16300         391200  org.apache.cassandra.utils.btree.BTree$$Lambda$19/816798571
 106:          7827         375696  java.nio.HeapCharBuffer
 107:          9190         367600  java.lang.ref.Finalizer
 108:         16672         364336  [Ljava.lang.Class;
 109:          7538         361824  org.apache.cassandra.io.compress.CompressedRandomAccessReader$Builder
 110:         20623         329968  org.apache.cassandra.db.lifecycle.View$$Lambda$46/610085529
 111:          7491         299640  java.util.HashMap$KeyIterator
 112:          4551         291264  java.net.URL
 113:          9087         290784  sun.nio.fs.UnixPath
 114:           517         272976  [Lorg.apache.lucene.util.fst.FST$Arc;
 115:          7784         249088  java.io.File
 116:          3366         242352  java.lang.reflect.Field
 117:          3783         242112  java.util.regex.Matcher
 118:          4323         242088  java.util.zip.ZipFile$ZipFileInputStream
 119:          4322         242032  java.util.zip.ZipFile$ZipFileInflaterInputStream
 120:          9545         229080  java.util.concurrent.ConcurrentLinkedQueue$Node
 121:          9525         228600  com.google.common.collect.Iterators$8
 122:         14037         224592  java.lang.Object
 123:          3456         221184  io.netty.buffer.PoolSubpage
 124:          8942         214608  java.util.concurrent.CopyOnWriteArrayList$COWIterator
 125:          6702         214464  java.lang.StackTraceElement
 126:          2917         210024  org.apache.lucene.util.fst.FST$Arc
 127:          5119         204760  javax.management.MBeanPermission
 128:          6281         200992  java.io.FilePermission
 129:          4867         194680  javax.management.ObjectName
 130:          7807         187368  org.apache.cassandra.dht.Murmur3Partitioner$LongToken
 131:          7603         182472  org.apache.cassandra.io.util.MmappedRegions$Region
 132:          5624         179968  org.apache.cassandra.cql3.ColumnIdentifier
 133:          4330         173200  java.util.HashMap$ValueIterator
 134:          2901         162456  jdk.internal.org.objectweb.asm.Item
 135:          6604         158496  [Lorg.apache.cassandra.dht.Range;
 136:          6506         156152  [Ljava.security.ProtectionDomain;
 137:          1508         144768  java.util.jar.JarFile$JarFileEntry
 138:           323         140560  [Ljava.util.concurrent.ConcurrentHashMap$Node;
 139:          2143         137152  org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl
 140:          5708         136992  org.apache.cassandra.dht.Token$KeyBound
 141:          4196         134272  org.apache.cassandra.db.rows.BTreeRow$$Lambda$140/1296076467
 142:          5445         130680  com.google.common.collect.Iterators$PeekingImpl
 143:          3990         127680  com.codahale.metrics.LongAdder
 144:          2273         127288  org.apache.cassandra.db.RowUpdateBuilder
 145:          1460         116800  java.util.zip.ZipEntry
 146:          4843         116232  java.util.concurrent.ConcurrentSkipListMap$Node
 147:          4837         116088  com.sun.jmx.mbeanserver.StandardMBeanSupport
 148:          3616         115712  java.util.Collections$UnmodifiableMap
 149:          4514         108336  org.apache.lucene.util.BytesRef
 150:          1931         108136  javax.management.MBeanServerNotification
 151:          2654         106160  java.security.AccessControlContext
 152:          4320         103680  com.sun.jmx.mbeanserver.NamedObject
 153:          4311         103464  org.apache.cassandra.utils.btree.BTree$FiltrationTracker
 154:          4295         103080  org.apache.cassandra.utils.Pair
 155:          6259         100144  java.io.FilePermission$1
 156:          2044          98112  org.apache.lucene.index.FieldInfo
 157:          2003          96144  org.antlr.runtime.CommonToken
 158:           106          95840  [Ljdk.internal.org.objectweb.asm.Item;
 159:          2902          92864  org.apache.cassandra.db.CBuilder$ArrayBackedBuilder
 160:          1022          89936  org.apache.lucene.codecs.blocktree.FieldReader
 161:          2779          88928  java.util.EnumMap$EntryIterator
 162:          5557          88912  [Lorg.apache.cassandra.db.Clustering;
 163:          2197          87880  org.apache.cassandra.utils.concurrent.Ref$State
 164:           908          87168  org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$NumericEntry
 165:          1785          85680  com.codahale.metrics.EWMA
 166:          2063          82520  java.util.WeakHashMap$Entry
 167:          2470          81824  [Ljava.lang.reflect.Method;
 168:          2017          80680  java.util.HashMap$EntryIterator
 169:          3226          77424  java.io.ExpiringCache$Entry
 170:           875          77416  [Ljava.lang.StackTraceElement;
 171:           455          77408  [Z
 172:          2412          77184  java.io.FileDescriptor
 173:          3216          77184  java.util.concurrent.ConcurrentLinkedDeque$Node
 174:          2300          73600  com.google.common.collect.Iterators$7
 175:          1022          73584  org.apache.lucene.util.fst.FST
 176:          2283          73056  org.apache.lucene.util.WeakIdentityMap$IdentityWeakReference
 177:          3036          72864  org.apache.cassandra.db.LivenessInfo
 178:          2857          71792  [Lorg.yaml.snakeyaml.tokens.Token$ID;
 179:          1774          70960  java.util.Formatter$FormatSpecifier
 180:          1102          70528  sun.nio.ch.FileChannelImpl
 181:          4393          70288  java.util.HashMap$KeySet
 182:          2189          70048  org.apache.cassandra.metrics.CassandraMetricsRegistry$MetricName
 183:          2886          69264  java.net.URLClassLoader$1
 184:          2871          68904  org.apache.cassandra.db.DeletionTime
 185:          2853          68472  org.apache.cassandra.dht.Range
 186:           164          68224  [Lorg.antlr.runtime.BitSet;
 187:          1694          67760  java.lang.ref.SoftReference
 188:           128          67584  [Lcom.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$PaddedAtomicReference;
 189:          4196          67136  org.apache.cassandra.db.Columns$$Lambda$139/703415756
 190:          4175          66800  com.google.common.collect.Iterators$3
 191:          3938          63008  java.util.HashSet
 192:          2599          62376  com.google.common.collect.ImmutableEntry
 193:          1526          61040  org.apache.cassandra.db.PreHashedDecoratedKey
 194:          1891          60512  java.net.InetAddress$InetAddressHolder
 195:          3780          60480  com.stratio.cassandra.lucene.IndexWriterWide$$Lambda$149/2109187587
 196:           924          59136  java.util.concurrent.ConcurrentHashMap
 197:          1457          58280  java.lang.ClassNotFoundException
 198:          2411          57864  java.lang.StringBuffer
 199:          1797          57504  java.util.TreeMap$EntryIterator
 200:          1412          56480  java.util.EnumMap
 201:          2349          56376  java.util.Date
 202:          1563          56072  [Ljava.util.Formatter$Flags;
 203:          1390          55600  java.util.ArrayList$SubList$1
 204:           868          55552  java.text.DecimalFormatSymbols
 205:          1384          55360  org.yaml.snakeyaml.error.Mark
 206:           686          54880  java.lang.reflect.Constructor
 207:          1361          54440  java.util.ArrayList$SubList
 208:          2267          54408  org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxGauge
 209:          2238          53712  sun.misc.ProxyGenerator$ConstantPool$IndirectEntry
 210:          2197          52728  org.apache.cassandra.utils.concurrent.Ref
 211:          2172          52128  java.util.Collections$1
 212:          1615          51680  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
 213:          2103          50472  org.apache.cassandra.db.lifecycle.View$$Lambda$66/2068475002
 214:          3151          50416  java.util.HashMap$Values
 215:           134          50384  java.lang.Thread
 216:           272          50048  [Lorg.apache.cassandra.net.MessagingService$Verb;
 217:          2042          49008  javax.management.ObjectInstance
 218:          1225          49000  java.lang.invoke.MethodType
 219:            61          48776  [D
 220:          2005          48120  java.util.Formatter$FixedString
 221:          1490          47680  java.security.CodeSource
 222:           569          47184  [Ljava.util.WeakHashMap$Entry;
 223:           208          46592  jdk.internal.org.objectweb.asm.MethodWriter
 224:           827          46312  java.lang.invoke.MemberName
 225:          1447          46304  sun.misc.URLClassPath$JarLoader$2
 226:          1916          45984  java.util.concurrent.atomic.AtomicLong
 227:          1896          45504  java.lang.Class$MethodArray
 228:          1886          45264  java.net.Inet4Address
 229:          1845          44280  javax.management.MBeanTrustPermission
 230:          1830          43920  [Lorg.apache.lucene.util.packed.PackedInts$Format;
 231:          1339          42848  java.lang.invoke.MethodType$ConcurrentWeakInternSet$WeakEntry
 232:           655          41920  java.util.stream.ReferencePipeline$3
 233:          1734          41616  com.google.common.collect.Iterables$6
 234:          2578          41248  com.sun.jmx.interceptor.DefaultMBeanServerInterceptor$1
 235:          2560          40960  com.google.common.collect.Maps$6
 236:          1022          40880  org.apache.lucene.util.fst.BytesStore
 237:           842          40416  org.apache.cassandra.metrics.CassandraMetricsRegistry$JmxTimer
 238:          1253          40096  java.lang.ref.ReferenceQueue
 239:           703          39368  java.util.concurrent.ConcurrentHashMap$EntryIterator
 240:           264          38016  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$PaddedAtomicLong
 241:          1539          36936  org.apache.lucene.util.fst.ReverseBytesReader
 242:           655          36680  java.lang.Class$ReflectionData
 243:           648          36288  java.util.concurrent.ConcurrentHashMap$KeyIterator
 244:          2111          33776  java.util.concurrent.atomic.AtomicInteger
 245:           421          33680  org.apache.cassandra.db.rows.RowAndDeletionMergeIterator
 246:          2092          33472  org.apache.cassandra.utils.concurrent.Refs
 247:          1022          32704  org.apache.lucene.store.ByteArrayDataInput
 248:          1358          32592  org.apache.cassandra.gms.GossipDigest
 249:           789          31560  sun.nio.ch.FileChannelImpl$Unmapper
 250:           656          31488  java.util.StringTokenizer
...
3851:             1             16  sun.util.locale.provider.SPILocaleProviderAdapter
3852:             1             16  sun.util.resources.LocaleData
3853:             1             16  sun.util.resources.LocaleData$LocaleDataResourceBundleControl
Total      41014224     4650350712

  1. How can I make them start again? I suspect they are trying to rebuild some inserts that were queued, because I just "spam" the inserts with executeAsync from my client and Cassandra was eventually overwhelmed by the volume of inserts.

  2. How can I prevent this from happening the next time I ingest data?

Does it still OOM without the indexes? - phact
@phact I thought about removing the index and re-running my data ingestion too. However, I am currently unable to start 2 out of 5 nodes due to this error. How can I make them able to restart? My folder /var/lib/cassandra/data/*mykeyspace*/ only contains folders named "mytable-SOME_ID_HERE". Is there anything else I need to delete or clean up before being able to restart? - j9dy
Is it blowing up on commitlog replay? You may want to mv that data away for now and see if it lets you come up. - phact
@phact I have deleted all the subfolders of /var/lib/cassandra/data/*mykeyspace*/ on the nodes that failed. They are now able to restart. I have dropped the index and recreated the entire table. I will now launch the ingestion again and report back. - j9dy
@phact After removing the index, the importer finished without any severe errors from Cassandra (except for one node's hard drive dying in the middle of the night, but I don't think that's related lol). I guess I should stop using executeAsync and use execute instead? The system.log is filled with "MUTATION messages were dropped..." logs because the hard drive could not keep up with the ingestion speed. This only happened on the node where the hard drive died afterwards. - j9dy

1 Answer


This is caused by off-heap buffers using up too much memory, as older versions of Java (prior to 8u102 and 9 b104) had no limit on the size of the buffers kept in the NIO buffer cache: https://support.datastax.com/hc/en-us/articles/360000863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache

Try upgrading to a newer Java version and adding:

-Djdk.nio.maxCachedBufferSize=1048576

to cassandra-env.sh (appended to JVM_OPTS) or jvm.options (on a line of its own).
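
As for the second question (keeping the next ingestion from overwhelming the cluster), one option besides switching to synchronous execute is to cap how many executeAsync calls are outstanding at once. Below is a minimal sketch of that idea using only the JDK. The names ThrottledIngest, ingest, asyncWrite and maxInFlight are made up for illustration, and the write is modelled as a function returning a CompletableFuture. With the 3.x driver, executeAsync returns a ResultSetFuture (a Guava ListenableFuture), so a small adapter would be needed; that adapter is not shown here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ThrottledIngest {

    /**
     * Submits every item through asyncWrite while never allowing more than
     * maxInFlight writes to be pending. Returns the number of failed writes.
     * (Hypothetical helper, not part of any driver API.)
     */
    public static <T> int ingest(Iterable<T> items,
                                 Function<T, CompletableFuture<?>> asyncWrite,
                                 int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger failures = new AtomicInteger();

        for (T item : items) {
            // Blocks the producer while maxInFlight writes are still pending.
            permits.acquireUninterruptibly();
            CompletableFuture<?> pending;
            try {
                pending = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                // The write was never submitted: count it and free the permit.
                failures.incrementAndGet();
                permits.release();
                continue;
            }
            pending.whenComplete((result, error) -> {
                if (error != null) {
                    failures.incrementAndGet();
                }
                // Free the slot whether the write succeeded or failed.
                permits.release();
            });
        }

        // Wait for the tail: all permits are back once every write has completed.
        permits.acquireUninterruptibly(maxInFlight);
        permits.release(maxInFlight);
        return failures.get();
    }
}
```

A reasonable value for maxInFlight depends on the cluster and the disks, so it is best found by starting low and watching for dropped MUTATION messages in system.log.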