I am fairly new to Cassandra (within the month), having come from a long SQL Server background. I have been tasked with stubbing out some Python to automate bulk loading of sstables. Enter sstableloader.

Everything I have installed so far is for testing. I have one virtual machine set up with Cassandra installed as a single-node cluster. This required a bit of setup and a loopback IP address, so I have 127.0.0.1 and 127.0.0.2, with the seed set to 127.0.0.1. I successfully got Cassandra up and running, and I can access it via simple connection strings in Python from other boxes, so most of my requirements are met.

Where I am running into problems is loading data in via anything but CQL. I can use INSERT statements to get data in all day; what I need to do successfully is run json2sstable and sstableloader (separately, at this point). The kicker is that both report back that everything is fine... and my data never shows up in either case. The following is how to recreate the issue.
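For context, the Python side of the automation so far just shells out to the tool. A minimal sketch of what I mean (build_loader_cmd is my own hypothetical helper, not any Cassandra API; host and path are from my test setup):

```python
def build_loader_cmd(seed_hosts, sstable_dir):
    # sstableloader takes -d with a comma-separated list of initial hosts,
    # followed by the directory holding the table's sstables
    return ["sstableloader", "-d", ",".join(seed_hosts), sstable_dir]

cmd = build_loader_cmd(["127.0.0.1"],
                       "/var/lib/cassandra/data/sampledb_adl/emp_new_9")
print(" ".join(cmd))
# To actually run it: subprocess.check_call(cmd)
```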
Keyspace, column family and folder: sampledb_adl, emp_new_9, /var/lib/cassandra/data/sampledb_adl/emp_new_9
Table created at cqlsh prompt: CREATE TABLE emp_new_9 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Initial data entered into table via cqlsh: INSERT INTO emp_new_9 (pkreq,empid,deptid,first_name,last_name) VALUES (uuid(),30001,235,'yogi','bear');
Results of 'select * from emp_new_9':
pkreq | deptid | empid | first_name | last_name
--------------------------------------+--------+-------+------------+-----------
9c6dd9de-f6b1-4312-9737-e9d00b8187f3 | 235 | 30001 | yogi | bear
Initiated nodetool flush
Contents of emp_new_9 folder at this point:
sampledb_adl-emp_new_9-jb-1-CompressionInfo.db sampledb_adl-emp_new_9-jb-1-Index.db sampledb_adl-emp_new_9-jb-1-TOC.txt
sampledb_adl-emp_new_9-jb-1-Data.db sampledb_adl-emp_new_9-jb-1-Statistics.db
sampledb_adl-emp_new_9-jb-1-Filter.db sampledb_adl-emp_new_9-jb-1-Summary.db
Current results of: [root@localhost emp_new_9]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db
[
{"key": "9c6dd9def6b143129737e9d00b8187f3","columns": [["","",1443108919841000], ["deptid","235",1443108919841000], ["empid","30001",1443108919841000], ["first_name","yogi",1443108919841000], ["last_name","bear",1443108919841000]]}
]
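Since the automation will need to work with these dumps, here is how I read that output (a small sketch; my understanding is that the empty-named column is the CQL row marker and the third element of each column is a timestamp in microseconds since the epoch — that is my reading of the format, not official documentation):

```python
import json
from datetime import datetime, timezone

# The sstable2json output from above, verbatim (one row of emp_new_9)
dump = '''
[
{"key": "9c6dd9def6b143129737e9d00b8187f3","columns": [["","",1443108919841000], ["deptid","235",1443108919841000], ["empid","30001",1443108919841000], ["first_name","yogi",1443108919841000], ["last_name","bear",1443108919841000]]}
]
'''

rows = json.loads(dump)
for row in rows:
    # the empty-named column appears to be the CQL row marker; skip it
    cols = {name: value for name, value, ts in row["columns"] if name}
    # column timestamps are microseconds since the epoch
    ts = datetime.fromtimestamp(row["columns"][0][2] / 1e6, tz=timezone.utc)
    print(row["key"], cols, ts.isoformat())
```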
Now to create emp_new_10 with different data:
Keyspace, column family and folder: sampledb_adl, emp_new_10, /var/lib/cassandra/data/sampledb_adl/emp_new_10
Table created at cqlsh prompt: CREATE TABLE emp_new_10 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Initial data entered into table via cqlsh: INSERT INTO emp_new_10 (pkreq,empid,deptid,first_name,last_name) VALUES (uuid(),30101,298,'scoobie','doo');
Results of 'select * from emp_new_10':
pkreq | deptid | empid | first_name | last_name
--------------------------------------+--------+-------+------------+-----------
c0e1763d-8b2b-4593-9daf-af3596ed08be | 298 | 30101 | scoobie | doo
Initiated nodetool flush
Contents of emp_new_10 folder at this point:
sampledb_adl-emp_new_10-jb-1-CompressionInfo.db sampledb_adl-emp_new_10-jb-1-Index.db sampledb_adl-emp_new_10-jb-1-TOC.txt
sampledb_adl-emp_new_10-jb-1-Data.db sampledb_adl-emp_new_10-jb-1-Statistics.db
sampledb_adl-emp_new_10-jb-1-Filter.db sampledb_adl-emp_new_10-jb-1-Summary.db
Current results of: [root@localhost emp_new_10]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db
[
{"key": "c0e1763d8b2b45939dafaf3596ed08be","columns": [["","",1443109509458000], ["deptid","298",1443109509458000], ["empid","30101",1443109509458000], ["first_name","scoobie",1443109509458000], ["last_name","doo",1443109509458000]]}
]
So, yogi 9, scoobie 10.
Now I am going to try json2sstable first, using the dump from emp_new_10, which I named (original, I know) emp_new_10.json:
json2sstable -K sampledb_adl -c emp_new_9 /home/tdmcoe_admin/Desktop/emp_new_10.json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db
Results printed to terminal window:
ERROR 08:56:48,581 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Importing 1 keys...
1 keys imported successfully.
I get the MemoryMeter error all the time and ignore it, as searching suggested it does not affect results.
So, my folder contents have not changed, and 'select * from emp_new_9;' still returns the same single original record. emp_new_10 has not changed either. What the heck happened to my '1 keys imported successfully'? Successfully where?
Now for the related sstableloader. Same base folders/data, but now running sstableloader:
[root@localhost emp_new_10]# sstableloader -d 127.0.0.1 /var/lib/cassandra/data/sampledb_adl/emp_new_9
NOTE: I ALSO RAN THE LINE ABOVE WITH 127.0.0.2, and with 127.0.0.1,127.0.0.2 just in case, but same results.
Results printed to terminal window:
ERROR 09:05:07,686 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db to [/<my machine ip>]
Streaming session ID: 06a9c1a0-62d6-11e5-b85d-597b365ae56f
progress: [/<my machine ip> 1/1 (100%)] [total: 100% - 0MB/s (avg: 0MB/s)]
So - 100% - yay! 0MB/s boo!
Now for the contents of the emp_new_9 folder, which I have not touched but which now has a second set of files:
sampledb_adl-emp_new_9-jb-1-CompressionInfo.db sampledb_adl-emp_new_9-jb-1-TOC.txt sampledb_adl-emp_new_9-jb-2-Statistics.db
sampledb_adl-emp_new_9-jb-1-Data.db sampledb_adl-emp_new_9-jb-2-CompressionInfo.db sampledb_adl-emp_new_9-jb-2-Summary.db
sampledb_adl-emp_new_9-jb-1-Filter.db sampledb_adl-emp_new_9-jb-2-Data.db sampledb_adl-emp_new_9-jb-2-TOC.txt
sampledb_adl-emp_new_9-jb-1-Index.db sampledb_adl-emp_new_9-jb-2-Filter.db
sampledb_adl-emp_new_9-jb-1-Statistics.db sampledb_adl-emp_new_9-jb-2-Index.db
Results of 'select * from emp_new_9;' have not changed, and running sstable2json on BOTH of the data files also shows only the one old yogi entry. When I run nodetool compact, it goes back down to one set of files with only the one yogi line. So what 100% happened?!? 100% of what?
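For what it is worth, this is how my automation spots the extra generation appearing and disappearing (sstable_generations is my own hypothetical helper; the jb-N pattern matches the file names above):

```python
import os
import re

def sstable_generations(table_dir):
    # Extract generation numbers (the N in ...-jb-N-Data.db) from a
    # table directory, matching the file naming shown above.
    gens = set()
    for name in os.listdir(table_dir):
        m = re.match(r".+-jb-(\d+)-Data\.db$", name)
        if m:
            gens.add(int(m.group(1)))
    return sorted(gens)
```

After the sstableloader run above this reports generations 1 and 2 for emp_new_9; after nodetool compact it drops back to a single generation.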
Any help is appreciated. I am very confused.