
I am fairly new to Cassandra - within the month, having come from a long SQL Server background. I have been tasked with stubbing out some Python to automate bulk loading of sstables. Enter sstableloader. Everything I have installed so far is for testing. I have one virtual machine set up with Cassandra installed as a single-node cluster. This required a bit of setup and a loopback IP address, so I have 127.0.0.1 and 127.0.0.2, with the seed set to 127.0.0.1. I successfully got Cassandra up and running, and I can access it via simple connection strings in Python from other boxes, so most of my requirements are met.

Where I am running into problems is loading data in via anything but CQL. I can use INSERT statements to get data in all day -- what I need to do successfully is run json2sstable and sstableloader (separately at this point). The kicker is that each tool reports back that everything is fine... and my data never shows up in either case. The following is my way to recreate the issue.
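(For anyone reproducing the environment: the second loopback address can be added with something roughly like the lines below, and cassandra.yaml's seed list simply points at the first address. The alias syntax is an assumption about my Linux VM, not anything Cassandra-specific.)

# rough sketch only - a loopback alias for the second address
ifconfig lo:0 127.0.0.2 netmask 255.0.0.0 up

# in cassandra.yaml the seed_provider entry points at the first address:
#   - seeds: "127.0.0.1"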

Keyspace, column family, and folder: sampledb_adl, emp_new_9, /var/lib/cassandra/data/sampledb_adl/emp_new_9

Table created at cqlsh prompt: CREATE TABLE emp_new_9 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
  bloom_filter_fp_chance=0.010000 AND 
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Initial data entered into table via cqlsh: INSERT INTO emp_new_9 (pkreq,empid,deptid,first_name,last_name) VALUES (uuid(),30001,235,'yogi','bear');

Results of 'select * from emp_new_9':
 pkreq                                | deptid | empid | first_name | last_name
--------------------------------------+--------+-------+------------+-----------
 9c6dd9de-f6b1-4312-9737-e9d00b8187f3 |    235 | 30001 |       yogi |      bear

Initiated nodetool flush

Contents of emp_new_9 folder at this point:

sampledb_adl-emp_new_9-jb-1-CompressionInfo.db  sampledb_adl-emp_new_9-jb-1-Index.db       sampledb_adl-emp_new_9-jb-1-TOC.txt
sampledb_adl-emp_new_9-jb-1-Data.db             sampledb_adl-emp_new_9-jb-1-Statistics.db
sampledb_adl-emp_new_9-jb-1-Filter.db           sampledb_adl-emp_new_9-jb-1-Summary.db

Current results of: [root@localhost emp_new_9]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db

[
{"key": "9c6dd9def6b143129737e9d00b8187f3","columns": [["","",1443108919841000], ["deptid","235",1443108919841000],     ["empid","30001",1443108919841000], ["first_name","yogi",1443108919841000], ["last_name","bear",1443108919841000]]}
]

Now to create emp_new_10 with different data:

Keyspace, column family, and folder: sampledb_adl, emp_new_10, /var/lib/cassandra/data/sampledb_adl/emp_new_10

Table created at cqlsh prompt: CREATE TABLE emp_new_10 (pkreq uuid, empid int, deptid int, first_name text, last_name text, PRIMARY KEY ((pkreq))) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

Initial data entered into table via cqlsh: INSERT INTO emp_new_10 (pkreq,empid,deptid,first_name,last_name) VALUES (uuid(),30101,298,'scoobie','doo');

Results of 'select * from emp_new_10':

 pkreq                                | deptid | empid | first_name | last_name
--------------------------------------+--------+-------+------------+-----------
 c0e1763d-8b2b-4593-9daf-af3596ed08be |    298 | 30101 |    scoobie |       doo

Initiated nodetool flush

Contents of emp_new_10 folder at this point:

sampledb_adl-emp_new_10-jb-1-CompressionInfo.db  sampledb_adl-emp_new_10-jb-1-Index.db       sampledb_adl-emp_new_10-jb-1-TOC.txt
sampledb_adl-emp_new_10-jb-1-Data.db             sampledb_adl-emp_new_10-jb-1-Statistics.db
sampledb_adl-emp_new_10-jb-1-Filter.db           sampledb_adl-emp_new_10-jb-1-Summary.db

Current results of: [root@localhost emp_new_10]# sstable2json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db

[
{"key": "c0e1763d8b2b45939dafaf3596ed08be","columns": [["","",1443109509458000], ["deptid","298",1443109509458000],     ["empid","30101",1443109509458000], ["first_name","scoobie",1443109509458000], ["last_name","doo",1443109509458000]]}
]

So: yogi is in emp_new_9, and scoobie is in emp_new_10.

Now I am going to try json2sstable first, using the JSON file exported from emp_new_10, which I named (original, I know) emp_new_10.json:

json2sstable -K sampledb_adl -c emp_new_9 /home/tdmcoe_admin/Desktop/emp_new_10.json /var/lib/cassandra/data/sampledb_adl/emp_new_10/sampledb_adl-emp_new_10-jb-1-Data.db 

Results printed to terminal window:

ERROR 08:56:48,581 Unable to initialize MemoryMeter (jamm not specified as javaagent).  This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Importing 1 keys...
1 keys imported successfully.

I get the MemoryMeter error all the time and ignore it, since googling suggested it doesn't affect results.

So: my folder contents have not changed, and 'select * from emp_new_9;' still returns the same single original record. emp_new_10 has not changed either. What the heck happened to my '1 keys imported successfully'? Successfully where?

Now for the related sstableloader problem. Same base folders and data, but this time running sstableloader:

[root@localhost emp_new_10]# sstableloader -d 127.0.0.1 /var/lib/cassandra/data/sampledb_adl/emp_new_9

NOTE: I also ran the line above with 127.0.0.2, and with 127.0.0.1,127.0.0.2, just in case, but got the same results.

Results printed to terminal window:

ERROR 09:05:07,686 Unable to initialize MemoryMeter (jamm not specified as javaagent).  This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /var/lib/cassandra/data/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-1-Data.db to [/<my machine ip>]
Streaming session ID: 06a9c1a0-62d6-11e5-b85d-597b365ae56f
progress: [/<my machine ip> 1/1 (100%)] [total: 100% - 0MB/s (avg: 0MB/s)]

So - 100% - yay! 0MB/s boo!

Now for the contents of the emp_new_9 folder, which I have not touched but which now has a second set of files:

sampledb_adl-emp_new_9-jb-1-CompressionInfo.db  sampledb_adl-emp_new_9-jb-1-TOC.txt             sampledb_adl-emp_new_9-jb-2-Statistics.db
sampledb_adl-emp_new_9-jb-1-Data.db             sampledb_adl-emp_new_9-jb-2-CompressionInfo.db  sampledb_adl-emp_new_9-jb-2-Summary.db
sampledb_adl-emp_new_9-jb-1-Filter.db           sampledb_adl-emp_new_9-jb-2-Data.db             sampledb_adl-emp_new_9-jb-2-TOC.txt
sampledb_adl-emp_new_9-jb-1-Index.db            sampledb_adl-emp_new_9-jb-2-Filter.db
sampledb_adl-emp_new_9-jb-1-Statistics.db       sampledb_adl-emp_new_9-jb-2-Index.db

Results of 'select * from emp_new_9;' have not changed, and running sstable2json on BOTH data files also shows just the one old yogi entry. When I run nodetool compact, it goes back down to one set of files with only the one yogi line. So what 100% happened?!? 100% of what?

Any help is appreciated. I am very confused.


2 Answers


When using json2sstable, you should specify the name of a new, non-existent .db file. SSTables are immutable by design, so json2sstable will not update an existing one.

For whatever reason, the tool doesn't complain when you point it at an existing SSTable. If you specify a new .db file, you will find that the SSTable files get created with the contents you expect.
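For example, something along these lines should work (an untested sketch - the scratch directory and the generation number 100 are just placeholders): write the import into a fresh directory that mirrors the keyspace/table layout, then stream it in.

# scratch directory laid out as <keyspace>/<table>
mkdir -p /tmp/load/sampledb_adl/emp_new_9

# the target -Data.db file must not exist yet
json2sstable -K sampledb_adl -c emp_new_9 /home/tdmcoe_admin/Desktop/emp_new_10.json /tmp/load/sampledb_adl/emp_new_9/sampledb_adl-emp_new_9-jb-100-Data.db

# then stream the newly written sstable into the running node
sstableloader -d 127.0.0.1 /tmp/load/sampledb_adl/emp_new_9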


I figured this out: I was using a table with a uuid field, and the data I was trying to bulk load already had a uuid in that field, so it was failing. When I tested with text columns instead, everything worked fine!