I have roughly 20 to 30 columns in total that I would need to store in my column family. However, my data comes in different variations: I have different objects that belong together logically but do not have the same fields (fields as in key names). Sometimes 5 fields are provided, sometimes 7, and so on. All of them share a subset of fields that is always provided, though.
A row I insert into this column family will never have all of the columns filled. With a Map, I could add key/value pairs based on the object type and avoid the overhead that the all-columns model might introduce.
I am concerned about having a lot of empty columns in each row.
A possible downside of using a Map is that an index on the map keys and an index on the map values can't coexist.
Questions gathered:
- Would you suggest using a Map, or should I just add all of the columns I may need to my column family?
- I assume that querying the data based on keys/values in the Map is way slower than "directly" accessing them from the columns. Is this correct?
- What downsides are there when I have a lot of empty columns for each row? Overhead?
- Is it possible to have a "generic" value type when using a Map? I want to store different data, mostly Strings but also Floats and Integers. Do I need to use a `map<text,text>` and cast the values within my application?
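
To illustrate what I mean by casting within my application, here is a minimal Python sketch. It assumes every value is stored as text in a `map<text,text>` and that the application keeps its own per-key type table. The names `FIELD_TYPES`, `encode_fields` and `decode_fields` are hypothetical, and the field names are made up for illustration; nothing here talks to Cassandra.

```python
# Hypothetical per-key type table kept in the application; the field
# names are made up for illustration only.
FIELD_TYPES = {"name": str, "price": float, "quantity": int}


def encode_fields(fields):
    """Turn typed values into the text values a map<text,text> would hold."""
    return {key: str(value) for key, value in fields.items()}


def decode_fields(stored):
    """Cast text values back to their types; unknown keys stay as text."""
    return {key: FIELD_TYPES.get(key, str)(value) for key, value in stored.items()}


original = {"name": "widget", "price": 9.5, "quantity": 3}
stored = encode_fields(original)    # every value is now a string
restored = decode_fields(stored)    # values are cast back per FIELD_TYPES
```

With this approach the type information lives only in the application, so the database itself could not validate that a value stored under `price` is numeric.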
I am using Cassandra 3.0.8 | CQL spec 3.4.0 | Native protocol v4
Thanks