We have JSON data stored under a single column family, and the data contains several name/value pairs. We query this data with different name/value combinations, and the queries do not favor any particular name/value pairs (which makes it difficult to split the data into column families).
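To make the access pattern concrete, here is a simplified in-memory sketch in Python rather than actual HBase code. The row keys, field names, and the `scan_matching` helper are all made up for illustration. It only shows the shape of the problem: any combination of name/value pairs can appear in a query, so every row's JSON has to be read and parsed.

```python
import json

# Hypothetical rows: row key -> JSON document stored as one value
# in a single column family. Field names are invented for illustration.
rows = {
    "row1": json.dumps({"color": "red", "size": "L", "region": "eu"}),
    "row2": json.dumps({"color": "blue", "size": "L", "region": "us"}),
    "row3": json.dumps({"color": "red", "size": "M", "region": "us"}),
}


def scan_matching(rows, criteria):
    """Return the sorted row keys whose JSON contains every name/value pair in criteria."""
    matches = []
    # Without an index, every row must be read and parsed (a full scan).
    for key, raw in rows.items():
        doc = json.loads(raw)
        if all(doc.get(name) == value for name, value in criteria.items()):
            matches.append(key)
    return sorted(matches)
```

A query such as `scan_matching(rows, {"size": "L", "region": "us"})` is as likely as one on `color` alone, which is why no obvious grouping of fields suggests itself.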
- What would be the best way to improve the performance of these queries? Would something like secondary indexes, Impala, or Phoenix help?
- Would it help to divide the data into multiple column families? Considering that HBase works best with 2 or 3 column families, I am not sure this is the right thing to do.
- What would be a good system for storing nested or JSON data with good query performance? Would something like Apache Drill help?