
I would like to store an object (payload) along with some metadata in HBase.

Then I would like to run queries on the table and pull out the payload part based on metadata info.

For example, let's say I have the following column qualifiers:

  • P: Payload (larger than M1 + M2).
  • M1: Meta-Data1
  • M2: Meta-Data2

Then I would run a query such as:

  • Fetch all Payload where M1='search-key1' && M2='search-key2'
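To make the query concrete, here is a minimal in-memory sketch in plain Java. It deliberately does not use the HBase client API: the nested maps are only a toy stand-in for a table (row key, then column family, then qualifier), and the class name `PayloadQuerySketch`, the method `fetchPayloads`, and the family names `m`, `p` and `d` are hypothetical names chosen for illustration. It shows that both layouts return the same payloads for the same query, so the choice between them is about storage and I/O rather than about what the query returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PayloadQuerySketch {

    // Toy table model (an assumption, not HBase itself):
    // rowKey -> column family -> qualifier -> value.
    // Returns P for every row where M1 == m1 and M2 == m2.
    public static List<String> fetchPayloads(
            TreeMap<String, Map<String, Map<String, String>>> table,
            String metaFamily, String payloadFamily,
            String m1, String m2) {
        List<String> payloads = new ArrayList<>();
        // TreeMap iterates in row-key order, like a scan over sorted rows.
        for (Map<String, Map<String, String>> row : table.values()) {
            Map<String, String> meta = row.getOrDefault(metaFamily, Map.of());
            if (m1.equals(meta.get("M1")) && m2.equals(meta.get("M2"))) {
                String payload = row.getOrDefault(payloadFamily, Map.of()).get("P");
                if (payload != null) {
                    payloads.add(payload);
                }
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        // Option 1: metadata in family "m", payload in family "p".
        TreeMap<String, Map<String, Map<String, String>>> twoFamilies = new TreeMap<>();
        twoFamilies.put("row1", Map.of(
                "m", Map.of("M1", "search-key1", "M2", "search-key2"),
                "p", Map.of("P", "payload-1")));
        twoFamilies.put("row2", Map.of(
                "m", Map.of("M1", "search-key1", "M2", "other"),
                "p", Map.of("P", "payload-2")));

        // Option 2: all three qualifiers in a single family "d".
        TreeMap<String, Map<String, Map<String, String>>> oneFamily = new TreeMap<>();
        oneFamily.put("row1", Map.of(
                "d", Map.of("M1", "search-key1", "M2", "search-key2", "P", "payload-1")));
        oneFamily.put("row2", Map.of(
                "d", Map.of("M1", "search-key1", "M2", "other", "P", "payload-2")));

        // Both calls print [payload-1].
        System.out.println(fetchPayloads(twoFamilies, "m", "p", "search-key1", "search-key2"));
        System.out.println(fetchPayloads(oneFamily, "d", "d", "search-key1", "search-key2"));
    }
}
```

Note that the row key plays no part in the match here, so every row has to be visited. In this toy model, that is true of both layouts.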

Does it make sense to:

  1. Keep M1 and M2 in one column family and P in another column family? Would the scan be quicker?
  2. Keep all 3 columns in the same column family?

Normally I would do a spike (and I may still need to), but I thought I'd ask first.

What is your row key?
@hatefAlipoor Not sure how the row key plays into this, but P, M1 and M2 share the same row key.

1 Answer


I'd follow the advice given in the HBase Reference Guide and go with option #2 (keep all 3 columns in the same column family):

"Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time."