
I would like some advice about HBase schema design. For example, suppose there are 2000 patients:

  1. Each patient has a name, sex, age and hospital_ID.
  2. Each patient's activity data, such as heart rate, location and steps, is recorded every minute.
  3. Each patient will complete several questionnaires.

How should I organise the HBase table?

Thank you very much for your help.

My current idea is to use patient_ID as the row key, so each patient has exactly one row in the HBase table. All of a patient's activity data would be grouped into a nested table inside that row, and the activity data will grow to millions of rows. The table would therefore have three column families: CF1:info, CF2:activity_data and CF3:questionnaires.

CF1:info includes (name, sex, age, ID)

CF2:activity_data includes (data (a nested table))

CF3:questionnaires includes (questionnaire_ID (a nested table))
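To make the proposal concrete, it may help to model HBase's logical layout (row key → column family → column qualifier → value) in plain Python. HBase has no nested tables, so in a one-row-per-patient design each minute's readings would have to become extra columns in the same row, and HBase never splits a single row across regions. The patient values and qualifier names below (`"P0001"`, `"Alice"`, `"H01"`, `"<minute>:heart_rate"`) are hypothetical, chosen only for illustration; this is a sketch of the data model, not HBase client code.

```python
from typing import Dict

# family -> qualifier -> value; HBase stores cell values as raw bytes.
Row = Dict[str, Dict[str, bytes]]


def patient_row(name: str, sex: str, age: int, hospital_id: str) -> Row:
    """One row per patient, keyed by patient_ID, with the three proposed families."""
    return {
        "info": {
            "name": name.encode(),
            "sex": sex.encode(),
            "age": str(age).encode(),
            "hospital_ID": hospital_id.encode(),
        },
        "activity_data": {},   # no nested table: one column per reading instead
        "questionnaires": {},  # e.g. one column per questionnaire_ID
    }


def add_reading(row: Row, minute: str, heart_rate: int, steps: int) -> None:
    # Every minute adds new columns to the same row, so the row grows without bound.
    # Qualifier format "<minute>:<metric>" is a hypothetical choice.
    row["activity_data"][f"{minute}:heart_rate"] = str(heart_rate).encode()
    row["activity_data"][f"{minute}:steps"] = str(steps).encode()


table: Dict[str, Row] = {"P0001": patient_row("Alice", "F", 42, "H01")}
for m in range(3):
    add_reading(table["P0001"], f"20240101-00{m:02d}", 70 + m, 5 * m)

columns_now = len(table["P0001"]["activity_data"])  # 3 minutes x 2 metrics = 6
minutes_per_year = 60 * 24 * 365  # 525600 minutes, so over a million columns per patient-year
```

With only two metrics, one year of minute-level readings already puts more than a million columns into a single patient's row, while the info family of that same row holds four.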

I don't know whether this is a smart way to design the HBase schema. Please provide me with some advice.

Thank you very much


1 Answer

  1. When you design a data model, it is very important to understand how the data will be used, especially which queries you want to run efficiently (without a full table scan) over the data stored in HBase.
  2. activity_data seems to be raw data, while the other two parts relate to the "Patient profile". The HBase reference guide recommends keeping the column families in one table roughly the same size: flushes and compactions happen per region, so a very large family drags the small ones along with it, and the small family ends up spread thinly across many regions, which makes scanning it inefficient. So it is probably better to keep activity_data in a separate table, aggregate it into, say, a daily summary, and store that result in the "Patient profile" table.
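A minimal sketch of that split, assuming a separate activity table whose row key is a zero-padded patient_ID followed by a `yyyyMMddHHmm` timestamp. The key format, the `#` separator and the idea of a `daily_summary` family in the profile table are hypothetical choices, not part of the answer. A sorted Python list stands in for HBase's byte-sorted row keys so the example runs without a cluster; against a real table, the same start and stop keys would bound an HBase Scan. Because rows are sorted by key, one patient's readings for one day form a contiguous key range that can be read without a full table scan.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Rows = List[Tuple[str, Dict[str, int]]]


def activity_key(patient_id: int, ts: datetime) -> str:
    # Hypothetical key format. Zero-padding keeps string order equal to numeric
    # order, since HBase compares row keys as raw bytes.
    return f"{patient_id:06d}#{ts:%Y%m%d%H%M}"


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.rows: Dict[str, Dict[str, int]] = {}

    def put(self, key: str, values: Dict[str, int]) -> None:
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = values

    def scan(self, start: str, stop: str) -> Rows:
        # Like a default HBase Scan: start row inclusive, stop row exclusive.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]


def daily_summary(rows: Rows) -> Dict[str, Dict[str, float]]:
    """Aggregate raw minute readings per day, e.g. for a profile-table family."""
    totals = defaultdict(lambda: {"steps": 0, "readings": 0, "hr_sum": 0})
    for key, vals in rows:
        date = key.split("#")[1][:8]  # yyyyMMdd part of the row key
        t = totals[date]
        t["steps"] += vals["steps"]
        t["readings"] += 1
        t["hr_sum"] += vals["heart_rate"]
    return {d: {"steps": t["steps"], "avg_heart_rate": t["hr_sum"] / t["readings"]}
            for d, t in totals.items()}


activity = SortedTable()
t0 = datetime(2024, 1, 1, 23, 58)
for pid in (1, 2):
    for m in range(4):  # four minutes of made-up readings spanning midnight
        activity.put(activity_key(pid, t0 + timedelta(minutes=m)),
                     {"heart_rate": 60 + m, "steps": 10 * m})

# Patient 1's readings on 2024-01-01 only; patient 2's rows are never touched.
day = activity.scan(activity_key(1, datetime(2024, 1, 1)),
                    activity_key(1, datetime(2024, 1, 2)))
summary = daily_summary(day)
```

Here `day` holds the two readings from 23:58 and 23:59, and `summary` would be the small per-day record written into the patient profile row, keeping the profile table's families comparable in size.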

I hope this helps.