I would like to have a recommendation on designing the Hbase table/RowKey for efficient search
Here, is a sample data set:
| Column Families | Column Qualifiers | Row 1 | Row 2 | Row 3 |
---------------------------------------------------------------------------------------------------------
| Country | Code | US | | UK |
|
| | Full Name |United States of America| |United Kingdom |
| | Capital |Washington, DC | |London |
| | Leader | President | President |Prime minister |
|State | | Texas | | |
|District | |Houston |LA | |
|County | |Harris |Harris | Cambridge |
|City | |Houston city |Duke City | |
|Road | |Bellare | |Downing Street |
|Family | |Doe |Wade | |
|Person | Name |John Doe |Doo |Smith |
| | Location |35.00 N, 99.00 W |31.00 N , 100.00 W | |
| | Gender | |Female |Male |
| | Religion |Atheist |Maya |Christian |
note: In the above sample data set the column qualifiers are detailed only for the first and the last column family to avoid data cluttering.
Data overview: Its a hierarchical data set but the information at any level can be missing Example, Row 2 doesn't have the Country information Row 3, UK doesn't have the concept of State, so the State hierarchy will be missing for UK records
The requirement is to search for following scenarios:
|Search Criteria |Returned Records |
-----------------------------------------------------------------
|All records |3 count, Row1, Row2 & Row3 |
|Country = USA |1 Record, Row 1 |
|Gender = Male |1 Record, Row 3 |
|County = Harris |2 records, Row 1 & Row 2 |
|LATITUDE > 30 and LONGITUDE < 101 |2 records, Row 1 & Row 2 |
|All Atheist for USA |1 record, Row1 |
Proposed design solution:
1. Create eight column families for each level since there can be added information which needs to be searched at each level (example, the leaders name, position, timezone, area for level city, county, district, country etc.)
|Level one |Country |
|Level two |State |
|Level three |District |
|Level four |County |
|Level Five |City |
|Level Six |Road |
|Level Seven |Family |
|Level Eight |Person |
- Design the Rowkey as a composite key, a combination of all level codes since the search can be at any level Level1Code:Level2Code:Level3Code:Level4Code:Level5Code:Level6Code:Level7Code:Level8Code: example, US:101:102:103:104:105:106:107
Other alternative I am thinking is to create a secondary index but will have to explore it further on performance end since I plan to use Hbase as a backend for my web application.
Thanks in advance for sharing your expertise!