NOTE : Its a few hours ago that I have begun HBase and I come from an RDBMS background :P
I have a RDBMS-like table CUSTOMERS having the following columns:
- CUSTOMER_ID STRING
- CUSTOMER_NAME STRING
- CUSTOMER_EMAIL STRING
- CUSTOMER_ADDRESS STRING
- CUSTOMER_MOBILE STRING
I have thought of the following HBase equivalent :
table : CUSTOMERS rowkey : CUSTOMER_ID
column family : CUSTOMER_INFO
columns : NAME EMAIL ADDRESS MOBILE
From whatever I have read, a primary key in an RDBMS table is roughly similar to a HBase table's rowkey. Accordingly, I want to keep CUSTOMER_ID as the rowkey.
My questions are dumb and straightforward :
- Irrespective of whether I use a shell command or the HBaseAdmin java class, how do I define the rowkey? I didn't find anything to do it either in the shell or in the HBaseAdmin class(some thing like HBaseAdmin.createSuperKey(...))
- Given a HBase table, how to determine the rowkey details i.e which are the values used as rowkey?
- I understand that rowkey design is a critical thing. Suppose a customer id is receives values like CUST_12345, CUST_34434 and so on, how will HBase use the rowkey to decide in which region do particular rows reside(assuming that region concept is similar to DB horizontal partitioning)?
***Edited to add sample code snippet
I'm simply trying to create one row for the customer table using 'put' in the shell. I did this :
hbase(main):011:0> put 'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:NAME','Omkar Joshi'
0 row(s) in 0.1030 seconds
hbase(main):012:0> scan 'CUSTOMERS'
ROW COLUMN+CELL
CUSTID12345 column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0500 seconds
hbase(main):013:0> put 'CUSTOMERS', 'CUSTID614', 'CUSTOMER_INFO:NAME','Prachi Shah', 'CUSTOMER_INFO:EMAIL','[email protected]'
ERROR: wrong number of arguments (6 for 5)
Here is some help for this command:
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates. To put a cell value into table 't1' at
row 'r1' under column 'c1' marked with the time 'ts1', do:
hbase> put 't1', 'r1', 'c1', 'value', ts1
hbase(main):014:0> put 'CUSTOMERS', 'CUSTID12345', 'CUSTOMER_INFO:EMAIL','[email protected]'
0 row(s) in 0.0160 seconds
hbase(main):015:0>
hbase(main):016:0* scan 'CUSTOMERS'
ROW COLUMN+CELL
CUSTID12345 column=CUSTOMER_INFO:EMAIL, timestamp=1365600369284, [email protected]
CUSTID12345 column=CUSTOMER_INFO:NAME, timestamp=1365600052104, value=Omkar Joshi
1 row(s) in 0.0230 seconds
As put takes max. 5 arguments, I was not able to figure out how to insert the entire row in one put command. This is resulting in incremental versions of the same row which isn't required and I'm not sure if CUSTOMER_ID is being used as a rowkey ! Thanks and regards !