I am trying to import a few large .csv files into HBase (more than 1 TB in total). The data looks like a dump from a relational DB, but it has no UID. I also do not want to import all columns. I have decided to run a custom MapReduce job first to get the files into the required format (select columns and generate a UID), so that I can then load them with the standard HBase importtsv bulk import.
My question: can I just create my own composite row key, say storeID:year:UID, in the MapReduce job and then feed the result to the TSV import? Say my data looks like this:
row_key  | price | quantity | item_id
A:2012:1 | 0.99  | 1        | 001
A:2012:2 | 0.99  | 2        | 012
B:2013:1 | 0.99  | 1        | 004
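To make the idea concrete, here is a rough sketch of the per-record transformation I have in mind for the map step (written in Python, e.g. for use with Hadoop Streaming). The input column order, the function name `to_tsv_row`, and the way the UID is passed in are all assumptions for illustration, not my real schema:

```python
import csv
import io

# Hypothetical input layout (an assumption, not the real dump):
# store_id, year, price, quantity, item_id, <further columns to drop>
KEY_SEPARATOR = ":"


def to_tsv_row(csv_line, uid):
    """Turn one CSV record into a TSV line whose first field is the composite row key."""
    # Parse with the csv module so quoted fields containing commas survive.
    fields = next(csv.reader(io.StringIO(csv_line)))
    # Keep only the columns of interest; everything after the fifth is dropped.
    store_id, year, price, quantity, item_id = fields[:5]
    # Composite key in the form storeID:year:UID.
    row_key = KEY_SEPARATOR.join([store_id, year, str(uid)])
    return "\t".join([row_key, price, quantity, item_id])


if __name__ == "__main__":
    # The UID is supplied by the caller here; how to generate it is left open.
    print(to_tsv_row("A,2012,0.99,1,001,ignored", 1))
```

The resulting lines would then go to importtsv with the first field mapped to `HBASE_ROW_KEY` in `importtsv.columns`, and the remaining fields mapped to columns in whatever column family I end up choosing.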
From what I understand, HBase stores everything as bytes, except for timestamps. Is it going to understand that this is a composite key?
Any hints are appreciated!