HBase rowkey which includes timestamp

Question

I would like to know whether it is bad to have row keys like the following:

username-timestamp

These rows will be read by MapReduce jobs and written using the Java client API. A subset will also be selected using STARTROW and ENDROW.
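For the interval scans described above to work, the byte order of the keys has to match time order within each user. HBase sorts row keys as unsigned bytes, so a plain decimal timestamp of varying width would misorder keys ("bob-1000" sorts before "bob-999"). The sketch below assumes a hypothetical textual layout `<username>-<epoch millis zero-padded to 13 digits>` and computes the start row (inclusive) and stop row (exclusive) for one user's time interval. The helper names `rowKey`, `scanBounds` and `inRange` are illustrative only. In the Java client the two byte arrays would be passed to the `Scan` start/stop row setters (`withStartRow`/`withStopRow` in recent HBase versions); that call is left out so the sketch runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UserTimeKey {
    // Assumed layout: "<username>-<epoch millis, zero-padded to 13 digits>".
    // Zero-padding makes byte order match numeric order for 0 <= ts < 10^13.
    static String rowKey(String username, long epochMillis) {
        if (epochMillis < 0 || epochMillis >= 10_000_000_000_000L) {
            throw new IllegalArgumentException("timestamp out of 13-digit range: " + epochMillis);
        }
        return String.format("%s-%013d", username, epochMillis);
    }

    // Start row (inclusive) and stop row (exclusive) for one user's interval [from, to).
    static byte[][] scanBounds(String username, long fromMillis, long toMillis) {
        return new byte[][] {
            rowKey(username, fromMillis).getBytes(StandardCharsets.UTF_8),
            rowKey(username, toMillis).getBytes(StandardCharsets.UTF_8)
        };
    }

    // True if key lies in [start, stop) under unsigned lexicographic byte order,
    // which is the order HBase uses for row keys.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
            && Arrays.compareUnsigned(key, stop) < 0;
    }

    public static void main(String[] args) {
        byte[][] b = scanBounds("alice", 1448000000000L, 1448086400000L);
        byte[] inside = rowKey("alice", 1448040000000L).getBytes(StandardCharsets.UTF_8);
        byte[] otherUser = rowKey("alicia", 1448040000000L).getBytes(StandardCharsets.UTF_8);
        System.out.println(inRange(inside, b[0], b[1]));    // true
        System.out.println(inRange(otherUser, b[0], b[1])); // false
    }
}
```

Because every key for one user shares the same `<username>-` prefix, the rows for a given time interval are contiguous, which is what makes the STARTROW/ENDROW selection cheap.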

On one hand, this seems convenient for my use case, since I can scan a specific time interval and the rows are mostly contiguous for the MapReduce job. On the other hand, I have read that it is good to avoid long row keys and hotspotting.

Is there really a problem with this design, and if so, how can I overcome it?

I'm new to HBase so any help would be great.

Hellmar Becker · Accepted Answer · 2015-11-24T07:59:36

The general advice is to avoid monotonically increasing row keys, because consecutive writes then all land in the same region and hotspot a single region server. To that end, some software tools add a so-called "salt" prefix to the row key, which spreads the keys across regions. A discussion can be found here: http://hbase.apache.org/0.94/book/rowkey.design.html. And here: https://phoenix.apache.org/salted.html. You can also look at Apache Trafodion http://trafodion.apache.org/, which uses row key salting to distribute SQL-like primary keys.
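To make the salting idea concrete, here is a minimal sketch, similar in spirit to the approach described on the linked pages but not the exact scheme any of those tools uses. It assumes a hypothetical bucket count of 16 and a two-digit bucket prefix followed by `|`. The salt is derived deterministically from the unsalted key, so it can be recomputed for point lookups. The catch is that a range scan over unsalted keys turns into one range per bucket. The names `SaltedKey`, `BUCKETS`, `bucketOf`, `saltedKey` and `scanRanges` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class SaltedKey {
    // Hypothetical number of salt buckets (often chosen near the number of regions/servers).
    static final int BUCKETS = 16;

    // Deterministic salt: String.hashCode() is specified by the JDK, so it is stable across JVMs.
    static int bucketOf(String unsaltedKey) {
        return Math.floorMod(unsaltedKey.hashCode(), BUCKETS);
    }

    // Prefix the key with a fixed-width bucket number, e.g. "07|alice-1448040000000".
    static String saltedKey(String unsaltedKey) {
        return String.format("%02d|%s", bucketOf(unsaltedKey), unsaltedKey);
    }

    // A range [start, stop) over unsalted keys becomes one [start, stop) range per bucket,
    // because rows from that interval may have been written to any bucket.
    static List<String[]> scanRanges(String unsaltedStart, String unsaltedStop) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            String prefix = String.format("%02d|", b);
            ranges.add(new String[] {prefix + unsaltedStart, prefix + unsaltedStop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        String key = "alice-1448040000000";
        System.out.println(saltedKey(key));
        for (String[] r : scanRanges("alice-1448000000000", "alice-1448086400000")) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each pair returned by `scanRanges` would be issued as its own scan (or its own MapReduce input range), and the results merged if a global time order is needed. Hashing the full `username-timestamp` key, as here, spreads even a single busy user across all buckets. Hashing only the username would keep each user's rows in one bucket and allow a single scan per user, at the cost of a very active user still hotspotting one bucket.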
