
I would like to know whether it is bad to have rowkeys like the following:

username-timestamp

These rows would be read by MapReduce jobs and written using the Java client API. A subset would also be selected using STARTROW and ENDROW.

On the one hand, this seems convenient for my use case, since I can scan a specific interval and the rows are mostly consecutive for the MR job. On the other hand, I have read that it is good to avoid long rowkeys and hotspotting.
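To make the key layout concrete, here is a minimal sketch of how such keys and the scan boundaries could be built. The class and method names are hypothetical helpers, not part of the HBase API, and the sketch assumes that timestamps are epoch milliseconds zero-padded to 13 digits and that usernames never contain "-". The padding matters because HBase orders rows lexicographically by their bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an HBase class): builds "username-timestamp" row keys.
final class UserTimeKeys {

    // Assumption: epoch milliseconds, zero-padded to 13 digits, so that
    // lexicographic order of the keys matches chronological order.
    static String rowKey(String username, long epochMillis) {
        return username + "-" + String.format("%013d", epochMillis);
    }

    // Row keys are stored as bytes; UTF-8 is an assumption of this sketch.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Start row (inclusive) of one user's time interval.
    static byte[] startRow(String username, long fromMillis) {
        return toBytes(rowKey(username, fromMillis));
    }

    // End row (exclusive) of one user's time interval.
    static byte[] endRow(String username, long toMillis) {
        return toBytes(rowKey(username, toMillis));
    }
}
```

The two byte arrays are what would be passed as the start and end rows of the scan, so all rows of one user within the interval are read as one contiguous range.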

Is there really a problem with this design and, if so, how can I overcome it?

I'm new to HBase so any help would be great.


1 Answer


The general advice is to avoid monotonically increasing row keys, because consecutive writes then all land on the same region and create a hotspot. For that purpose, some tools prepend a so-called "salt" to the row key, typically derived from a hash of the key, which spreads the rows across regions. A discussion can be found here: http://hbase.apache.org/0.94/book/rowkey.design.html. And here: https://phoenix.apache.org/salted.html. You can also look at Apache Trafodion http://trafodion.apache.org/, which uses row key salting to distribute SQL-like primary keys.
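As a rough illustration of the salting idea, the sketch below prepends a single salt byte computed from a hash of the original key. The class name, the bucket count of 8, and the use of `Arrays.hashCode` are assumptions made for this example. It is not how Phoenix or Trafodion implement salting internally.

```java
import java.util.Arrays;

// Hypothetical helper (not an HBase, Phoenix, or Trafodion class).
final class SaltedKeys {

    // Assumption: 8 salt buckets; in practice this would be chosen
    // in relation to the number of regions or region servers.
    static final int BUCKETS = 8;

    // Deterministic salt in the range [0, BUCKETS) derived from the key bytes.
    static byte saltFor(byte[] key) {
        return (byte) Math.floorMod(Arrays.hashCode(key), BUCKETS);
    }

    // Returns a new key: one salt byte followed by the original key bytes.
    static byte[] salted(byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = saltFor(key);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }
}
```

The trade-off is that rows which were adjacent are now scattered over the buckets, so a STARTROW/ENDROW range has to be scanned once per salt bucket and the results merged.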