HBase scans are slow

Question

Problem

I am trying to build a secondary index with Phoenix. Index creation takes several hours. It seems to be due to slow HBase scans, as I noticed the following performance :

I might need 2 hours to scan the table, whereas other developers reported a few minutes for larger tables (100 millions rows).
HBase shell is able to count rows at an approx. rate of 10.000 per second, which means 3800s (>1 hour!) to count all rows of this table.

Both with HBase shell and a Java scanner.

NB : The GET(by rowkey) operation is achieved with good performances (approx 0.5s).

Context

38 millions rows / 1000 columns / single column family / 96Go with GZ compression.
Cluster has 6 nodes (126Go RAM, 24 cores) with 5 region servers.
Hortonworks Data Platform 2.2.0

Troubleshooting

Based on the HBase book (http://hbase.apache.org/book.html#performance), here is what I already checked :

1) Hardware

IO(disk)
- NMon says disk are never busy more than 80%, and most frequently between 0 and 20%
- Top says HBase JVM's are not swapping (checked 2 of 5 RS)
IO(network) : each node active interface stand on the same switch (all second passive interface are plugged on a different switch)

2) JVM

GC pauses OK (few milliseconds pause every minute or so)
Heap looks OK (not peaking too long near the limit)
CPU is suprisingly LOW : never more than 10%
Threads :
- Active threads (10 "RpServe.reader=N" + a few other) show no contention
- Lot of parked thread doing nothing (60 "DefaultRpcServer.handler=n", approx 15 other)
- Huge list of IPC Client without any thread status

3) Data

was bulk loaded using Hive + completebulkload.
Number of region :
- 13 regions meaning we have 2 to 3 large regions for each RS, which is what is expected.
- Scan performance remains unchanged after forcing a major compaction.
- Region size is rather homogeneous : 4,5Go (+/-0.5) for 11 regions, 2,5Go for 2 regions

4) HBase configuration

Most configuration remained unchanged.
- HBase env only indicates ports for JMX console
- HBase-site has few settings for Phoenix
Some of the params that looked OK to me
- hbase.hregion.memstore.block.multiplier
- hbase.hregion.memstore.flush.size : 134217728 bytes (134Go)
- Xmn ratio of Xmx : .2 Xmn max value : 512 Mb Xms : 6144m
- hbase.regionserver.global.memstore.lowerLimit : 0.38
- hbase.hstore.compactionTreshold : 3
- hfile.block.cache.size : 0.4 (Block cache size AS % of heap)
- Maximum HStoreFile (hbase.hregion.max.filesize) : 10 go (10737418240)
- Client scanner cache : 100 rows zookeeper timeout : 30s
- Client max keyvalue size : 10mo
- hbase.regionserver.global.memstore.lowerLimit : 0.38
- hbase.regionserver.global.memstore.upperLimit : 0.40
- hstore blocking storefiles : 10
- hbase.hregion.memstore.mslab.enabled :
- enabled hbase.hregion.majorcompaction.jitter : 0.5
Tried following configuration changes without any impact on performance
- hbase-env.sh : tried to increase HBASE_HEAPSIZE=6144 (since it default at 1000)
- hbase-site.xml :
  - hbase.ipc.server.callqueue.read.ratio : 0.9
  - hbase.ipc.server.callqueue.scan.ratio : 0.9

5) Log say nothing usefull

cat hbase-hbase-master-cox.log | grep "2015-05-11.*ERROR"

cat hbase-hbase-regionserver-*.log | grep "2015-05-11.*ERROR"

print nothing

Printing WARNs shows non related errors

2015-05-11 17:11:10,544 WARN [B.DefaultRpcServer.handler=8,queue=2,port=60020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x2aca5fca): could not load 1074749724_BP-2077371184-184.10.17.65-1423758745093 due to InvalidToken exception.

2015-05-11 17:09:12,848 WARN [regionserver60020-smallCompactions-1430754386533] hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"

Martin Martin · Accepted Answer · 2015-06-19T14:09:49

Got it : the key is to separate "hot" content from "cold" content into separate column families. Column families are used to store columns in separate HFiles, so we can use one column family for indexed (or frequently read) columns, and one other column family (thus file) for all other columns.

First step : see that smaller column family is faster to scan

We simply discard cold content to build a single smaller column family (1655 columns -> 7 columns).

Performances on medium size table scans :

[37.876.602 rows, 1655 columns] scan 1000 rows took 39.4750
[76.611.463 rows, 7 columns] scan 1000 rows took 1.8620

Remarks :

total number of rows can be ignored as we scan the first 1000 rows
there is an overhead with large rows as scanning from Hbase shell prints content in console

Second step : generate multi-family HTable

We do a bulk load by generating HFiles from Hive. Although the doc says we can't generate one multi family table, one can generate HFiles separately :

create table mytable_f1 (UUID string, source_col1, source_col2)
...
TBLPROPERTIES('hfile.family.path' = 'tmp/mytable/**f1**');

create table mytable_f1 (UUID string, source_col3, source_col4)
...
TBLPROPERTIES('hfile.family.path' = 'tmp/mytable/f2');

And then simply call import command as usual :

hadoop jar [hbase-server-jar] completebulkload /tmp/mytable mytable

HBase scans are slow

2 Answers