Problem
I am trying to build a secondary index with Phoenix, and index creation takes several hours. The cause seems to be slow HBase scans. Here is what I observe:
- A full scan of the table can take 2 hours, whereas other developers report a few minutes for larger tables (100 million rows).
- The HBase shell counts rows at approximately 10,000 per second, which means 3,800 s (more than 1 hour!) to count all the rows of this table.
The results are the same with the HBase shell and with a Java scanner.
NB: a GET by rowkey performs well (approx. 0.5 s).
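The count-rate figure can be checked with a short back-of-the-envelope script. This is only a sketch: the function names are mine, the 5-minute target is a hypothetical goal, and it assumes the rate stays constant over the whole table.

```python
def scan_seconds(total_rows, rows_per_second):
    """Time needed to read every row at a constant rate."""
    return total_rows / rows_per_second


def required_rate(total_rows, target_seconds):
    """Rows per second needed to finish within target_seconds."""
    return total_rows / target_seconds


TOTAL_ROWS = 38_000_000   # table size (see Context)
OBSERVED_RATE = 10_000    # rows/s measured with the HBase shell count

if __name__ == "__main__":
    elapsed = scan_seconds(TOTAL_ROWS, OBSERVED_RATE)
    print(f"full count: {elapsed:.0f} s ({elapsed / 3600:.2f} h)")
    # Hypothetical target: a full scan in 5 minutes.
    print(f"rate for a 5 min scan: {required_rate(TOTAL_ROWS, 300):,.0f} rows/s")
```

At the observed rate the script gives 3800 s, which matches the figure quoted above.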
Context
- 38 million rows / 1000 columns / single column family / 96 GB with GZ compression.
- The cluster has 6 nodes (126 GB RAM, 24 cores) with 5 region servers.
- Hortonworks Data Platform 2.2.0
Troubleshooting
Based on the HBase book (http://hbase.apache.org/book.html#performance), here is what I have already checked:
1) Hardware
- IO (disk)
  - NMon says the disks are never more than 80% busy, and most often between 0 and 20%
  - Top says the HBase JVMs are not swapping (checked 2 of 5 RS)
- IO (network): the active interface of each node is on the same switch (all the second, passive interfaces are plugged into a different switch)
2) JVM
- GC pauses OK (few milliseconds pause every minute or so)
- Heap looks OK (not peaking too long near the limit)
- CPU is surprisingly LOW: never more than 10%
- Threads:
  - Active threads (10 "RpServe.reader=N" + a few others) show no contention
  - Many parked threads doing nothing (60 "DefaultRpcServer.handler=n", approx. 15 others)
  - A huge list of IPC Client threads without any thread status
3) Data
- The data was bulk loaded using Hive + completebulkload.
- Number of regions:
  - 13 regions, meaning we have 2 to 3 large regions for each RS, which is what is expected.
- Scan performance remains unchanged after forcing a major compaction.
- Region size is rather homogeneous: 4.5 GB (+/- 0.5) for 11 regions, 2.5 GB for 2 regions
4) HBase configuration
Most of the configuration was left unchanged.
- hbase-env only sets the ports for the JMX console
- hbase-site has a few settings for Phoenix
Some of the params that looked OK to me:
- hbase.hregion.memstore.block.multiplier
- hbase.hregion.memstore.flush.size : 134217728 bytes (128 MB)
- Xmn ratio of Xmx : 0.2
- Xmn max value : 512 MB
- Xms : 6144m
- hbase.hstore.compactionThreshold : 3
- hfile.block.cache.size : 0.4 (block cache size as a fraction of the heap)
- Maximum HStoreFile (hbase.hregion.max.filesize) : 10 GB (10737418240)
- Client scanner cache : 100 rows
- ZooKeeper timeout : 30s
- Client max keyvalue size : 10 MB
- hbase.regionserver.global.memstore.lowerLimit : 0.38
- hbase.regionserver.global.memstore.upperLimit : 0.40
- hstore blocking storefiles : 10
- hbase.hregion.memstore.mslab.enabled : enabled
- hbase.hregion.majorcompaction.jitter : 0.5
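For reference, the sizes implied by these settings can be worked out with a short script. This is only a sketch under stated assumptions: it takes the region server heap to be the 6144 MB given for Xms, assumes every row carries all 1000 columns, and ignores any size-based limit on what a single scanner fetch returns. The function and constant names are mine.

```python
MIB = 1024 * 1024

# Values copied from the list above.
HEAP_MIB = 6144                   # assumption: heap equals the Xms value
BLOCK_CACHE_FRACTION = 0.4        # hfile.block.cache.size
MEMSTORE_UPPER_FRACTION = 0.40    # hbase.regionserver.global.memstore.upperLimit
FLUSH_SIZE_BYTES = 134217728      # hbase.hregion.memstore.flush.size
SCANNER_CACHING_ROWS = 100        # client scanner cache
TOTAL_ROWS = 38_000_000
COLUMNS_PER_ROW = 1000            # assumption: every row has all columns


def heap_share_mib(heap_mib, fraction):
    """Portion of the heap, in MiB, reserved by a fractional setting."""
    return heap_mib * fraction


def scan_round_trips(total_rows, caching_rows):
    """Scanner fetches needed if each fetch returns caching_rows rows."""
    return -(-total_rows // caching_rows)  # ceiling division


if __name__ == "__main__":
    print(f"flush size: {FLUSH_SIZE_BYTES / MIB:.0f} MiB")
    print(f"block cache: {heap_share_mib(HEAP_MIB, BLOCK_CACHE_FRACTION):.1f} MiB")
    print(f"memstore upper limit: {heap_share_mib(HEAP_MIB, MEMSTORE_UPPER_FRACTION):.1f} MiB")
    print(f"fetches for a full scan: {scan_round_trips(TOTAL_ROWS, SCANNER_CACHING_ROWS):,}")
    print(f"cells per fetch: {SCANNER_CACHING_ROWS * COLUMNS_PER_ROW:,}")
```

With 100 rows per fetch, a full scan of 38 million rows needs 380,000 fetches, each carrying up to 100,000 cells under the assumptions above.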
I tried the following configuration changes, without any impact on performance:
- hbase-env.sh : increased HBASE_HEAPSIZE to 6144 (the default is 1000)
- hbase-site.xml :
  - hbase.ipc.server.callqueue.read.ratio : 0.9
  - hbase.ipc.server.callqueue.scan.ratio : 0.9
5) Logs say nothing useful
cat hbase-hbase-master-cox.log | grep "2015-05-11.*ERROR"
cat hbase-hbase-regionserver-*.log | grep "2015-05-11.*ERROR"
Both commands print nothing.
Printing the WARN lines shows only unrelated messages:
2015-05-11 17:11:10,544 WARN [B.DefaultRpcServer.handler=8,queue=2,port=60020] shortcircuit.ShortCircuitCache: ShortCircuitCache(0x2aca5fca): could not load 1074749724_BP-2077371184-184.10.17.65-1423758745093 due to InvalidToken exception.
2015-05-11 17:09:12,848 WARN [regionserver60020-smallCompactions-1430754386533] hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period" is deprecated. Instead, use "hbase.client.scanner.timeout.period"
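To get an overview of which components emit these messages, the log lines can be grouped by logger instead of being read one by one. The following is a minimal sketch, assuming every line follows the "date time LEVEL [thread] logger: message" layout of the two lines above. The function name and the regular expression are mine, not part of HBase.

```python
import re
from collections import Counter

# date, time, level, [thread], logger, message
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+\[(.*?)\] (\S+): (.*)$"
)

SAMPLE = [
    '2015-05-11 17:11:10,544 WARN [B.DefaultRpcServer.handler=8,queue=2,port=60020] '
    'shortcircuit.ShortCircuitCache: ShortCircuitCache(0x2aca5fca): could not load '
    '1074749724_BP-2077371184-184.10.17.65-1423758745093 due to InvalidToken exception.',
    '2015-05-11 17:09:12,848 WARN [regionserver60020-smallCompactions-1430754386533] '
    'hbase.HBaseConfiguration: Config option "hbase.regionserver.lease.period" is '
    'deprecated. Instead, use "hbase.client.scanner.timeout.period"',
]


def count_by_logger(lines, date, level):
    """Count lines of the given date and level, grouped by logger name."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # stack traces and continuation lines are skipped
        line_date, _time, line_level, _thread, logger, _message = match.groups()
        if line_date == date and line_level == level:
            counts[logger] += 1
    return counts


if __name__ == "__main__":
    for logger, n in count_by_logger(SAMPLE, "2015-05-11", "WARN").most_common():
        print(f"{n:6d}  {logger}")
```

To run it on a real log, replace SAMPLE with an open file handle, for example one of the region server logs used in the grep commands above.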