
I have an HBase cluster with 3 region servers. One table has 27 regions distributed evenly across them, 9 regions per region server.

Region server 1 holds regions 1-9, region server 2 holds regions 10-18, and region server 3 holds regions 19-27.

When I run a program that continuously inserts rows into region 1 and region 5 alternately (both on region server 1), the insert time per row is not consistent: its standard deviation is quite large. Sometimes a row takes 2 ms to insert, sometimes 3 ms, sometimes 1000 ms, and sometimes even more than 3000 ms, even though every row carries the same amount of data.

I understand that writes are blocked while regions are flushed and compacted, but they should not be blocked for such long spans, and the blocking time should be consistent from one flush or minor compaction to the next.

In other words, every flush and every compaction should take roughly the same amount of time.

Our application needs a consistent quality of service. If that cannot be perfect, we at least need a clearly visible upper bound: for example, each row insert takes between 0 and 10 ms and never more than 10 ms, even when a minor compaction or a flush occurs.
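To make the "visible boundary" requirement measurable, it may help to summarize the recorded insert times as percentiles and count how many exceed the bound. The following is only a minimal sketch: `summarize_latencies`, the 10 ms default, and the sample list are illustrative assumptions (the samples just echo the 2 ms / 3 ms / 1000 ms / 3000 ms values mentioned above), not measurements from the cluster.

```python
import math
import statistics


def summarize_latencies(latencies_ms, bound_ms=10.0):
    """Summarize per-row insert latencies, given in milliseconds.

    bound_ms is the (hypothetical) service-level bound to check against.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")

    ordered = sorted(latencies_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that
        # at least p percent of all samples are <= it.
        rank = max(1, math.ceil(p / 100 * n))
        return ordered[rank - 1]

    over = [x for x in ordered if x > bound_ms]
    return {
        "count": n,
        "mean_ms": statistics.fmean(ordered),
        "stdev_ms": statistics.pstdev(ordered),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "max_ms": ordered[-1],
        "over_bound": len(over),
        "over_bound_fraction": len(over) / n,
    }


if __name__ == "__main__":
    # Hypothetical samples for illustration only.
    samples = [2, 3, 2, 1000, 3, 2, 3000, 2, 3, 2]
    for key, value in summarize_latencies(samples).items():
        print(f"{key}: {value}")
```

Looking at the median next to the 99th percentile and the maximum usually says more than the standard deviation alone, because a few multi-second stalls dominate the variance while most inserts stay fast.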

Is there any setting/configuration which I should try?

Any ideas on how to achieve this in HBase?

Any help would be really appreciated.

Thanks in advance!!

Regarding "writes are blocked": you can grep the HBase region server log file for "Blocking updates". If you find such lines, your updates are being blocked. There are several reasons this might occur. For me it was HDFS, which was too slow. We wound up finding that HDFS was slow because some nodes were connected to a 100 Mbit switch while the rest were on a 1000 Mbit switch. - Asaf Mesika
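The log search described in this comment can also be scripted, which makes it easier to line the hits up against the slow inserts. This is a rough sketch rather than an HBase tool: the function names are made up for illustration, the log path is something you would supply yourself, and the only thing taken from the comment is the "Blocking updates" search string.

```python
def find_blocked_updates(log_lines, marker="Blocking updates"):
    """Return (line_number, line) pairs for every line containing marker.

    log_lines can be any iterable of strings, such as an open file.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if marker in line
    ]


def scan_log_file(path, marker="Blocking updates"):
    """Scan the region server log at path (a path you supply)."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_blocked_updates(handle, marker)


if __name__ == "__main__":
    # Placeholder lines for illustration; real log lines look different.
    demo = [
        "placeholder line one\n",
        "placeholder text Blocking updates placeholder text\n",
        "placeholder line three\n",
    ]
    for number, line in find_blocked_updates(demo):
        print(number, line)
```

If the matching lines cluster around the times of the multi-second inserts, blocked updates are a likely cause; if there are no matches, the stalls probably come from somewhere else.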

1 Answer


First, compaction by itself will not block your writes! The main thing I would recommend is checking GC on the region server and on the client. By the way, did you check that no region splits are occurring?

Some other information that would help in answering:

  1. What is the size of the data, and how many columns and column families are there?
  2. What is the throughput of your inserts?
  3. How much memory did you allocate to your HBase region servers?
  4. Are the HDFS data nodes on the same servers as the region servers?
  5. How many disks do you have per machine?