
I have recently read that after an HBase major compaction, if the size of a store file becomes greater than hbase.hregion.max.filesize (i.e. 256MB), it is split into two again. Can anyone explain what size of files compaction is run on? And the store file formed after a major compaction will hold data from how many column families?


As the name suggests, hbase.hregion.max.filesize refers to region size. Regions are essentially partitions of your HBase data (stored as HFiles). HBase stores your data in regions, and if a region gets too big (with "too big" defined by hbase.hregion.max.filesize), it splits the region into two regions.
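To make the split rule concrete, here is a minimal Java sketch of a size-threshold split. It is a simplified model, not HBase's implementation: the `Region` record, `shouldSplit`, and `split` are hypothetical names, the split key is supplied by hand, and the data is assumed to divide evenly between the two daughter regions. HBase's real split policies pick the split point themselves and differ between versions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of size-based region splitting.
 * Not HBase's implementation; real split policies are more involved.
 */
public class RegionSplitSketch {

    /** A region covering row keys [startKey, endKey) with an approximate size in bytes. */
    record Region(String startKey, String endKey, long sizeBytes) {}

    /** True when the region has grown past the configured maximum size. */
    static boolean shouldSplit(Region region, long maxFileSize) {
        return region.sizeBytes() > maxFileSize;
    }

    /**
     * Splits a region into two daughters at midKey, assuming (for illustration)
     * that the data is spread evenly so each daughter gets about half.
     */
    static List<Region> split(Region region, String midKey) {
        long half = region.sizeBytes() / 2;
        List<Region> daughters = new ArrayList<>();
        daughters.add(new Region(region.startKey(), midKey, half));
        daughters.add(new Region(midKey, region.endKey(), region.sizeBytes() - half));
        return daughters;
    }

    public static void main(String[] args) {
        long maxFileSize = 256L * 1024 * 1024;                     // 256MB, the value from the question
        Region region = new Region("a", "z", 300L * 1024 * 1024); // assumed 300MB region
        if (shouldSplit(region, maxFileSize)) {
            for (Region daughter : split(region, "m")) {
                System.out.println(daughter);
            }
        }
    }
}
```

Each 150MB daughter is back under the 256MB threshold, so in this model it will not split again until it grows past that size.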

Generally, a region size of 256MB is quite small, and most use cases will require a bigger one. Determining the exact size can be somewhat of a dark art, but here is a reference: http://hbase.apache.org/book/ops.capacity.html#ops.capacity.regions.
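As a rough illustration of why 256MB is small, this sketch computes how many regions a table needs for a given region size. The 1 TiB table and the 10 GB (10 × 1024³ bytes) alternative are assumed figures chosen for illustration, and the `regionsNeeded` helper is hypothetical. The result is a lower bound, since a freshly split region starts out only about half full.

```java
/** Back-of-the-envelope region count for a given table size and region size. */
public class RegionCountSketch {

    /** Minimum number of regions needed to hold tableBytes if no region exceeds regionBytes. */
    static long regionsNeeded(long tableBytes, long regionBytes) {
        // Ceiling division: a partially filled region still counts as a region.
        return (tableBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long oneTiB = 1024L * 1024 * 1024 * 1024;  // assumed table size
        long small = 256L * 1024 * 1024;           // 256MB, from the question
        long large = 10L * 1024 * 1024 * 1024;     // assumed larger setting (10 GB)
        System.out.println(regionsNeeded(oneTiB, small)); // 4096
        System.out.println(regionsNeeded(oneTiB, large)); // 103
    }
}
```

Thousands of small regions mean more region metadata to track and more regions per server, which is the kind of trade-off the capacity-planning reference discusses.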

You can set the region size when you create a table, in the table's HTableDescriptor.
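With the HBase 2.x client API, the per-table override is set on a `TableDescriptor` built with `TableDescriptorBuilder` (in the older 1.x API the equivalent call is `HTableDescriptor#setMaxFileSize`). This is a sketch assuming `hbase-client` 2.x is on the classpath; the table name `events`, the family `d`, and the `buildDescriptor` helper are hypothetical. Building the descriptor needs no cluster; actually creating the table would go through `Admin.createTable(descriptor)` on a live connection.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class MaxFileSizeSketch {

    /** Builds a descriptor for a hypothetical table "events" with one hypothetical family "d". */
    static TableDescriptor buildDescriptor(long maxFileSizeBytes) {
        return TableDescriptorBuilder.newBuilder(TableName.valueOf("events"))
            // Per-table override of the cluster-wide hbase.hregion.max.filesize.
            .setMaxFileSize(maxFileSizeBytes)
            // A table needs at least one column family.
            .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("d")).build())
            .build();
    }

    public static void main(String[] args) {
        TableDescriptor desc = buildDescriptor(10L * 1024 * 1024 * 1024); // assumed 10 GB
        System.out.println(desc.getMaxFileSize());
        // With a live cluster, pass desc to Admin.createTable(desc) to create the table.
    }
}
```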

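Each region has all the column families defined by your table, with a separate store (and therefore separate store files) for each family, so a store file produced by a major compaction holds data for a single column family. For further performance tuning, you can specify a block size per column family, which can affect the performance of scans, gets, and writes.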
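The following minimal Java sketch models the point about column families. It is a hypothetical, simplified model (the class, `flush`, and `majorCompact` are made-up names, and file sizes are plain numbers) in which a region keeps one store per column family and a major compaction, under the default compaction behaviour, rewrites each store's files into a single file. It ignores deletes, expired cells, and extra versions, which in reality make the compacted file smaller than the sum of its inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model: one store per column family in a region,
 * each store holding its own list of store-file sizes in bytes.
 */
public class StorePerFamilySketch {

    // family name -> sizes of that family's store files in this region
    final Map<String, List<Long>> stores = new LinkedHashMap<>();

    /** A memstore flush adds a new store file to the family's own store only. */
    void flush(String family, long fileBytes) {
        stores.computeIfAbsent(family, f -> new ArrayList<>()).add(fileBytes);
    }

    /** Merges each store's files into one file per store; families are never mixed. */
    void majorCompact() {
        for (Map.Entry<String, List<Long>> entry : stores.entrySet()) {
            long total = 0;
            for (long size : entry.getValue()) {
                total += size;
            }
            List<Long> merged = new ArrayList<>();
            merged.add(total);
            entry.setValue(merged);
        }
    }

    public static void main(String[] args) {
        StorePerFamilySketch region = new StorePerFamilySketch();
        region.flush("cf1", 64);
        region.flush("cf1", 32);
        region.flush("cf2", 16);
        region.majorCompact();
        System.out.println(region.stores);
    }
}
```

Running it prints `{cf1=[96], cf2=[16]}`: after the compaction there is one file per family, and no file mixes data from different families.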

You can also check out this resource for configuration tips: http://hbase.apache.org/book/important_configurations.html