I have been trying to understand how HBase works, specifically how data is stored on disk.
I have read several articles online, and the two that helped me most were:
http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw
and
I still have some questions, maybe because I haven't understood HBase very well. Here's what I gathered from my reading: every transaction (Put/Get/Delete) is saved as a KeyValue in the memstore and is then written out to StoreFiles/HFiles on flush. The data stored on disk is actually these HFiles.
Now, the structure of the KeyValue class specifies the data to be stored (if any), the key, and the type of operation (Put/Get/Delete). The data blocks in the HFiles themselves hold these KeyValues (with the "rowkey" being part of the Key).
As I see it, when these KeyValues are persisted, it's more like saving a transaction than making changes to existing data. When is a transaction of this kind processed/consolidated into a row? I assumed it could be during compaction, but then I don't know how requests are handled for data that has been written to an HFile but not yet compacted.
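To make my current mental model concrete, here is a toy Python sketch of it. This is not HBase code: ToyKeyValue, ToyMemstore and flush() are names I made up for illustration, and the sketch ignores ordering, blocks, and everything else an HFile really contains. It only shows the part I described above: every operation is recorded as its own entry, and a flush writes the buffered entries out as a file that is never modified afterwards.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyKeyValue:
    """Hypothetical stand-in for a KeyValue: a key, an operation type, optional data."""
    row: str                       # the "rowkey" part of the key
    column: str                    # e.g. "cf:a"
    timestamp: int
    op: str                        # operation type, e.g. "Put" or "Delete"
    value: Optional[bytes] = None  # data to store, if any


class ToyMemstore:
    """Hypothetical in-memory buffer that collects entries until it is flushed."""

    def __init__(self) -> None:
        self.entries: List[ToyKeyValue] = []

    def add(self, kv: ToyKeyValue) -> None:
        # Nothing is updated in place; each operation is simply recorded.
        self.entries.append(kv)

    def flush(self) -> Tuple[ToyKeyValue, ...]:
        # The flushed "file" is an immutable tuple: later writes cannot change it.
        flushed = tuple(self.entries)
        self.entries = []
        return flushed


memstore = ToyMemstore()
memstore.add(ToyKeyValue("row1", "cf:a", 1, "Put", b"old"))
memstore.add(ToyKeyValue("row1", "cf:a", 2, "Put", b"new"))
memstore.add(ToyKeyValue("row1", "cf:a", 3, "Delete"))
toy_hfile = memstore.flush()
```

In this picture toy_hfile holds three separate entries for the same cell, and none of them has been merged into a final "row".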
I also did not understand what the articles meant by "Before a KeyValue pair is written to a block, the order of the key must be bigger than the previous one."
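My best guess at what that sentence means is sketched below, again as toy Python rather than anything from HBase. ToyBlockWriter and its append() method are hypothetical names, the key is simplified to a (row, column) tuple, and I am taking "bigger" literally, so the sketch also rejects equal keys; I don't know whether the real writer does.

```python
from typing import List, Tuple

Key = Tuple[str, str]  # simplified key: (row, column); assumed for illustration


class ToyBlockWriter:
    """Hypothetical append-only writer that only accepts keys in increasing order."""

    def __init__(self) -> None:
        self.items: List[Tuple[Key, bytes]] = []

    def append(self, key: Key, value: bytes) -> None:
        # Compare against the last key written; tuples compare element by element.
        if self.items and key <= self.items[-1][0]:
            raise ValueError(f"key {key} is not bigger than previous key {self.items[-1][0]}")
        self.items.append((key, value))


writer = ToyBlockWriter()
writer.append(("row1", "cf:a"), b"1")
writer.append(("row2", "cf:a"), b"2")

try:
    # "row1" sorts before "row2", so this append is refused.
    writer.append(("row1", "cf:b"), b"3")
    rejected = False
except ValueError:
    rejected = True
```

If this reading is right, whatever writes the file must already have the KeyValues in sorted order before it starts appending them.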
I think I have made some wrong assumptions in the process of understanding HBase.
Can someone help me understand this?