Within a column family, are all the columns for a row key stored in the same HFile? Or can data for one row key and the same column family be spread across different HFiles? I ask because I thought the data was sorted, but I read this in a book:
"Data from a single column family for a single row need not be stored in the same HFile." Is that because a row could be too wide to fit in a single HFile?
"The only requirement is that within an HFile, data for a row’s column family is stored together." That seems a little contradictory to me.
Note: I have been reading a bit more about this. HBase uses an LSM tree. Suppose I have a row key with all of its data in one HFile. If I later add new data, it is first stored in memory; when the memory fills up, HBase writes that data to a new HFile. So I could have qualifiers for one row key in two HFiles, and a get or scan on that row key would have to seek in both files. Over time, HBase runs a major compaction, which merges the two old HFiles into a single new HFile and deletes the old ones afterwards. After that, looking up the row key needs only one seek. Am I right? I also don't understand why there are both minor and major compactions, because they seem to do the same thing.
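To make my mental model concrete, here is a toy Python sketch of what I described above. It is not HBase code: `ToyStore`, `flush_threshold` and the method names are made up for illustration, each "HFile" is just a sorted list, and `compact` merges every file at once, which is how I picture a major compaction.

```python
# Toy model of an LSM-style store for ONE column family.
# Everything here is hypothetical and only mirrors the description above.
class ToyStore:
    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # assumed: cells held in memory before a flush
        self.memstore = {}   # (row, qualifier) -> value, not yet written out
        self.hfiles = []     # each "HFile" is an immutable, sorted list of cells

    def put(self, row, qualifier, value):
        self.memstore[(row, qualifier)] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the in-memory cells out as a new file, sorted by (row, qualifier),
        # so cells of the same row sit next to each other inside that file.
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row):
        # Read every file (oldest first) plus memory; newer values win.
        result, files_touched = {}, 0
        for hfile in self.hfiles:
            cells = [(key[1], value) for key, value in hfile if key[0] == row]
            if cells:
                files_touched += 1
                result.update(cells)
        result.update({q: v for (r, q), v in self.memstore.items() if r == row})
        return result, files_touched

    def compact(self):
        # Merge all files into one sorted file and drop the old ones.
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        self.hfiles = [sorted(merged.items())] if merged else []


store = ToyStore(flush_threshold=2)
store.put("row1", "a", 1)
store.put("row1", "b", 2)   # memory full: first file written
store.put("row1", "c", 3)
store.put("row2", "a", 4)   # memory full: second file written

before = store.get("row1")  # row1 is now spread over two files
store.compact()
after = store.get("row1")   # same cells, but only one file to read
print(before, after)
```

In this sketch `row1` has qualifiers `a` and `b` in the first file and `c` in the second, yet inside each file the cells of a row are adjacent because the file is sorted. That is how I reconcile the two sentences from the book, assuming the real HFiles behave like these sorted lists.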