I want to understand how HBase internally handles duplicate records from a file. To experiment with this, I created an EXTERNAL table in Hive with the HBase-specific configuration (table properties, SerDe, column family). I also had to create the table in HBase itself, with the same column family, which I did.
I then ran an INSERT OVERWRITE into this Hive table from a source table that contains duplicate records. By duplicate records I mean rows that share the same ID, like this:
ID | Name | Surname
1 | Ritesh | Rai
1 | RiteshKumar | Rai
After the INSERT OVERWRITE, I queried my Hive table for ID 1 and got only the second record back:
1 RiteshKumar Rai
I would like to understand how HBase decides which record is kept. Does it simply write the rows sequentially, so that the last record written overwrites the earlier one and is treated as the latest? Or does it work differently?
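To make my guess concrete, here is a toy Python model of the behaviour I suspect. It is not HBase code: the class ToyTable, the column names cf:name and cf:surname, and the timestamps are all made up for illustration. It assumes that every write to the same row key and column adds a new timestamped version, and that a read returns the version with the newest timestamp (ties broken by write order).

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a versioned key-value table (not the HBase API)."""

    def __init__(self):
        # (row_key, column) -> list of (timestamp, write_sequence, value)
        self._cells = defaultdict(list)
        self._seq = 0  # increases with every write; breaks timestamp ties

    def put(self, row_key, column, value, timestamp):
        # A write never replaces older data here; it only adds a version.
        self._seq += 1
        self._cells[(row_key, column)].append((timestamp, self._seq, value))

    def get(self, row_key, column):
        # Return the value of the newest version, or None if the cell is empty.
        versions = self._cells.get((row_key, column))
        if not versions:
            return None
        return max(versions)[2]


table = ToyTable()
# First source row: 1 | Ritesh | Rai (hypothetical timestamp 100)
table.put("1", "cf:name", "Ritesh", timestamp=100)
table.put("1", "cf:surname", "Rai", timestamp=100)
# Second source row: 1 | RiteshKumar | Rai (hypothetical timestamp 101)
table.put("1", "cf:name", "RiteshKumar", timestamp=101)
table.put("1", "cf:surname", "Rai", timestamp=101)

print(table.get("1", "cf:name"))  # RiteshKumar
```

If this model is right, the result would depend on which row gets written last, not on its position in the source table. Is that what actually happens?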
Thanks in advance.
Regards, Govind