I am loading data into HBase via Pig. The Pig script runs daily to look for updated records in various Hive tables, performs joins and processing, then loads the results into HBase. The problem I'm having is that sometimes only one part of a record has been updated, while the other parts are unchanged.
Example: a record with key abcd123 exists in Hive table 1 and Hive table 2. There is new data in Hive table 1, but not in Hive table 2. My Pig script joins both tables and then loads the joined record into HBase, updating the existing HBase record for key abcd123.
Is there a way to have HBase check whether the data it currently holds for a key differs from what the Pig script is attempting to load, and then accept only the values that are different? There is no point in rewriting the row with a lot of data that hasn't changed just to get in the one value that has.
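To make the desired behavior concrete, here is a minimal sketch in plain Python of the comparison I have in mind. It is not an HBase or Pig API; the function name `changed_columns` and the column names are hypothetical, and each record is modeled as a simple dict of column name to value.

```python
# Sentinel so that a column missing from the stored row is always treated as changed,
# even if the incoming value happens to be None.
_MISSING = object()


def changed_columns(existing, incoming):
    """Return only the incoming columns whose value differs from the stored row.

    existing: dict of column -> value currently stored for the row key (assumed shape)
    incoming: dict of column -> value produced by the daily join (assumed shape)
    """
    return {
        col: val
        for col, val in incoming.items()
        if existing.get(col, _MISSING) != val
    }


# Hypothetical row for key abcd123: only the table 1 column has new data.
existing_row = {"t1:value": "old", "t2:value": "same"}
incoming_row = {"t1:value": "new", "t2:value": "same"}

to_write = changed_columns(existing_row, incoming_row)
print(to_write)
```

In this example only the `t1:value` column would be written, and the unchanged `t2:value` column would be left alone. What I don't know is whether HBase can do this comparison itself at write time, or whether it has to happen in the Pig script before the load.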