I have a table in HBase with one column family called a and around 30 columns in it. Below is a sample that shows the cell values for two row keys:
ROW COLUMN+CELL
00:001000574 column=a:aasbig, timestamp=1486493154559, value=true
00:001000574 column=a:aasdel, timestamp=1486493154559, value=true
00:001000574 column=a:aasdhq, timestamp=1486493154559, value=false
00:001000574 column=a:aasfsc, timestamp=1486493154559, value=true
00:001000574 column=a:aasgbm, timestamp=1486493154559, value=true
00:001000574 column=a:aasgbr, timestamp=1486493154559, value=true
00:001000574 column=a:aasmcu, timestamp=1486493154559, value=true
00:001000574 column=a:aasser, timestamp=1486493154559, value=true
00:001000574 column=a:aastlp, timestamp=1486493154559, value=true
00:001000574 column=a:aasvia, timestamp=1486493154559, value=true
00:001000707 column=a:aasbig, timestamp=1486493154559, value=false
00:001000707 column=a:aasdel, timestamp=1486493154559, value=false
00:001000707 column=a:aasdhq, timestamp=1486493154559, value=true
00:001000707 column=a:aasfsc, timestamp=1486493154559, value=false
00:001000707 column=a:aasgbm, timestamp=1486493154559, value=false
00:001000707 column=a:aasgbr, timestamp=1486493154559, value=false
00:001000707 column=a:aasmcu, timestamp=1486493154559, value=false
00:001000707 column=a:aasser, timestamp=1486493154559, value=false
00:001000707 column=a:aastlp, timestamp=1486493154559, value=false
00:001000707 column=a:aasvia, timestamp=1486493154559, value=false
Each column holds either true or false. These values are subject to change, and a week later they may be different. I would like to capture the old and new values and store the result in a CSV file.
My requirement is that when I run the code for the first time, OLDVALUE should be NULL for every row and NEWVALUE should hold the current values from the HBase table.
Below is the output I want to see in a CSV file when run for the first time.
NUM,PRODUCT,OLDVALUE,NEWVALUE
001000574,aasbig,NULL,true
001000574,aasdel,NULL,true
001000574,aasdhq,NULL,false
001000574,aasfsc,NULL,true
001000574,aasgbm,NULL,true
001000574,aasgbr,NULL,true
001000574,aasmcu,NULL,true
001000574,aasser,NULL,true
001000574,aastlp,NULL,true
001000574,aasvia,NULL,true
001000707,aasbig,NULL,false
001000707,aasdel,NULL,false
001000707,aasdhq,NULL,true
001000707,aasfsc,NULL,false
001000707,aasgbm,NULL,false
001000707,aasgbr,NULL,false
001000707,aasmcu,NULL,false
001000707,aasser,NULL,false
001000707,aastlp,NULL,false
001000707,aasvia,NULL,false
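To make the mapping concrete, here is a minimal Python sketch of the first-run transformation. It assumes that NUM is the part of the row key after the 00: prefix and that PRODUCT is the column qualifier after a:. The names first_run_rows and to_csv are hypothetical helpers, and the cells are passed in as plain tuples rather than read from HBase.

```python
import csv
import io

def first_run_rows(cells):
    """Turn (row_key, column, value) cells into (NUM, PRODUCT, OLDVALUE, NEWVALUE) rows."""
    rows = []
    for row_key, column, value in cells:
        num = row_key.split(":", 1)[1]      # assumption: "00:001000574" -> "001000574"
        product = column.split(":", 1)[1]   # assumption: "a:aasbig" -> "aasbig"
        rows.append((num, product, "NULL", value))  # no previous run, so OLDVALUE is NULL
    return rows

def to_csv(rows):
    """Render the rows as CSV text with the header used in the samples."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["NUM", "PRODUCT", "OLDVALUE", "NEWVALUE"])
    writer.writerows(rows)
    return buf.getvalue()

# A few cells copied from the sample scan output above.
cells = [
    ("00:001000574", "a:aasbig", "true"),
    ("00:001000574", "a:aasdhq", "false"),
    ("00:001000707", "a:aasbig", "false"),
]
print(to_csv(first_run_rows(cells)))
```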
From the second run onwards, all the values under NEWVALUE from the previous run should move to OLDVALUE, and NEWVALUE should get the current values from the HBase table, as in the sample output below:
NUM,PRODUCT,OLDVALUE,NEWVALUE
001000574,aasbig,true,true
001000574,aasdel,true,true
001000574,aasdhq,false,false
001000574,aasfsc,true,true
001000574,aasgbm,true,false
001000574,aasgbr,true,true
001000574,aasmcu,true,false
001000574,aasser,true,false
001000574,aastlp,true,true
001000574,aasvia,true,true
001000707,aasbig,false,true
001000707,aasdel,false,true
001000707,aasdhq,true,true
001000707,aasfsc,false,false
001000707,aasgbm,false,false
001000707,aasgbr,false,false
001000707,aasmcu,false,true
001000707,aasser,false,true
001000707,aastlp,false,false
001000707,aasvia,false,true
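The roll-over logic I have in mind can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the previous report is available as CSV text, the current HBase values have already been loaded into a dict keyed by (NUM, PRODUCT), and roll_over is a hypothetical function name. Any key missing from the previous report gets NULL as its OLDVALUE, so an empty previous report reproduces the first-run behaviour.

```python
import csv
import io

HEADER = ["NUM", "PRODUCT", "OLDVALUE", "NEWVALUE"]

def roll_over(previous_csv_text, current_values):
    """Build the next report from the previous report and the current values.

    previous_csv_text: CSV text of the last report ("" on the first run).
    current_values: dict mapping (NUM, PRODUCT) -> current value as a string.
    """
    # Index the previous NEWVALUE column by (NUM, PRODUCT).
    previous_new = {}
    if previous_csv_text:
        for row in csv.DictReader(io.StringIO(previous_csv_text)):
            previous_new[(row["NUM"], row["PRODUCT"])] = row["NEWVALUE"]

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for (num, product), new_value in sorted(current_values.items()):
        # The previous NEWVALUE becomes OLDVALUE; unseen keys fall back to NULL.
        old_value = previous_new.get((num, product), "NULL")
        writer.writerow([num, product, old_value, new_value])
    return buf.getvalue()

# First run: no previous report, so every OLDVALUE is NULL.
first = roll_over("", {("001000574", "aasgbm"): "true",
                       ("001000707", "aasbig"): "false"})
# Second run: the values have changed in HBase in the meantime.
second = roll_over(first, {("001000574", "aasgbm"): "false",
                           ("001000707", "aasbig"): "true"})
print(second)
```

This is effectively a left join of the current values against the previous report on (NUM, PRODUCT), which is the same join I would need to express in Hive or Pig.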
What I tried:
I created a Hive-on-HBase table, but when querying it I could only get the NUM and the value; I was unable to get the HBase column name. I also had trouble getting the old and new values without implementing some join operations.
Can we write a Pig script to achieve this?
Any help is much appreciated.