enter image description hereI have a hive table with more than millions records. The input is of the following type: Input:
rowid |starttime |endtime |line |status
--- 1 2007-07-19 00:05:00 2007-07-19 00:23:00 l1 s1
--- 2 2007-07-20 00:00:10 2007-07-20 00:22:00 l1 s2
--- 3 2007-07-19 00:00:00 2007-07-19 00:11:00 l2 s2
What I want to do is when 1st order the table by starttime group by line. Then find the difference between two consecutive rows endtime and starttime. If the difference is more than 5mins then in a new table add a new row in between with status misstime.
In input row 1 & 2 the time time difference is 1 hour 10 mins so 1st I will create row for 19th Date and complete that days with missing time and then add one more row for 20th as below.
output:
rowid |starttime |endtime |line |status
--- 1 |2007-07-19 00:05:00 |2007-07-19 00:23:00 |l1 |s1
--- 2 |2007-07-19 00:23:01 |2007-07-19 00:00:00 |l1 |misstime
--- 3 |2007-07-20 00:00:01 |2007-07-20 00:00:09 |l1 |misstime
--- 4 |2007-07-20 00:00:10 |2007-07-20 00:22:00 |l1 |s2
--- 3 |2007-07-19 00:00:00 |2007-07-19 00:11:00 |l2 |s2
Can anyone help me achieve this directly in hue - hive ? Unix script will also do.
Thanks in advance.