I would like to print every gap in the sequence numbers of the first column (start and end of the missing range), together with the minimum and maximum sequence number of the first column, for each combination of the fields $2,substr($3,4,6),substr($4,4,6),$6,$8,$10. The input file is not sorted on the first column.
Input.csv
21,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
22,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
23,abc,22-JUN-12.08:06:03,22-JUN-12.08:06:03,19-Apr-16,1,INR,RO0412,RC03,L7,,31
24,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
28,abc,30-JUN-12.01:06:49,30-JUN-12.01:06:49,19-Apr-16,1,INR,RO0412,RC03,L7,,29
32,abc,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
38,abc,29-MAY-13.12:05:11,29-MAY-13.12:05:11,15-Feb-17,1350,INR,RO0213,CD,K1,,30
41,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
46,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
51,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
52,abc,20-FEB-14.11:02:37,20-FEB-14.11:02:37,31-Dec-20,650,INR,EN1113,ch650,S317,,28
I have tried this command and got only partially correct output:
cat Input.csv | \
awk -F, '{OFS=","; print $1,$2,substr($3,4,6),substr($4,4,6),$6,$8,$10}' | \
sort -k1 -t, | \
awk -F, 'BEGIN {OFS=","} (($1!=p+1) && ($7==p7)) {print p,p2,p3,p4,p5,p6,p7,p+1 "," $1-1,$1} {p=$1;p2=$2;p3=$3;p4=$4;p5=$5;p6=$6;p7=$7}'
The columns of the above command's output are:
Minimum Seq ($1),$2,substr($3,4,6),substr($4,4,6),$6,$8,$10,Start Missing Seq ($1),End Missing Seq ($1),Maximum Seq ($1)
24,abc,JUN-12,JUN-12,1,RO0412,L7,25,27,28
32,abc,MAY-13,MAY-13,1350,RO0213,K1,33,37,38
41,abc,FEB-14,FEB-14,650,EN1113,S317,42,45,46
46,abc,FEB-14,FEB-14,650,EN1113,S317,47,50,51
In the above output, the Minimum Seq ($1) and Maximum Seq ($1) values are not what I expected. For instance, in the first line of the printed output the minimum sequence should be 21, not 24, and in the third line the maximum sequence should be 52, not 46. Please help.
Desired Output:
## $2,substr($3,4,6),substr($4,4,6),$6,$8,$10,Start Missing Seq ($1),End Missing Seq ($1),Minimum Seq ($1),Maximum Seq ($1) ##
abc,JUN-12,JUN-12,1,RO0412,L7,25,27,21,28
abc,MAY-13,MAY-13,1350,RO0213,K1,33,37,32,38
abc,FEB-14,FEB-14,650,EN1113,S317,42,45,41,52
abc,FEB-14,FEB-14,650,EN1113,S317,47,50,41,52
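One possible way to get the per-group minimum and maximum is to sort numerically on the first column and defer all printing to an END block, so that each group's smallest and largest sequence are known before any gap line is written. The following is only a sketch under some assumptions: the function name missing_gaps and the array names are made up for illustration, the sequence numbers are assumed to be integers, and a "group" is taken to be one combination of $2,substr($3,4,6),substr($4,4,6),$6,$8,$10.

```shell
# Hypothetical helper: print one line per missing range, per field combination.
# Usage: missing_gaps Input.csv
missing_gaps() {
    # Numeric sort on column 1 only, so each group's rows arrive in ascending order.
    sort -t, -k1,1n "$1" |
    awk -F, -v OFS=, '
    {
        seq = $1 + 0
        key = $2 OFS substr($3, 4, 6) OFS substr($4, 4, 6) OFS $6 OFS $8 OFS $10
        if (!(key in min)) {
            # First row of a group in sorted order carries its minimum.
            min[key] = seq
            order[++nkeys] = key
        } else if (seq > last[key] + 1) {
            # Remember the missing range between the previous and current sequence.
            n = ++ngaps[key]
            gap[key, n] = (last[key] + 1) OFS (seq - 1)
        }
        # After the final row of a group, last[key] holds its maximum.
        last[key] = seq
    }
    END {
        # Print groups in order of first appearance; groups without gaps print nothing.
        for (i = 1; i <= nkeys; i++) {
            key = order[i]
            for (j = 1; j <= ngaps[key]; j++)
                print key, gap[key, j], min[key], last[key]
        }
    }'
}
```

Because the minimum and maximum are only final once the whole input has been read, the gaps are buffered and printed at the end. The single-pass attempt above prints as soon as it sees a gap, which is why it can only report the two neighbouring sequence numbers rather than the group's true minimum and maximum.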
[...] edit button and format the question a little bit. Like this it is impossible to read. - fedorqui 'SO stop harming'