
I have a file with the following values:

 ID1 RID1 1 rid1 part1
 ID1 RID1 2 rid1 part2
 ID1 RID2 1 rid2 part1
 ID1 RID2 2 rid2 part2
 ID2 RID3 1 rid3 part1
 ID2 RID3 2 rid3 part2
 ID2 RID4 1 rid4 part1

The columns are ID, RID, Offset and Text, and they are tab-delimited. The Text column can contain multiple words separated by spaces.

I am trying to concatenate the Text values for each RID, in ascending Offset order.

Essentially the desired output is

ID2     RID3    rid3 part1rid3 part2
ID2     RID4    rid4 part1
ID1     RID1    rid1 part1rid1 part2
ID1     RID2    rid2 part1rid2 part2

I am trying to do this with awk. Here is my awk one-liner:

cat example.txt| awk '{line=""; line = line $4; table[$1"\t"$2]=table[$1"\t"$2] line;} END {for (key in table) print key"\t"table[key];}'

For some reason, awk is not picking up all the words in $4; it takes only the first word and outputs:

ID2     RID3    rid3rid3
ID2     RID4    rid4
ID1     RID1    rid1rid1
ID1     RID2    rid2rid2

How do I get all the words in $4 and not just the first one?

Have you tried awk -F "\t" to set the field separator to a tab, rather than the default of any run of spaces and tabs? - Jules
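
To illustrate the comment above, here is a minimal sketch comparing awk's default field splitting with tab splitting on one record from the question. The variable names are only for illustration, and it assumes the real file contains actual tab characters between the four columns.

```shell
# One tab-delimited record from the question; printf turns \t into real tabs.
line=$(printf 'ID1\tRID1\t2\trid1 part2')

# Default FS: any run of spaces or tabs separates fields,
# so "rid1 part2" is split into $4 and $5.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')
default_f4=$(printf '%s\n' "$line" | awk '{print $4}')

# FS set to a tab: the whole text stays in $4.
tab_nf=$(printf '%s\n' "$line" | awk -F '\t' '{print NF}')
tab_f4=$(printf '%s\n' "$line" | awk -F '\t' '{print $4}')

echo "default: NF=$default_nf, \$4=$default_f4"
echo "tab:     NF=$tab_nf, \$4=$tab_f4"
```

With the default separator the record has five fields and $4 is just the first word, which matches the output shown in the question.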

2 Answers


I suggest something like:

awk -F " " '{key=$1" "$2; value=$4" "$5; if(!(key in t)){t[key]=value} else {t[key]=t[key]""value}} END {for (key in t){print key" "t[key]}}' file|sort -rt' ' -k1

Regards, Idriss


Start with this updated version of your own script:

awk 'BEGIN{FS=OFS=SUBSEP="\t"} {table[$1,$2]=table[$1,$2] $4} END{for (key in table) print key, table[key]}' example.txt

Let us know if that doesn't do what you want and you need help figuring out how to fix it.
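
Note that the scripts above join the pieces in input order rather than by Offset, and `for (key in table)` visits keys in an unspecified order. If the input might not already be ordered, one possible sketch is to sort it first. This assumes the file is tab-delimited as described in the question; the temporary file, the shuffled sample rows and the `result` variable are only for illustration.

```shell
# Sample rows from the question, deliberately out of Offset order, written with real tabs.
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  ID1 RID2 2 'rid2 part2' \
  ID2 RID3 1 'rid3 part1' \
  ID1 RID2 1 'rid2 part1' \
  ID2 RID4 1 'rid4 part1' \
  ID2 RID3 2 'rid3 part2' > "$tmp"

tab=$(printf '\t')

# Sort by ID, then RID, then numeric Offset, so awk sees each group's pieces in order.
# The order[] array remembers first-seen keys, which keeps the output order predictable.
result=$(LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3,3n "$tmp" |
  awk 'BEGIN { FS = OFS = "\t" }
       {
         key = $1 OFS $2
         if (!(key in table)) order[++n] = key
         table[key] = table[key] $4
       }
       END { for (i = 1; i <= n; i++) print order[i], table[order[i]] }')

rm -f "$tmp"
printf '%s\n' "$result"
```

The output is one tab-delimited line per ID and RID pair, with the Text pieces joined in ascending Offset order.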