Calculating average of all columns, with extra info for each row in the output bash/awk

Question

The first file I have (file1.txt):

ID name start end
ID3 pot 15 28
ID2 cat 12 25
ID4 dog 20 30
ID1 turtle 1 10

And another file (file2.txt):

key ID1 ID2 ID3 ID4
23 1.5 2.5 1.2 3.4
5 1.4 2.3 4.2 1.4
21 1.2 2.4 3.5 1.9

The output file should be:

name start end field value
turtle 1 10 . 1.37
cat 12 25 . 2.4
pot 15 28 . 6.57
dog 20 30 . 2.23

The columns "name", "start", and "end" are from the first file.

The column "ID" is used to link file1 to file2

The average for each ID in file2 is calculated, and it is then assigned to the correct row in the output file. For example, turtle has ID1, the average for ID1 is (1.5+1.4+1.2)/3 = 1.37..

And notice how the output file is sorted on the column "start"

My awk guess is that the ID will be the key in an associative array

awk '{FNR==NR{a[$1]=$2,$3,$4;next}' file1.txt

And then for file2, calculate the column averages similar to what this link describes: http://www.fordodone.com/2013/08/02/awk-average-multiple-columns/

cat sample.txt | awk '{for (i=1;i<=NF;i++){a[i]+=$i;}} END {for (i=1;i<=NF;i++){printf "%.0f", a[i]/NR; printf "\t"};printf "\n"}'

But I'm not quite sure how to combine them together to get the desired output.

Any solutions welcome, including bash ones..

Akshay Hegde Akshay Hegde · Accepted Answer · 2015-05-19T06:11:03

Try, if order doesn't matter

awk 'FNR==NR{for(i=2;i<=NF;i++){ if(NR==1){ t[i]=$i } else{ A[t[i]]+=$i; c = NR-1 }} next }FNR==1{print "name","start","end","field","value";next}{print $2,$3,$4,".",A[$1]/c}' OFMT='%5.2f' OFS='\t'  file2 file1

Better readable version:

 awk 'FNR==NR{
                for(i=2;i<=NF;i++)
                {
                 if(NR==1)
                 { 
                   t[i]=$i 
                 } 
            else{ 
                   A[t[i]]+=$i
                   c = NR-1 
                }
               } 
                 next 
            }
      FNR==1{
              print "name","start","end","field","value"
              next
            }
            {
              print $2,$3,$4,".",A[$1]/c
            }
       '  OFMT='%5.2f' OFS='\t'  file2 file1

Input

 [akshay@localhost tmp]$ cat file1
 ID name start end
 ID3 pot 15 28
 ID2 cat 12 25
 ID4 dog 20 30
 ID1 turtle 1 10

 [akshay@localhost tmp]$ cat file2
 key ID1 ID2 ID3 ID4
 23 1.5 2.5 1.2 3.4
 5 1.4 2.3 4.2 1.4
 21 1.2 2.4 3.5 1.9

Output

 [akshay@localhost tmp]$ awk 'FNR==NR{for(i=2;i<=NF;i++){ if(NR==1){ t[i]=$i } else{ A[t[i]]+=$i; c = NR-1 }} next }FNR==1{print "name","start","end","field","value";next}{print $2,$3,$4,".",A[$1]/c}' OFMT='%5.2f' OFS='\t'  file2 file1
 name   start   end field   value
 pot    15      28   .      2.97
 cat    12      25   .      2.40
 dog    20      30   .      2.23
 turtle 1       10   .      1.37

Calculating average of all columns, with extra info for each row in the output bash/awk

1 Answers