The first file I have (file1.txt):
ID name start end
ID3 pot 15 28
ID2 cat 12 25
ID4 dog 20 30
ID1 turtle 1 10
And another file (file2.txt):
key ID1 ID2 ID3 ID4
23 1.5 2.5 1.2 3.4
5 1.4 2.3 4.2 1.4
21 1.2 2.4 3.5 1.9
The output file should be:
name start end field value
turtle 1 10 . 1.37
cat 12 25 . 2.4
pot 15 28 . 2.97
dog 20 30 . 2.23
The columns "name", "start", and "end" are from the first file.
The column "ID" is used to link file1 to file2
The average for each ID in file2 is calculated, and it is then assigned to the correct row in the output file. For example, turtle has ID1, the average for ID1 is (1.5+1.4+1.2)/3 = 1.37..
And notice how the output file is sorted on the column "start"
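To double-check those expected values, the per-ID averages can be computed in isolation with something like this (untested; it skips the header row and the "key" column of file2):
awk 'NR > 1 { for (i = 2; i <= NF; i++) sum[i] += $i }   # accumulate each ID column
     END    { for (i = 2; i <= NF; i++) printf "%.2f ", sum[i] / (NR - 1); print "" }' file2.txt
That should print "1.37 2.40 2.97 2.23", one average per ID column in header order.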
My guess for the awk part is that the ID should be the key of an associative array, along these lines:
awk 'FNR == NR { a[$1] = $2 " " $3 " " $4; next }' file1.txt file2.txt
And then, for file2, calculate the column averages similar to what this link describes (I've dropped the useless cat and switched to two decimals): http://www.fordodone.com/2013/08/02/awk-average-multiple-columns/
awk '{ for (i = 1; i <= NF; i++) a[i] += $i } END { for (i = 1; i <= NF; i++) printf "%.2f\t", a[i] / NR; printf "\n" }' sample.txt
But I'm not quite sure how to combine the two to get the desired output. My rough, untested attempt at a combined script is below.
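The plan would be: read file1 first and remember name/start/end per ID, then walk file2, using its header to map each column to an ID while accumulating sums and counts, and finally hand the data lines to sort(1) to order them by "start". A sketch, assuming whitespace-separated files with headers exactly as above ("output.txt" is just a placeholder name):
{
    printf 'name start end field value\n'
    awk '
        # file1: skip its header, remember name/start/end keyed by ID
        FNR == NR { if (FNR > 1) { name[$1] = $2; lo[$1] = $3; hi[$1] = $4 }; next }
        # file2 header: note which ID each numeric column holds
        FNR == 1 { for (i = 2; i <= NF; i++) id[i] = $i; next }
        # file2 data rows: accumulate per-ID sums and counts
        { for (i = 2; i <= NF; i++) { sum[id[i]] += $i; cnt[id[i]]++ } }
        # emit one unsorted data line per ID; the sort below orders by "start"
        END { for (k in sum) printf "%s %s %s . %.2f\n", name[k], lo[k], hi[k], sum[k] / cnt[k] }
    ' file1.txt file2.txt | sort -k2,2n
} > output.txt
With %.2f this would print 2.40 rather than 2.4 for cat, which is fine by me.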
Any solutions are welcome, including bash ones.