AWK, average columns of different length from multiple files

Question

I need to calculate the average of columns from multiple files, but the columns have different numbers of lines. I guess awk is the best tool for this, but anything from bash will be OK. A solution for one column per file is fine; if it also works for files with multiple columns, even better.

Example.

file_1:

file_2:

20
30
40

Expected result:

Create two arrays whose keys are the line number in the file (the FNR variable). One contains the sum of all the fields on that line number, the other contains the count of files that have that line number. At the end, loop through the array and print the total divided by the count to get the average. — Barmar
StackOverflow expects you to try to solve your own problem first, and we also don't answer homework questions. Please update your question to show what you have already tried in a minimal, complete, and verifiable example. For further information, please see how to ask good questions, and take the tour of the site :) — Barmar
@Allan that would be easy to solve, but it is not what I need: values that do not have corresponding data in the other files must simply be copied, not averaged with zeroes that do not exist. Barmar, thank you, but it would not make sense to post everything I tried; it would be a very long and messy list of awk, paste and other commands that would not contribute much to the problem. — Ivan Toman

CWLiu · Accepted Answer · 2017-11-10T02:22:06

awk makes this easy:

awk '{a[FNR]+=$1; n[FNR]++; if (FNR>max) max=FNR} END{for(i=1;i<=max;i++) print a[i]/n[i]}' file1 file2

The same command works for any number of files: just list them all after the script.
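To see that lines missing from the shorter file are copied rather than averaged with zero, here is a small run of the same idea. It is a sketch under stated assumptions: the question does not show the contents of file_1, so the values 10 and 20 are hypothetical, and the loop bound is the largest line number seen (max) rather than length() applied to an array, which POSIX awk does not require implementations to support.

```shell
#!/bin/sh
# Hypothetical data: the question does not show file_1's contents,
# so 10 and 20 are made up; file_2 holds the question's 20, 30, 40.
dir=$(mktemp -d)
printf '10\n20\n' > "$dir/file_1"
printf '20\n30\n40\n' > "$dir/file_2"

# a[FNR]: sum of line FNR across files; n[FNR]: files that have line FNR;
# max: largest line number seen (used as the loop bound in END).
result=$(awk '{ a[FNR] += $1; n[FNR]++; if (FNR > max) max = FNR }
    END { for (i = 1; i <= max; i++) print a[i] / n[i] }' "$dir/file_1" "$dir/file_2")

rm -rf "$dir"
printf '%s\n' "$result"   # 15, 25 and 40, one per line
```

The third line, 40, comes from file_2 alone because n[3] is 1, which is the "copy, don't average with zeroes" behaviour asked for in the comments.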

Brief explanation:

FNR is the record (line) number within the current input file; it restarts at 1 for each file.
a[FNR] accumulates the sum of the values found on line FNR across all files.
n[FNR] counts how many files actually have a line FNR, so a shorter file is simply not counted for the lines it lacks.
In the END block, the for loop prints a[i]/n[i], the average for each line number i.
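The question also asks about files with several columns. A minimal sketch extending the same arrays to a (line, column) key might look like this. It assumes whitespace-separated numeric columns; the file names a.txt and b.txt and their contents are hypothetical, and cells missing from a file are skipped rather than treated as zero.

```shell
#!/bin/sh
# Hypothetical sample files (not from the question): b.txt has one more line.
dir=$(mktemp -d)
printf '1 10\n2 20\n' > "$dir/a.txt"
printf '3 30\n4 40\n5 50\n' > "$dir/b.txt"

result=$(awk '
{
    for (c = 1; c <= NF; c++) {
        sum[FNR, c] += $c              # running total for line FNR, column c
        cnt[FNR, c]++                  # how many files had a value there
    }
    if (NF > width[FNR]) width[FNR] = NF   # widest row seen for this line number
    if (FNR > max) max = FNR               # longest file seen so far
}
END {
    for (i = 1; i <= max; i++) {
        line = ""
        for (c = 1; c <= width[i]; c++)
            line = line (c > 1 ? OFS : "") sum[i, c] / cnt[i, c]
        print line
    }
}' "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "$result"   # 2 20 / 3 30 / 5 50, one row per line
```

Because width[i] is the largest field count seen on line i, every printed cell has at least one contributing file, so there is no division by zero.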

