1
votes

I need to calculate the average of columns taken from multiple files, where the columns have different numbers of lines. I guess awk is the best tool for this, but anything that runs from bash is fine. A solution for one column per file is enough; if it also works for files with multiple columns, even better.

Example.

file_1:

10
20
30
40
50

file_2:

20
30
40

Expected result:

15
25
35
40
50
2
Create two arrays whose keys are the line number in the file (the FNR variable). One contains the sum of all the fields on that line number, the other contains the count of files that have that line number. At the end, loop through the array and print the total divided by the count to get the average. - Barmar
StackOverflow expects you to try to solve your own problem first, and we also don't answer homework questions. Please update your question to show what you have already tried in a minimal, complete, and verifiable example. For further information, please see how to ask good questions, and take the tour of the site :) - Barmar
By the way, your final averages should be 15 25 35 20 25 - Allan
@Allan that would be easy to solve, but it is not what I need: values that do not have corresponding data in the other files must simply be copied, not averaged with zeroes that do not exist. Barmar, thank you, but it would not make sense to post everything I tried; it would be a very long and messy list of awk, paste and other commands that would not contribute much to the problem. - Ivan Toman

2 Answers

0
votes

awk can do this easily:

awk '{a[FNR]+=$0; n[FNR]++; if(FNR>max)max=FNR} END{for(i=1;i<=max;i++)print a[i]/n[i]}' file1 file2

The same command also works for more than two files.

Brief explanation:

  • FNR is the record (line) number within the current input file.
  • a[FNR] accumulates the sum of the values found on that line number across all files.
  • n[FNR] counts how many files actually have that line number.
  • The for loop in the END block prints the average for each line number using print a[i]/n[i], so a line that exists in only one file is divided by 1 and printed unchanged.
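
The question also asks about files with several columns. The same two-array idea can be extended by keying the arrays on both line number and column number. What follows is only a minimal sketch, assuming whitespace-separated numeric columns; the function name avg_by_line is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# avg_by_line is a hypothetical helper name used only for this sketch.
# Assumption: every field is numeric and fields are separated by whitespace.
avg_by_line() {
    awk '
    {
        for (c = 1; c <= NF; c++) {
            sum[FNR, c] += $c      # running total for line FNR, column c
            cnt[FNR, c]++          # how many files have a value in this cell
        }
        if (FNR > maxrow) maxrow = FNR            # longest file seen so far
        if (NF > maxcol[FNR]) maxcol[FNR] = NF    # widest version of this line
    }
    END {
        for (r = 1; r <= maxrow; r++) {
            for (c = 1; c <= maxcol[r]; c++)
                printf "%s%s", (c > 1 ? " " : ""), sum[r, c] / cnt[r, c]
            print ""
        }
    }' "$@"
}
```

It would be called as avg_by_line file_1 file_2 (or with more files). Each cell is divided only by the number of files that actually contain it, so values without a counterpart are copied through unchanged.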
0
votes

I have prepared the following bash script for you; I hope it helps.

Let me know if you have any questions.

#!/usr/bin/env bash

#check that the files provided as parameters exist
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "ERROR: file> $1 or file> $2 is missing"
    exit 1
fi
#save the length of both files in variables
file1_length=$(wc -l "$1" | awk '{print $1}')
file2_length=$(wc -l "$2" | awk '{print $1}')

#if file 1 is longer than file 2, append lines containing 0 to the end of file 2
#until both files are the same length
#note: this modifies the input file in place; you can improve the script
#by creating temp files instead of working directly on the input ones
if [ "$file1_length" -gt "$file2_length" ]; then
    n_zero_to_append=$(( file1_length - file2_length ))
    echo "append $n_zero_to_append zeros to file $2"
    #append n zeros to the end of the file, one per line
    yes 0 | head -n "${n_zero_to_append}" >> "$2"
    #combine both files and compute the average line by line
    awk 'FNR==NR { a[FNR""] = $0; next } { print (a[FNR""]+$0)/2 }' "$1" "$2"
#if file 2 is longer than file 1, do the inverse operation
#(again, temp files would avoid modifying the input ones)
elif [ "$file2_length" -gt "$file1_length" ]; then
    n_zero_to_append=$(( file2_length - file1_length ))
    echo "append $n_zero_to_append zeros to file $1"
    yes 0 | head -n "${n_zero_to_append}" >> "$1"
    awk 'FNR==NR { a[FNR""] = $0; next } { print (a[FNR""]+$0)/2 }' "$1" "$2"
#if the files have the same length we do not need to append anything
#and we can directly compute the average line by line
else
    echo "the files : $1 and $2 have the same size."
    awk 'FNR==NR { a[FNR""] = $0; next } { print (a[FNR""]+$0)/2 }' "$1" "$2"
fi
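
Because the script above pads the shorter file with zeros, the trailing lines are averaged with 0 (giving 20 and 25 for the sample data) and the input file is changed on disk. A possible alternative, shown here only as a sketch using a different technique, is to let paste line the files up and have awk skip the empty fields. It assumes one numeric column per file, with no tabs and no blank lines inside the files; the function name avg_paste is hypothetical.

```shell
#!/usr/bin/env bash
# avg_paste is a hypothetical helper name used only for this sketch.
# Assumption: each file holds a single numeric column with no blank lines.
avg_paste() {
    # paste joins line N of every file with tabs; once a file runs out,
    # its field is simply empty, so the input files are never modified.
    paste "$@" | awk -F'\t' '
    {
        sum = 0; n = 0
        for (i = 1; i <= NF; i++)
            if ($i != "") { sum += $i; n++ }   # count only fields that exist
        if (n > 0) print sum / n
    }'
}
```

Called as avg_paste file_1 file_2 on the sample data from the question, this prints 15, 25, 35, 40 and 50, one per line, which is the result the question asks for.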