Awk: Sum up column values across multiple files with identical column layout

Question

I have a number of files with the same header:

COL1, COL2, COL3, COL4

You can ignore COL1-COL3. COL4 contains a number. Each file contains about 200 rows. I am trying to sum COL4 row by row across the files. For example:

File 1

COL1 COL2 COL3 COL4
 x    y   z    3
 a    b   c    4

File 2

COL1 COL2 COL3 COL4
 x     y    z   5 
 a     b    c   10

Then a new file is returned:

COL1 COL2 COL3 COL4
 x     y    z   8 
 a     b    c   14

Is there a simple way to do this without AWK? I will use AWK if need be, I just thought there might be a simple one-liner that I could just run right away. The AWK script I have in mind feels a bit long.

Thanks

Are the COL1-3 same in all the files? Do they appear in the same order in all the files? — choroba
awk would be a good and optimal choice. Of course, not with one short line, but ... with 2 lines, yes, I would write it with 2 lines — RomanPerekhrest
You're thinking backwards - in any given text manipulation situation where there might be a lengthy or complicated solution, awk IS the simpler way. — Ed Morton

Kristo Mägi · Accepted Answer · 2017-06-16T21:17:13

One more option.

The command:

paste f{1,2}.txt | sed '1d' | awk '{print $1,$2,$3,$4+$8}' | awk 'BEGIN{print "COL1","COL2","COL3","COL4"}1'

The result:

COL1 COL2 COL3 COL4
x y z 8
a b c 14

What it does:

Test files:

$ cat f1.txt
COL1 COL2 COL3 COL4
 x    y   z    3
 a    b   c    4

$ cat f2.txt
COL1 COL2 COL3 COL4
 x     y    z   5
 a     b    c   10

Command: paste f{1,2}.txt
Joins the two files line by line (f{1,2}.txt is shell brace expansion for f1.txt f2.txt) and gives this output:

COL1 COL2 COL3 COL4 COL1 COL2 COL3 COL4
 x    y   z    3     x     y    z   5
 a    b   c    4     a     b    c   10

Command: sed '1d'
Removes the header line temporarily.

Command: awk '{print $1,$2,$3,$4+$8}'
Returns COL1-3 and sums $4 and $8 from paste result.

Command: awk 'BEGIN{print "COL1","COL2","COL3","COL4"}1'
Adds the header back.
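The same paste idea can be stretched to more than two files. The following is only a sketch under some assumptions: every file has exactly four whitespace-separated fields per row, rows appear in the same order in all files, and the function name paste_sum and the third file f3.txt are made up for illustration. Instead of naming $4 and $8 explicitly, it adds every fourth field of the pasted line.

```shell
# Work in a scratch directory so the demo files do not overwrite anything.
cd "$(mktemp -d)" || exit 1

# Sample data: f1.txt and f2.txt mirror the question, f3.txt is invented.
printf 'COL1 COL2 COL3 COL4\n x y z 3\n a b c 4\n'  > f1.txt
printf 'COL1 COL2 COL3 COL4\n x y z 5\n a b c 10\n' > f2.txt
printf 'COL1 COL2 COL3 COL4\n x y z 1\n a b c 2\n'  > f3.txt

# paste_sum (hypothetical helper): paste all files side by side, then
# add up fields 4, 8, 12, ... of each data line.
paste_sum() {
  paste "$@" | awk '
    NR == 1 { print $1, $2, $3, $4; next }   # header: print the first copy only
    {
      total = 0
      for (i = 4; i <= NF; i += 4) total += $i   # every 4th field is a COL4
      print $1, $2, $3, total
    }
  '
}

paste_sum f1.txt f2.txt f3.txt
```

The labels in COL1-COL3 are taken from the first file only, so this relies on the rows lining up across files.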

EDIT:
Following @mklement0's comment: he is right about the header handling, as I forgot the NR==1 part.

So I'll repeat his updated version here as well:

paste f{1,2}.txt | awk '{ print $1, $2, $3, (NR==1 ? $4 : $4 + $8) }'
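For comparison, here is a sketch that skips paste and lets awk read the files one after another, accumulating COL4 per line number. It assumes the rows are in the same order in every file and that each row has four whitespace-separated fields; the function name sum_col4 is made up for this example and is not part of the answer above.

```shell
# Work in a scratch directory so the demo files do not overwrite anything.
cd "$(mktemp -d)" || exit 1

# Sample data mirroring the question.
printf 'COL1 COL2 COL3 COL4\n x y z 3\n a b c 4\n'  > f1.txt
printf 'COL1 COL2 COL3 COL4\n x y z 5\n a b c 10\n' > f2.txt

# sum_col4 (hypothetical helper): FNR is the line number within the
# current file, so the same row in every file shares the same index.
sum_col4() {
  awk '
    FNR == 1 { header = $0; next }       # keep the header text, skip the line
    {
      label[FNR] = $1 " " $2 " " $3      # COL1-COL3, taken from the last file read
      sum[FNR]  += $4                    # running total of COL4 for this row
      if (FNR > last) last = FNR         # remember the highest row number seen
    }
    END {
      print header
      for (i = 2; i <= last; i++) print label[i], sum[i]
    }
  ' "$@"
}

sum_col4 f1.txt f2.txt
```

Because the totals are kept in an array until END, this works for any number of files passed on the command line, at the cost of holding one entry per row in memory (negligible for about 200 rows).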
