0
votes

I am trying to compare two csv files based on their first and output first two columns of file 1 and second and third columns of file 2 to a new csv file

Example:

file1.csv

asdf123,1  
adfg234,2  
asdf567,3

file2.csv

asdf123,q,w  
asdf567,r,t  

desired output

asdf123,1,q,w  
adfg234,2,NA  
asdf567,3,r,t 

I used the following code

awk -F, 'FNR==NR{a[$1]=$0;next};{if (i in a) {print a[$1] "," $2} else {print a[$1] "," "NA"}}' file1.csv file2.csv > output.csv

However, the command after else doesn't seem to work and has empty rows in the place and in addition, the out put is printed as follows (with new lines in between):

asdf123,1  
*blank space* q,w  

asdf567,3  
*blank space* r,t

I am new to scripting. Can someone please guide me to fix this.

2

2 Answers

1
votes

There were a couple of bugs in your script:

$ awk -F, '
FNR==NR {
    a[$1]=$0
    next
}
{
    if($1 in a)              # no i anywhere, use $1
        print a[$1] "," $2 
    else 
        print $0 "," "NA"    # cant use a[$1] in this else 
}' file2 file1               # file order

Output now:

asdf123,q,w,1
adfg234,2,NA
asdf567,r,t,3

And it won't match your output, since the file order changed, without some more effort:

$ awk '
BEGIN {
    FS=OFS=","
}
FNR==NR {
    key=$1                  # $0="1,2,3" key=1
    $1=""                   # $0=",2,3"
    a[key]=$0               # a[1]=",2,3"
    next
}
{
    if ($1 in a) 
        print $1,$2 a[$1]
    else 
        print $0,"NA"
}' file2 file1

Output this time:

asdf123,1,q,w
adfg234,2,NA
asdf567,3,r,t 
0
votes

With bash, sort and GNU join:

join -t ',' -a 1 -e 'NA' <(sort file1.csv) <(sort file2.csv) -o 1.1,1.2,2.2,2.3

Output:

adfg234,2,NA,NA
asdf123,1,q,w
asdf567,3,r,t

See: man join