awk | update field number after comparing field from other file

Question

Input File1: file1.txt
MH=919767,918975
DL=919922
HR=919891,919394,919812
KR=919999,918888
Input File2: file2.txt
aec,919922783456,a5,b3,,,asf
abc,918975583456,a1,b1,,,abf
aeci,919998546783,a2,b4,,,wsf
Output File
aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf
Notes
- Compare the first 6 digits of the phone number (2nd field of file2.txt) with the prefixes in file1.txt (the comma-separated values after the "="). If there is a match, the output should contain the 2-letter code from file1.txt in the 5th field; otherwise the 5th field should be NOMATCH, as in the sample output.
- In file1.txt, a single code (for example MH) can cover multiple phone number prefixes.

How much data do you have? Can one of the files fit in memory? Any specific reason you want to use awk only and not perl/python? — Hari Menon

Steve · Accepted Answer · 2013-02-24T09:02:54

If you have GNU awk 4.0 or later (this version relies on arrays of arrays), try the following. Run it like this:

awk -f script.awk file1.txt file2.txt

Contents of script.awk:

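BEGIN {
    # split input on "=" or ","; join output fields with ","
    FS="[=,]"
    OFS=","
}

# first file: store each prefix under its code, a[code][prefix]
FNR==NR {
    for(i=2;i<=NF;i++) {
        a[$1][$i]
    }
    next
}

# second file: default to NOMATCH, then look for a matching prefix
{
    $5 = "NOMATCH"
    for(j in a) {
        for (k in a[j]) {
            # awk strings are 1-indexed, so start substr() at 1
            if (substr($2,1,6) == k) {
                $5 = j
            }
        }
    }
}1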

Alternatively, here's the one-liner:

awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1][$i]; next } { $5 = "NOMATCH"; for(j in a) for (k in a[j]) if (substr($2,1,6) == k) $5 = j }1' OFS=, file1.txt file2.txt

Results:

aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf

If you have an 'old' awk (one without arrays of arrays), try the following. Run it like this:

awk -f script.awk file1.txt file2.txt

Contents of script.awk:

BEGIN {
    # set the field separator to either an equals sign or a comma
    FS="[=,]"
    # set the output field separator to a comma
    OFS=","
}

# for the first file in the arguments list
FNR==NR {
    # loop through all the fields, starting at field two
    for(i=2;i<=NF;i++) {

        # add field one and each field to a pseudo-multidimensional array
        a[$1,$i]
    }

    # skip processing the rest of the code
    next
}


# for the second file in the arguments list
{
    # set the default value for field 5
    $5 = "NOMATCH"

    # loop through the array
    for(j in a) {

        # split the array keys into another array
        split(j,b,SUBSEP)

        # if the first six digits of field two equal the prefix stored in this key
        # (awk strings are 1-indexed, so substr() starts at 1)
        if (substr($2,1,6) == b[2]) {

            # assign the code to field five
            $5 = b[1]
        }
    }

# return true, therefore print by default
}1

Alternatively, here's the one-liner:

awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1,$i]; next } { $5 = "NOMATCH"; for(j in a) { split(j,b,SUBSEP); if (substr($2,1,6) == b[2]) $5 = b[1] } }1' OFS=, file1.txt file2.txt

Results:

aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf
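
Since file1.txt lists exact six-digit prefixes, looping over the whole array for every line of file2.txt can also be replaced by a single array lookup. Here is a minimal sketch, assuming any POSIX awk and that no prefix is listed under more than one code. The function name add_codes and the array name code are illustrative, not part of the scripts above:

```shell
# add_codes FILE1 FILE2
# Hypothetical wrapper: FILE1 holds CODE=prefix,prefix,... lines,
# FILE2 holds the comma-separated records to annotate.
add_codes() {
    awk -F '[=,]' -v OFS=, '
        # first file: map each six-digit prefix to its code
        FNR == NR {
            for (i = 2; i <= NF; i++)
                code[$i] = $1
            next
        }

        # second file: look up the first six digits of field two
        {
            p = substr($2, 1, 6)
            $5 = (p in code) ? code[p] : "NOMATCH"
        }

        # print every (modified) record of the second file
        1
    ' "$1" "$2"
}
```

Paste the function into your shell and run it as add_codes file1.txt file2.txt. With the sample files from the question it should print the same three lines as the results above. It reads file1.txt into memory once and then does one lookup per line of file2.txt, which should help when file1.txt has many prefixes.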
