Input File1: file1.txt
MH=919767,918975
DL=919922
HR=919891,919394,919812
KR=919999,918888
Input File2: file2.txt
aec,919922783456,a5,b3,,,asf
abc,918975583456,a1,b1,,,abf
aeci,919998546783,a2,b4,,,wsf
Output File
aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf
Notes
- Compare the first 6 digits of the phone number (2nd field of file2.txt) against the prefixes in file1.txt (the comma-separated values after the "="). If a prefix matches, write the corresponding 2-letter code from file1.txt into the 5th field of the output; otherwise write NOMATCH, as in the output above.
- In file1.txt, a single code (for example, MH) can cover multiple phone number prefixes.
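To make the matching rule concrete, here is a minimal sketch that extracts the six-digit prefix from one sample record of file2.txt. The variable names `record` and `prefix` are only illustrative and are not part of the original files.

```shell
# Sample record copied from file2.txt in the question.
record='abc,918975583456,a1,b1,,,abf'
# awk strings are 1-indexed, so substr($2, 1, 6) yields the first six digits.
prefix=$(printf '%s\n' "$record" | awk -F, '{ print substr($2, 1, 6) }')
echo "$prefix"
```

This prints 918975, which is listed under MH in file1.txt, so the 5th field of that record should become MH.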
How much data do you have? Can one of the files fit in memory? Any specific reason you want to use awk only and not perl/python?
– Hari Menon
Each file can be approximately 100 MB.
– Vipin Choudhary
2 Answers
If you have GNU awk (version 4.0 or later, which supports arrays of arrays), try the following. Run like:
awk -f script.awk file1.txt file2.txt
Contents of script.awk:
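BEGIN {
    FS = "[=,]"
    OFS = ","
}
# first file: store each prefix under its code (arrays of arrays, gawk only)
FNR == NR {
    for (i = 2; i <= NF; i++) {
        a[$1][$i]
    }
    next
}
# second file: default to NOMATCH, then look for a matching prefix
{
    $5 = "NOMATCH"
    for (j in a) {
        for (k in a[j]) {
            # awk strings are 1-indexed, so the start position is 1, not 0
            if (substr($2, 1, 6) == k) {
                $5 = j
            }
        }
    }
}1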
Alternatively, here's the one-liner:
awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1][$i]; next } { $5 = "NOMATCH"; for(j in a) for (k in a[j]) if (substr($2,1,6) == k) $5 = j }1' OFS=, file1.txt file2.txt
Results:
aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf
If you have an 'old' awk, try the following. Run like:
awk -f script.awk file1.txt file2.txt
Contents of script.awk:
BEGIN {
    # set the field separator to either an equals sign or a comma
    FS = "[=,]"
    # set the output field separator to a comma
    OFS = ","
}
# for the first file in the arguments list
FNR == NR {
    # loop through all the fields, starting at field two
    for (i = 2; i <= NF; i++) {
        # add field one and each field to a pseudo-multidimensional array
        a[$1,$i]
    }
    # skip processing the rest of the code
    next
}
# for the second file in the arguments list
{
    # set the default value for field 5
    $5 = "NOMATCH"
    # loop through the array
    for (j in a) {
        # split the array key into its code (b[1]) and prefix (b[2])
        split(j, b, SUBSEP)
        # if the first six digits of field two equal the stored prefix
        # (awk strings are 1-indexed, so the start position is 1, not 0)
        if (substr($2, 1, 6) == b[2]) {
            # assign field five
            $5 = b[1]
        }
    }
}
# the pattern 1 is always true, so every record is printed by default
1
Alternatively, here's the one-liner:
awk -F "[=,]" 'FNR==NR { for(i=2;i<=NF;i++) a[$1,$i]; next } { $5 = "NOMATCH"; for(j in a) { split(j,b,SUBSEP); if (substr($2,1,6) == b[2]) $5 = b[1] } }1' OFS=, file1.txt file2.txt
Results:
aec,919922783456,a5,b3,DL,,asf
abc,918975583456,a1,b1,MH,,abf
aeci,919998546783,a2,b4,NOMATCH,,wsf
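Since every prefix in the sample data is exactly six digits, one possible simplification is to key the array by prefix and look each record up directly, rather than looping over every stored prefix for every line. This may matter for files of around 100 MB. The following is only a sketch: it assumes each prefix belongs to a single code, and the array name `code`, the variable `result`, and the scratch directory are illustrative choices, not part of the scripts above. It should work in any POSIX awk.

```shell
# Sample inputs copied from the question, written to a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1.txt" <<'EOF'
MH=919767,918975
DL=919922
HR=919891,919394,919812
KR=919999,918888
EOF
cat > "$tmpdir/file2.txt" <<'EOF'
aec,919922783456,a5,b3,,,asf
abc,918975583456,a1,b1,,,abf
aeci,919998546783,a2,b4,,,wsf
EOF

result=$(awk '
BEGIN { FS = "[=,]"; OFS = "," }
# First file: map every six-digit prefix straight to its code.
FNR == NR {
    for (i = 2; i <= NF; i++) code[$i] = $1
    next
}
# Second file: one array lookup per record instead of a loop over all prefixes.
{
    prefix = substr($2, 1, 6)
    $5 = (prefix in code) ? code[prefix] : "NOMATCH"
    print
}
' "$tmpdir/file1.txt" "$tmpdir/file2.txt")

echo "$result"
rm -rf "$tmpdir"
```

On the sample files this prints the same three lines as the results above. If prefixes of different lengths were ever mixed in file1.txt, a single fixed-length lookup like this would no longer be enough.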