As I noted in a comment, there's a lot of unstated information needed to give a definitive answer. However, we can make some plausible guesses:
- The common number is the first 6 digits of file 2 (we don't care about the trailing digits, but will simply copy them to the output).
- The files are sorted in order.
- If there are unmatched records in either file, those records will be ignored.
The tools of choice are probably sed
and join
:
sed 's/^\([0-9]\{6\}\)/\1,\1/' file2.txt |
join -t, -o 1.2,2.2,2.3 - file1.txt
This edits file2.txt
to create a comma-separated first field with the 6-digit phone number followed by all the rest of the line. The input is fed to the join
command, which joins on the first column, and outputs the 'rest of the line' (column 2) from file2.txt
and columns 2 and 3 from file1.txt
.
If the phone numbers are variable length, then the matching operation is horribly complex. For that, I'd drop into Perl (or Python) to do the work. If the data is unsorted, it can be sorted before being fed into the commands. If you want unmatched records, you can specify how to handle those in the options to join
.
The extra information needed is now available. The key information is the 6-digits is fixed — phew! Since you're on Linux, I'm assuming bash
is available with 'process substitution':
sort file2.txt |
sed 's/^\([0-9]\{6\}\)/\1,\1/' |
join -t, -o 1.2,2.2,2.3 -a 1 -a 2 -e 'no-match' - <(sort file1.txt)
If process substitution is not available, simply sort file1.txt
in situ:
sort -o file1.txt file1.txt
Then use file1.txt
in place of <(sort file1.txt)
.
I think the comment might be asking for inputs such as:
file1.txt
919167,hutch,mumbai
919594,idea,mumbai
902130,airtel,karnataka
file2.txt
919167000000
919594000000
919342313242
Output
no-match,airtel,karnataka
919167000000,hutch,mumbai
919342313242,no-match,no-match
919594000000,idea,mumbai
If that's not what the comment is about, please clarify by editing the question to add the extra data and output in a more readable format than comments allow.
Working with the extended data, this mildly modified command:
sort file2.txt |
sed 's/^\([0-9]\{6\}\)/\1,\1/' |
join -t, -o 1.2,2.2,2.3 -a 1 -e 'no-match' - <(sort file1.txt)
produces the output:
919167838888,hutch,mumbai
919212334323,no-match,no-match
919594998484,idea,mumbai
which looks rather like a sorted version of the desired output. The -a n
options control whether the unmatched records from file 1 or file 2 (or both) are printed; the -e
option controls the value printed for the unmatched fields. All of this is readily available from the man pages for join
, of course.