I have two very large files on Unix, each containing, say, 5 columns but millions of lines.
Ex :
File 1: abc|def|ghk|ijk|lmn ... ...
File 2: abc|def|ghk|ijk|123 ... ...
My task is to compare the two large files and report the differing columns and rows. The output format would be: Column-no Row-no File1-word File2-word.
Ex :
5 1 lmn 123
The files are really large and the output can't take too long, so speed matters. I have heard awk is the fastest way to do file parsing in Unix.
Can this be done using awk?
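It can. A minimal sketch of one common awk idiom for this task, assuming both files have the same number of rows and pipe-delimited fields (the file names and sample data below are made up to mirror the question's example):

```shell
#!/bin/sh
# Hypothetical sample files mirroring the question's example.
printf 'abc|def|ghk|ijk|lmn\n' > file1.txt
printf 'abc|def|ghk|ijk|123\n' > file2.txt

# First pass (NR == FNR) stores every field of file1 in memory;
# second pass compares file2 field by field and prints:
# column-no row-no file1-word file2-word
awk -F'|' '
NR == FNR { for (i = 1; i <= NF; i++) a[FNR, i] = $i; next }
{
    for (i = 1; i <= NF; i++)
        if (a[FNR, i] != $i)
            print i, FNR, a[FNR, i], $i
}' file1.txt file2.txt
```

For the sample above this prints `5 1 lmn 123`. Note the caveat: this holds all of file1 in memory, which may be costly for files of millions of lines.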
Reading from two files concurrently in awk is hard, but saving all the input from one file and then using that while reading the second is a normal mode of operation for awk scripts. What did you try, and where did you run into problems? If you can use Perl or Python, you'd find it easier to avoid slurping the whole of one file into memory. – Jonathan Leffler

You can do getline < file2 for every line read from file1. I'm not saying that's the best approach of course, just that it's do-able. Subhayan - edit your question to include concise, testable sample input (e.g. a couple of files of 4 or 5 rows and 4 or 5 columns each) and expected output. – Ed Morton
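The getline approach the comments mention avoids holding either file in memory: for each line of file1, read the matching line of file2 and compare fields. A sketch, again assuming equal row counts and pipe-delimited fields, with made-up file names and sample data:

```shell
#!/bin/sh
# Hypothetical sample files mirroring the question's example.
printf 'abc|def|ghk|ijk|lmn\n' > file1.txt
printf 'abc|def|ghk|ijk|123\n' > file2.txt

# For each line of file1, pull the corresponding line of file2 with getline,
# split it on '|', and print any differing fields as:
# column-no row-no file1-word file2-word
awk -F'|' '
{
    if ((getline line < "file2.txt") > 0) {
        split(line, f2, "|")
        for (i = 1; i <= NF; i++)
            if ($i != f2[i])
                print i, NR, $i, f2[i]
    }
}' file1.txt
```

For the sample data this prints `5 1 lmn 123`. Since both files are read strictly sequentially, memory use stays constant regardless of file size.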