I would like to have the difference between 2 files. Want to find specifically at what column the difference has occured using Unix Scripting

Question

We are doing a data migration project migrating from Informix to Oracle. Now a requirement has come up to compare the data between Informix and oracle tables. So Table1 from Informix will be migrated to Table1 in Oracle.
Now data is migrated from Table1 in Informix to Table1 in Oracle. I am taking data from both the tables into 2 flat files.I want to just see if there is any data difference in 2 files.

File1 has say 1 million records with 300 columns and File2 say 1 million records with 300 columns. File1 and File2 are pipe delimited files. So difference can be anywhere in the file. Almost a whole row can match, but there can be difference in one/two columns. So I would like to find out the differences, keeping all the above things in mind. Difference output should list out something like row1 in file1 is not matching with row1 in file2 at column 3 and the value of the columns from both files. All such differences from both the files need to be listed out on the console with layman understandable format.

I want to just explain it with an example to make it more clear.

My first file if it looks like below:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|worse
1235|Chandan|4567|[email protected]|228862|worse
1236|Kacitha|4567|[email protected]|228872|worse
1238|Shajin|4567|[email protected]|228873|worse

My second file looks like this:

col1|col2|col3|col4|col5|col6
1234|ramyakrishna|4567|[email protected]|228802|good
1235|Chandan|4567|[email protected]|228789|worse
1236|Kacitha|4567|[email protected]|228872|worse

So when i compare both the files. We see that there is a difference in col4 and col6 in row2. So I want something like this:

1234|ramyakrishna|4567|[email protected]|228802|good
:[email protected]:COL6-EXPECTED-worse

So in above row from second file should be printed. Then expected values should be printed.

I know files need to be sorted before using any script.

For fge's script i am getting ouput like below:

I am getting ouput like below

$perl diff.perl    

Line 1: different value for column 38 (was g, expected w)
Line 1: different value for column 40 (was o, expected r)
Line 1: different value for column 41 (was d, expected s)
Line 1: different value for column 42 (was ., expected e)
Line 1: different value for column 43 (was c, expected .)
Line 1: different value for column 44 (was o, expected c)
Line 1: different value for column 45 (was m, expected o)
Line 1: different value for column 46 (was |, expected m)

I want whole column comparison. Files are | delimited files.

JRFerguson JRFerguson · Accepted Answer · 2012-01-04T20:14:21

Assuming the files are sorted in the same order, look at comm or diff. Be advised that this operates on the line-level. To drill down to differences at the field level you might begin with the subset population of those differing a the line-level.

I would like to have the difference between 2 files. Want to find specifically at what column the difference has occured using Unix Scripting

3 Answers