I am a newbie to Unix and shell scripting. I am trying to find differences between 2 .csv files using Unix command. There are conditions on the basis of which I have to find the difference.
- Update to an entry i.e. any row in file1 (unique id is employee id) exists in file2 but different value for another column. It is considered as an update. In such case, I need that entry from file2
- If an entry exists in file2 and not in the file, It is considered as the addition of new employee. I need that row from file2.
- If an entry exists in file1 and not in the file2, It is considered as the deletion of an employee. I need that row from file1.
I am able to find the update and new records using comm -23 sorted_file_2.csv sorted_file_1.csv > updates.csv
but not able to find an entry that is deleted.
I have checked solutions using below commands
grep -v -x -f sorted_file_2.csv sorted_file_1.csv > deleted.csv
awk 'NR==FNR{a[$0]=1;next}!a[$0]' sorted_file_2.csv sorted_file_1.csv > deleted.csv
diff sorted_file_1.csv sorted_file_2.csv > deleted.csv
The above commands are always giving me entries that are updated as well as deleted. What I am looking for the only entry from file1 which is not in file2
P.S. The two file can contain all the 3 cases mentioned above. I need output in two csv files. One for update/new records and another one for deleted records.
File1.csv
Row|Employee_ID|Salary|Designation 1|John|2000|Clerk 2|Smith|3000|Supervisor 3|Jenny|1000|Intern 4|Vicky|5000|Manager
File2.csv
Row|Employee_ID|Salary|Designation 1|John|2000|Clerk 2|Smith|4000|Senior Supervisor 4|Vicky|5000|Manager 5|James|5000|Auditor
In the above 2 files Row #2 in file2 is an update, Row#5 is a new entry.Both of them can be combined in single file as
Update_new.csv
2|Smith|4000|Senior Supervisor
5|James|5000|Auditor
Deleted entry is row#3 in file1.csv which is not present in file2.csv to be kept in separate file deleted.csv
3|Jenny|1000|Intern
It is fine even if I am able to add all the two files in single file with one extra coloum specifiyng "UPDATED","NEW","DELETED" value.
diff
? – oliv