1
votes

Hello I have 2 csv files

File1.csv (4 columns)

3 T;3.0 T;269.8 M;/path1/
3 T;3.0 T;3.4 G;/path2/

File2.csv (5 columns)

10000;9459;206;335;/path1/
10000;8543;205;1252;/path2/

I want to produce an output.csv file from File1.csv and File2.csv only if path in File1.csv and File2.csv are equal

The output file produced should be :

10000;9459;3 T;3.0 T;206;335;269.8 M;/path1/
10000;8543;3 T;3.0 T;205;1252;3.4 G;/path2/

--> The condition is "value of Col4 in File1" = "value of col5 in File2" (f1_col4 = f2_col5)

--> The output.csv file produced should have 8 columns : f2_col1;f2_col2;f1_col1;f1_col2;f2_col3;f2_col4;f1_col3;f1_col4

How can i do this with awk please ?

2
Is there a reason you want to use awk for this? Would other solutions be acceptable? What have you tried so far, and what are you stuck on?Eric Renouf

2 Answers

1
votes
awk -F ';' -vOFS=';' 'NR==FNR{a[$NF]=$0;next}a[$NF]{split(a[$NF],t);print $1,$2,t[1],t[2],$3,t[3],$NF}' file1 file2
0
votes

If you want line by line matching perhaps this will do

$ paste -d';' file2 file1 
  | awk -F';' -v OFS=';' '$5==$NF{print $1,$2,$6,$7,$3,$4,$8,$9}'   

10000;9459;3 T;3.0 T;206;335;269.8 M;/path1/
10000;8543;3 T;3.0 T;205;1252;3.4 G;/path2/

this will print only if the corresponding lines in both files have a matching path. Also doesn't need to keep the lines in memory, might be important for large files.