
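I have a data file with 1000 to 2000 columns and more than 3000 rows. For each row, I want to print the first column (the GO ID) paired with each of the remaining values, one pair per line.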

Example input data:

GO:0009987 Os760  Os840  Os550  Os380  Os590  Os340
GO:0043170 Os610  Os043  Os035

Expected Output:

GO:0009987 Os760  
GO:0009987 Os840  
GO:0009987 Os550  
GO:0009987 Os380  
GO:0009987 Os590
GO:0009987 Os340
GO:0043170 Os610
GO:0043170 Os043 
GO:0043170 Os035

I tried this:

sed 's/ /\n/2; P; D' filename | awk 'NF==2 {a =$1;b=$2; print; next} {print a,$0}'

But this gives me the following result, with an extra line containing only the GO ID at the end of each group. I want to remove these extra GO lines.

GO:0009987 Os760  
GO:0009987 Os840  
GO:0009987 Os550  
GO:0009987 Os380  
GO:0009987 Os590
GO:0009987 Os340
GO:0009987
GO:0043170 Os610
GO:0043170 Os043 
GO:0043170 Os035
GO:0043170
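One plausible cause of these extra lines (an assumption, since the input file itself isn't shown) is trailing whitespace at the end of an input line. When the last chunk still ends in a space, `s/ /\n/2` splits off an empty remainder, `P` prints it as a blank line, and the awk step then prints `a` followed by that empty record. The sketch below reproduces this with a sample built from the question's data. It assumes GNU sed, because `\n` in the replacement is a GNU extension. `split_pairs` is a hypothetical wrapper name, and the pipeline is the one from the question, reading stdin instead of `filename` and with the unused `b` dropped.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline from the question (GNU sed assumed,
# since "\n" in the replacement is a GNU extension). Reads standard input.
split_pairs() {
  sed 's/ /\n/2; P; D' | awk 'NF==2 {a=$1; print; next} {print a,$0}'
}

# Same record without and with a single trailing space after Os035.
printf 'GO:0043170 Os610  Os043  Os035\n'  | split_pairs   # 3 lines
printf 'GO:0043170 Os610  Os043  Os035 \n' | split_pairs   # 4 lines; the last is only "GO:0043170 "
```

The only difference between the two calls is the trailing space, which is enough to produce the extra `GO:0043170` line seen in the output above.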

2 Answers

3
votes

Could you please try the following (I changed the delimiter selection as per Sundeep sir's comments):

awk '{for(i=2;i<=NF;i++){print $1,$i}}' Input_file
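This first version does not depend on how many blanks separate the columns. With the default field separator, awk splits on runs of spaces and tabs and ignores leading and trailing blanks, so a trailing space does not create an empty last field. A quick check on a sample with doubled separators plus hypothetical trailing spaces (`to_long` is only an illustrative name):

```shell
#!/usr/bin/env bash
# Default FS: fields are separated by runs of blanks; leading/trailing blanks are ignored.
to_long() {
  awk '{for(i=2;i<=NF;i++){print $1,$i}}'
}

# Doubled separators plus (hypothetical) trailing spaces at the end of each line.
printf 'GO:0009987 Os760  Os840 \nGO:0043170 Os610  Os043  Os035  \n' | to_long
# GO:0009987 Os760
# GO:0009987 Os840
# GO:0043170 Os610
# GO:0043170 Os043
# GO:0043170 Os035
```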

OR try:

awk 'BEGIN{FS=":| +"} {for(i=3;i<=NF;i++){print $1":"$2,$i}}' Input_file

OR:

awk -F':| +' '{for(i=3;i<=NF;i++){print $1":"$2,$i}}' Input_file
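In the two `':| +'` variants the separator is an explicit regex, and awk only strips leading and trailing blanks for the default separator. A line ending in a space therefore gets an empty last field and prints a bare `GO:...` line, much like the problem in the question. A minimal tweak, assuming such trailing blanks can occur, is to skip empty fields (`to_long_fs` is an illustrative name):

```shell
#!/usr/bin/env bash
# With a regex FS, a trailing space yields an empty final field; the guard skips it.
to_long_fs() {
  awk -F':| +' '{for(i=3;i<=NF;i++){if($i!="") print $1":"$2,$i}}'
}

printf 'GO:0043170 Os610  Os043  Os035 \n' | to_long_fs
# GO:0043170 Os610
# GO:0043170 Os043
# GO:0043170 Os035
```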
1
votes

I noticed a small mistake in my input file. After correcting it, my original command also works fine:

sed 's/ /\n/2; P; D' filename | awk 'NF==2 {a =$1;b=$2; print; next} {print a,$0}'
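If the mistake is stray whitespace at the end of lines (an assumption; the exact problem isn't stated), one way to keep this command is to trim trailing blanks and carriage returns first. The sketch below assumes GNU sed, and `clean_then_split` is a hypothetical name. Keep in mind that the `s/ /\n/2` step also depends on the spacing: it relies on the doubled blanks between the `Os` columns shown in the question, and with single spaces it would pair two `Os` values together instead. The field loop in the first answer handles both cases.

```shell
#!/usr/bin/env bash
# Trim trailing whitespace (including a CR from CRLF files), then run the
# original sed | awk pipeline on the cleaned stream. GNU sed assumed.
# Continuation lines keep the leading blank left by sed, so they print with two spaces.
clean_then_split() {
  sed 's/[[:space:]]*$//' |
    sed 's/ /\n/2; P; D' |
    awk 'NF==2 {a=$1; print; next} {print a,$0}'
}

printf 'GO:0009987 Os760  Os840 \r\nGO:0043170 Os610  Os043  Os035  \n' | clean_then_split
```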