0
votes

I am trying to modify a tab delimited file by replacing the value of 2 columns separated by ":" where there is a null value in one column:

Start with this file:

id1 id2 pos1    pos2
749 2847443 21  13517135
862 2821796 21  13571669
997     21  13636494
1095    2821826 21  13661335
1131        21  13678797

I would like to modify column 2 to get this:

id1 id2 pos1    pos2
749 2847443 21  13517135
862 2821796 21  13571669
997 21:13636494 21  13636494
1095    2821826 21  13661335
1131    21:13678797 21  13678797

The problem is also that there is no blank space (" ") indicating a null value. For now I have not even been able to substitute the value of column 2 with column 3 when it is null...

I have tried using sed:

sed -e 's/\t\t/\t$3\t/g' input.txt > output

Which works but it just substitutes the text '$3' and I can't find how to substitute the values of $3:$4 instead.

I have also tried awk:

awk 'BEGIN {
  IFS = OFS = "\t"
 }
{
    for (column = 2; column <= NF; ++column) {
    if ($column == "") {
        $column = $3
    }
 }    
 print 
}         
'
input.txt > output

But this does not work either (it does nothing actually neither using "" nor " "...)

Can you please help me? Thank you.

2

2 Answers

3
votes

checking null value is actually easy. but I didn't understand the requirement well. which column could be null? in your awk script, you have loop from $2->$NF, if there is null column, you didn't set to ":" separated values, but set with $3. what about $3 is null?

I assume that only $2 (column2) could be null, then the following awk line should do the job.

 awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' file

test

<ff is your input file>

kent$  awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' ff          
id1     id2     pos1    pos2
749     2847443 21      13517135
862     2821796 21      13571669
997     21:13636494     21      13636494
1095    2821826 21      13661335
1131    21:13678797     21      13678797

to see it clearly, we could pipe the output to column command:

kent$  awk -F'\t' -vOFS='\t' '!$2{$2=$3":"$4}1' ff|column -t
id1   id2          pos1  pos2
749   2847443      21    13517135
862   2821796      21    13571669
997   21:13636494  21    13636494
1095  2821826      21    13661335
1131  21:13678797  21    13678797

hope it is helpful for you.

2
votes

looking at the before/after text you posted

B:    997       21  13636494
A:    997   21:13636494 21  13636494

you want to replace the 2nd column if null
with the 3rd + '**:**' + 4th column, right ?

so go for this:

sed 's/\(.*\)\t\t\(.*\)\t\(.*\)/\1\t\2:\3\t\2\t\3/g' testfile

that is match the line which has

something tabtab something tab something

and replace it with

1st-column tab 3rd-column:4th-column tab 3rd-column tab 4th-column

example:

$ cat testfile
749 2847443 21  13517135
862 2821796 21  13571669
997     21  13636494
1095    2821826 21  13661335
1131        21  13678797

$ sed 's/\(.*\)\t\t\(.*\)\t\(.*\)/\1\t\2:\3\t\2\t\3/g' testfile
749 2847443 21  13517135
862 2821796 21  13571669
997 21:13636494 21  13636494
1095    2821826 21  13661335
1131    21:13678797 21  13678797

Note: this looks for a missing 2nd column only, as what you talked about

PS: if you think that answers your question, don't forget to mark it as the correct answer