0
votes

I have a tab-delimited file that looks like this:

2L <TAB> 440 <TAB> . <TAB> . <TAB> . <TAB> 1/1:49:42,6,0  
2L <TAB> 260 <TAB> 0/1:66:63,0,207 <TAB> . <TAB> . <TAB> 1/1:49:42,6,0
2L <TAB> 595 <TAB> 0/1:11:85,0,8 <TAB>0/1:13:132,0,10 <TAB>0/1:73:70,0,131<TAB> 0/1:59:72,0,56

In this example I included only 6 columns, but the actual file contains 19 columns in total. How do I use awk to extract only the lines in which every column from column 3 onward contains something other than the dot (.) character? From the example above, I want to output only the 3rd line, because none of its columns from column 3 onward is empty or holds just a dot.

I have tried a couple of commands such as the one below but it doesn't seem to work.

awk '$3-$19==0-9' input.txt > out.txt

Thanks in advance

3 Answers

2
votes

awk:

awk -F'\t' '{ for(i=3;i<=NF;i++)if($i ==".") next; print}' input.txt > out.txt

or

awk -F'\t' '!/\t\.(\t|$)/' input.txt > out.txt

sed:

sed -E '/\t\.(\t|$)/d' input.txt > out.txt
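
To check the loop version against the three sample rows from the question, something like the following could be used. This is only a sketch: sample.tsv and filtered.tsv are made-up file names, and printf is used so that the rows contain real tab characters.

```shell
# Recreate the three sample rows from the question with real tabs
# (sample.tsv and filtered.tsv are hypothetical file names).
printf '2L\t440\t.\t.\t.\t1/1:49:42,6,0\n' > sample.tsv
printf '2L\t260\t0/1:66:63,0,207\t.\t.\t1/1:49:42,6,0\n' >> sample.tsv
printf '2L\t595\t0/1:11:85,0,8\t0/1:13:132,0,10\t0/1:73:70,0,131\t0/1:59:72,0,56\n' >> sample.tsv

# Skip any row where a field from column 3 onward is exactly "."
awk -F'\t' '{ for (i = 3; i <= NF; i++) if ($i == ".") next; print }' sample.tsv > filtered.tsv

# Show what survived the filter
cat filtered.tsv
```

Only the row at position 595 should be left in filtered.tsv, since it is the only one without a lone dot in columns 3 to 6.
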
1
votes

Not sure of any way to do it more elegantly, but this should work:

awk '$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19 !~ /\./ {print}'

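This concatenates fields 3 through 19 and prints the line only if the result contains no . character. Note that a dot anywhere in those fields rejects the line, so it is only safe when real values never contain a dot, as is the case in the sample data.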
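
If some values might contain a dot themselves (a decimal such as 0.5, say), comparing each field as a whole is safer than searching the concatenated string. A rough sketch, assuming tab-separated input and that empty fields should also count as missing; demo.tsv and kept.tsv are made-up file names, and the rows are invented purely for illustration:

```shell
# demo.tsv and kept.tsv are hypothetical file names; the rows are invented test data.
printf 'chr\t1\t0.5\t1.2\n' > demo.tsv    # decimals but nothing missing: keep
printf 'chr\t2\t.\t1.2\n' >> demo.tsv     # a lone "." field: drop
printf 'chr\t3\t\t1.2\n' >> demo.tsv      # an empty field: drop

awk -F'\t' '{
    ok = (NF >= 3)                        # need at least one data column
    for (i = 3; ok && i <= NF; i++)
        if ($i == "." || $i == "") ok = 0 # whole-field comparison, not a substring search
} ok' demo.tsv > kept.tsv

cat kept.tsv
```

Here only the first row is printed: 0.5 and 1.2 contain dots, but neither field is a dot on its own.
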

1
votes

Variant with sed:

sed -E '/^([^\t]*\t){2}(.*\t)?\.(\t|$)/d' input.txt > out.txt