DNA Matching in Prolog

Question

I am attempting to learn basic Prolog. I have read some basic tutorials on the basic structures of lists, variables, and if/and logic. A project I am attempting to do to help learn some of this is to match DNA sequences.

Essentially I want it to match reverse compliments of DNA sequences.

Example outputs can be seen below:

?- dnamatch([t, t, a, c],[g, t, a, a]).
true

While it's most likely relatively simple, being newer to Prolog I am currently figuring it out.

I started by defining basic matching rules for the DNA pairs:

pair(a,t).
pair(g,c).
etc...

I was then going to try to implement this into lists somehow, but am unsure how to make this logic apply to longer lists of sequences. I am unsure if my attempted start is even the correct approach. Any help would be appreciated.

tas tas · Accepted Answer · 2017-04-22T23:20:10

Since your relation is describing lists, you could opt to use DCGs. You can describe the complementary nucleobases like so:

complementary(t) -->    % thymine is complementary to
  [a].                  % adenine
complementary(a) -->    % adenine is complementary to
  [t].                  % thymine
complementary(g) -->    % guanine is complementary to
  [c].                  % cytosine
complementary(c) -->    % cytosine is complementary to
  [g].                  % guanine

This corresponds to your predicate pair/2. To describe a bonding sequence in reverse order you can proceed like so:

bond([]) -->            % the empty sequence
  [].                   % doesn't bond
bond([A|As]) -->        % the sequence [A|As] bonds with
  bond(As),             % a bonding sequence to As (in reverse order)
  complementary(A).     % followed by the complementary nucleobase of A

The reverse order is achieved by writing the recursive goal first and then the goal that describes the complementary nucleobase to the one in the head of the list. You can query this using phrase/2 like so:

   ?- phrase(bond([t,t,a,c]),S).
S = [g,t,a,a]

Or you can use a wrapper predicate with a single goal containing phrase/2:

seq_complseq(D,M) :-
  phrase(bond(D),M).

And then query it:

   ?- seq_complseq([t,t,a,c],C).
C = [g,t,a,a]

I find the description of lists with DCGs easier to read than the corresponding predicate version. Of course, describing a complementary sequence in reverse order is a relatively easy task. But once you want to describe more complex structures like, say the cloverleaf structure of tRNA DCGs come in real handy.

DNA Matching in Prolog

3 Answers