Machine learning, decision tree

Question

I have a question about machine learning and decision trees. I work in computational biology (long RNA secondary structure prediction).

I have a program which predicts the accuracy of a predicted RNA secondary structure. The input arguments to the program are:

stem length (L) - values 3, 4, 5, 6, 7, and 8
gap size (G) - values 0, 1, 2, 3, 4, 5, 6, 7, and 8
chunk length (C) - values 60, 70, 80, 90, 100, 120, 130, 140, and 150

I want to know, for a given RNA sequence of length S, which L, G, C combination gives the maximum accuracy.

I have a training data set of 50 sequence files with sequence lengths S, and for each of these sequence files the L, G, C input parameter combination which gives the maximum accuracy is already known.

Is there a way to know which specific L, G, and C parameters to use to get the maximum accuracy without trying every value in the L, G, and C ranges?
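
For reference, the ranges listed above give 6 × 9 × 9 = 486 combinations per sequence. A minimal Python sketch of the exhaustive search this question hopes to avoid is shown here. The name `accuracy_fn` is a hypothetical stand-in for the accuracy program, which is not shown in this post.

```python
from itertools import product

# Parameter ranges exactly as listed in the question.
STEM_LENGTHS = [3, 4, 5, 6, 7, 8]
GAP_SIZES = [0, 1, 2, 3, 4, 5, 6, 7, 8]
CHUNK_LENGTHS = [60, 70, 80, 90, 100, 120, 130, 140, 150]

# Number of (L, G, C) combinations to evaluate per sequence.
GRID_SIZE = len(STEM_LENGTHS) * len(GAP_SIZES) * len(CHUNK_LENGTHS)


def exhaustive_search(accuracy_fn):
    """Try every (L, G, C) combination and return (best_accuracy, (L, G, C)).

    accuracy_fn is a hypothetical callable wrapping the accuracy program.
    """
    grid = product(STEM_LENGTHS, GAP_SIZES, CHUNK_LENGTHS)
    return max((accuracy_fn(L, G, C), (L, G, C)) for L, G, C in grid)
```

Each call runs the accuracy program once per combination, so the cost grows with both the grid size and the number of sequences.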

Andrew Tomazos · Accepted Answer · 2013-04-27T14:00:21

Your problem statement is not very clear.

You want a supervised learning algorithm that learns from your 50 training examples and creates a predictor program that takes as input a "sequence file" and produces as output values of L, G and C for that sequence file.

Is that correct?

There are many choices for supervised learning algorithms. What exactly is the data in the sequence file? Is it a vector of real numbers? What structure does it have? If you had to determine L, G and C "by hand" for a sequence file could you do it? How would you do it?
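
To make the supervised-learning framing concrete, here is a minimal Python sketch, assuming the only feature available is the sequence length S (the question does not say what else the sequence file contains). It uses a 1-nearest-neighbour lookup as a stand-in for whichever learner is eventually chosen; it is not a decision tree. The training pairs are made-up placeholders, not real results, and `predict_parameters` is a hypothetical name.

```python
# Hypothetical training examples: (sequence length S, best (L, G, C)).
# The values are invented for illustration; real ones would come from
# the 50 sequence files with known best parameters.
TRAINING = [
    (300, (4, 1, 60)),
    (800, (5, 2, 90)),
    (1500, (6, 3, 120)),
    (2500, (7, 4, 150)),
]


def predict_parameters(seq_length, training=TRAINING):
    """Return the (L, G, C) of the training sequence closest in length."""
    _, best = min(training, key=lambda example: abs(example[0] - seq_length))
    return best
```

With the placeholder data above, `predict_parameters(900)` picks the parameters stored for the length-800 example. Whether length alone carries enough signal is exactly what the questions above are trying to establish.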

