First question here!

I am a novice with R and sequence alignments in general.

I am trying to do a multiple sequence alignment in R (RStudio) using the msa (multiple sequence alignment) package. I used the seqinr package to load my FASTA file into R. It contains only two sequences of approximately the same size: a template and a query. However, I am encountering an error that the user manual does not describe.

This is what I typed into the R console:

> query<-"file name"$"sequence within the file"

> template<-"file name"$"sequence within the file"

> msaClustalOmega(query,template)

And this is the result:

"Error in checkInputSeq(inputSeqs) : The parameter inputSeq is not valid! Possible inputs are < character >, < XStringSet >, or a file."

The file names and sequences within the files do 'pop up' as options to select, so R recognizes they exist. I am not sure what is going on, and it is still very early in the MSA process.

Any help would be great, thank you!

1 Answer

From what I understand, you are using an algorithm that expects a Biostrings object, so that it knows whether it is dealing with nucleotide or amino acid data. The objects you are passing are not in that form. Plain ASCII characters would be enough if the distances were computed with, e.g., the Hamming distance, but the nucleotide substitution matrices are specific to an alphabet, which is why only specific input types are permitted.

You need the Biostrings and msa packages. Here is a minimal working example based on what I think you are trying to do (I can't tell what your files look like):

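library("Biostrings")
library("msa")
# Build a DNAStringSet (a subclass of XStringSet) from plain strings
dnaSet <- DNAStringSet(c("AACCTT", "CCGGTTTT", "AAAGGGTTT"))
# Align all sequences in the set with Clustal Omega
res <- msa(dnaSet, method = "ClustalOmega")
print(res)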

The error you saw mentions < XStringSet > because the function expects an object of that class or of one of its subclasses, such as DNAStringSet, RNAStringSet or AAStringSet. When it ran checkInputSeq(inputSeqs), the type check failed, and that produced the error.
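To tie this back to the question, here is a rough sketch of how sequences read with seqinr could be turned into a DNAStringSet before aligning. The FASTA content and the names `template` and `query` are made up for illustration, since I don't know what your file looks like. Note that both sequences go into one set that is passed as the first argument, rather than being passed as two separate arguments.

```r
library("Biostrings")
library("msa")

# Hypothetical two-sequence FASTA file, written to a temporary
# location so that the sketch is self-contained
fastaFile <- tempfile(fileext = ".fasta")
writeLines(c(">template", "AACCTTGGAACC",
             ">query",    "AACCTTGAACC"), fastaFile)

# seqinr returns a list with one element per sequence; by default
# each element is a vector of single characters
seqs <- seqinr::read.fasta(fastaFile, seqtype = "DNA")

# Collapse each sequence into one upper-case string, keeping the names
seqStrings <- vapply(seqs,
                     function(s) toupper(paste(s, collapse = "")),
                     character(1))

# Wrap the strings in a DNAStringSet and align the whole set at once
dnaSet <- DNAStringSet(seqStrings)
res <- msaClustalOmega(dnaSet)
print(res)
```

If you don't need seqinr for anything else, Biostrings can also read the file directly with readDNAStringSet(fastaFile), which skips the conversion step.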