I'm trying to match two columns in one dataframe against a second dataframe. The value returned should be the name of the first column in the second dataframe where the entries for the two species agree.

For example: I want to take the following dataframe:

Fasta<-c("X1","X1","X2","X2","X3","X3")
Species<-c("Kiwi","Chicken","Weta","Cricket","Tuatara","Gecko")
testdata<-as.data.frame(cbind(Fasta,Species))
testdata<-aggregate(Species ~ Fasta, testdata, I)

Fasta    Species1 Species2

X1       Kiwi      Chicken
X2       Weta      Cricket
X3       Tuatara   Gecko

The following is my second dataframe:

Species<-c("Kiwi","Chicken","Weta","Cricket","Frog","Gecko")
Genus<-c("Orn","Norn","Genus2","Genus2","Spec","NoSpec")
Order<-c("Bird","Bird","Order2","Order2","Norder","Geckn")
Kingdom<-rep("Animal",each=6)
lookup<-data.frame(cbind(Species,Genus,Order,Kingdom))

Species Genus   Order   Kingdom

Kiwi    Orn     Bird    Animal
Chicken Norn    Bird    Animal
Weta    Genus2  Order2  Animal
Cricket Genus2  Order2  Animal
Frog    Spec    Norder  Animal
Gecko   NoSpec  Geckn   Animal

For each row, I want to find the first column in the second dataframe (reading left to right: Genus, Order, Kingdom) in which Species1 and Species2 have the same value, and return that column's name. Ideally this would give me the following output:

Fasta   Species1    Species2    MatchLevel

X1      Kiwi        Chicken     Order
X2      Weta        Cricket     Genus
X3      Tuatara     Gecko       Kingdom

I'm open to restructuring the data into a different format.

testdata$MatchLevel <- mapply(function(s1, s2){names(lookup)[which(unlist(lookup[s1 == lookup$Species, ]) == unlist(lookup[s2 == lookup$Species, ]))[1]]}, testdata$Species1, testdata$Species2), though I suspect there's a more elegant alternative. - alistaire

1 Answer

This function takes advantage of the nestedness of the taxonomic groups: if two species are in the same genus, they must also be in the same order and the same kingdom. It counts how many of the three taxonomic levels match. Two species in the same genus get a score of 3 because all 3 levels match, 2 if they are in the same order, and 1 if they only share a kingdom. A score of 0, meaning no match, is also possible.

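match2species <- function(a, b, lookup_table = lookup) {
  # Pull the taxonomy row for each species
  sp_a <- lookup_table[lookup_table$Species == a, ]
  sp_b <- lookup_table[lookup_table$Species == b, ]

  # Drop the Species column and count how many of Genus, Order, Kingdom agree
  matches <- sum(sp_a[-1] == sp_b[-1])
  # 1 match = Kingdom, 2 = Order, 3 = Genus; both inputs are scalars, so use if/else
  if (matches > 0) c('Kingdom','Order','Genus')[matches] else 'No match'
}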
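
To see where the score comes from, the comparison inside the function can be run by hand. This is only a minimal sketch: it rebuilds the lookup table from the question with stringsAsFactors = FALSE (an assumption, so the columns are plain character vectors), and the names kiwi, chicken, level_match and n_match are illustrative rather than part of the function.

```r
# Rebuild the lookup table from the question as plain character columns
lookup <- data.frame(
  Species = c("Kiwi", "Chicken", "Weta", "Cricket", "Frog", "Gecko"),
  Genus   = c("Orn", "Norn", "Genus2", "Genus2", "Spec", "NoSpec"),
  Order   = c("Bird", "Bird", "Order2", "Order2", "Norder", "Geckn"),
  Kingdom = rep("Animal", 6),
  stringsAsFactors = FALSE
)

# One taxonomy row per species
kiwi    <- lookup[lookup$Species == "Kiwi", ]
chicken <- lookup[lookup$Species == "Chicken", ]

# Compare level by level, leaving out the Species column
level_match <- unlist(kiwi[-1]) == unlist(chicken[-1])
print(level_match)

# Number of matching levels; this is the index into c('Kingdom','Order','Genus')
n_match <- sum(level_match)
print(n_match)
```

Kiwi and Chicken differ in Genus but agree on Order and Kingdom, so the count is 2 and the second entry of c('Kingdom','Order','Genus') is returned.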

The function can be called for any pair of species listed in the lookup table.

> match2species('Chicken','Kiwi')
[1] "Order"
> match2species('Weta','Cricket')
[1] "Genus"
> match2species('Frog','Gecko')
[1] "Kingdom"
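
To fill in the MatchLevel column from the question, the function could be applied row by row with mapply. The sketch below is self-contained and rests on a few assumptions: testdata already has separate Species1 and Species2 character columns (built directly here rather than with aggregate), and a species that is missing from the lookup table, such as Tuatara, should give NA. The wrapper name match_or_na is hypothetical.

```r
# Lookup table from the question, as plain character columns
lookup <- data.frame(
  Species = c("Kiwi", "Chicken", "Weta", "Cricket", "Frog", "Gecko"),
  Genus   = c("Orn", "Norn", "Genus2", "Genus2", "Spec", "NoSpec"),
  Order   = c("Bird", "Bird", "Order2", "Order2", "Norder", "Geckn"),
  Kingdom = rep("Animal", 6),
  stringsAsFactors = FALSE
)

# Assumed wide layout: one row per Fasta, one column per species
testdata <- data.frame(
  Fasta    = c("X1", "X2", "X3"),
  Species1 = c("Kiwi", "Weta", "Tuatara"),
  Species2 = c("Chicken", "Cricket", "Gecko"),
  stringsAsFactors = FALSE
)

# The function from this answer
match2species <- function(a, b, lookup_table = lookup) {
  sp_a <- lookup_table[lookup_table$Species == a, ]
  sp_b <- lookup_table[lookup_table$Species == b, ]
  matches <- sum(sp_a[-1] == sp_b[-1])
  if (matches > 0) c("Kingdom", "Order", "Genus")[matches] else "No match"
}

# Hypothetical wrapper: return NA when either species is absent from the lookup
match_or_na <- function(a, b, lookup_table = lookup) {
  if (!all(c(a, b) %in% lookup_table$Species)) return(NA_character_)
  match2species(a, b, lookup_table)
}

# Apply the wrapper to each (Species1, Species2) pair
testdata$MatchLevel <- mapply(match_or_na, testdata$Species1, testdata$Species2,
                              USE.NAMES = FALSE)
print(testdata)
```

With this data, X1 gets Order and X2 gets Genus. X3 gets NA because Tuatara does not appear in the lookup table, so the lookup would need a Tuatara row before any level could be reported for that pair.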