Problem:
I'm trying to import a Newick-format phylogenetic tree. I've done this before with a tree built the same way, so the code itself works; the tree appears to be the problem. I'm getting a duplicate tip labels error. If the tree really does contain duplicate tips, is there an easy way to remove them in R?
Current code:
library(ape)
library(geiger)
library(caper)
taxatree <- read.tree("test2.tre")
sumdata <- read.csv("ogtprop.csv")
sumdataPGLS <- data.frame(A = sumdata$A, OGT = sumdata$OGT, Species = sumdata$Species)
# replace the space between genus and species with an underscore, to match the tip labels in the tree
sumdataPGLS$Species <- gsub(" ", "_", sumdata$Species)
comp.dat <- comparative.data(taxatree, sumdataPGLS, "Species")
I get the following error after the last line:
Error in comparative.data(taxatree, sumdataPGLS, "Species") :
Duplicate tip labels present in phylogeny
This suggests the problem is purely with the phylogeny, not the data frame.
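As a sanity check, the tip labels can be inspected directly. The following is only a minimal sketch on a made-up toy tree (not my real data); `toy`, `dup_labels`, `dup_idx` and `pruned` are illustrative names. It lists the labels that occur more than once and then drops the repeated tips by index with ape's `drop.tip`, keeping the first occurrence of each label. Whether keeping the first occurrence is the right choice for the real tree is an assumption.

```r
library(ape)

# Toy tree (hypothetical, not the real data) in which tip 'B' appears twice.
toy <- read.tree(text = "((A:1,B:1):1,(B:1,C:1):1);")

# Labels that occur more than once among the tips.
dup_labels <- unique(toy$tip.label[duplicated(toy$tip.label)])
print(dup_labels)

# Indices of the second and later occurrences of each repeated label.
dup_idx <- which(duplicated(toy$tip.label))

# Drop those tips by index, so the first occurrence of each label is kept.
pruned <- drop.tip(toy, dup_idx)
print(pruned$tip.label)

# Confirm that no duplicate tip labels remain.
print(any(duplicated(pruned$tip.label)))
```

On the real tree, the same check would be run on `taxatree$tip.label` before calling `comparative.data`.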
Desired outcome:
A way to remove duplicate tip labels in R
Input data:
Unfortunately the tree is too large to post in full, but here is a subset of it (note that this excerpt will not parse by itself). I'm including it in case there are any systematic errors that are obvious to others:
(((('Acidilobus_saccharovorans':4,'Caldisphaera_lagunensis':4)Acidilobales:4,
('Sulfurisphaera_tokodaii':4,('Metallosphaera_hakonensis':4,
'Metallosphaera_sedula':4)Metallosphaera:4,('Acidianus_sulfidivorans':4,
'Acidianus_brierleyi':4)Acidianus:4,('Sulfolobus_metallicus':4,
'Sulfolobus_solfataricus':4,'Sulfolobus_acidocaldarius':4)Sulfolobus:4)
Sulfolobaceae:4,(('Pyrolobus_fumarii':4,'Hyperthermus_butylicus':4,
'Pyrodictium_occultum':4)Pyrodictiaceae:4,('Aeropyrum_camini':4,
('Ignicoccus_hospitalis':4,'Ignicoccus_islandicus':4)Ignicoccus:4,