I'm still in the process of learning R using Swirl and RStudio, and a goal I've set for myself is to recreate this graph. I have a small dataset that I will link below (it's saved as a plain text CSV file that I import into R with headings enabled).

If I try to plot that dataset without changing anything, I get this, which is obviously not the goal.

At first I thought the problem was the class of my imported dataset, which I named kt. Since class(kt) turned out to be data.frame, I figured that wasn't it. Should I rewrite the table into something that R can plot directly, or should I extract each species individually, plot them separately, and then combine the plots into one graph? Perhaps there is something wrong with my dates; I know that R handles dates in a specific way. Maybe none of these solutions is needed and I'm just getting something simple wrong, but I can't find it myself.

Your help is much appreciated.

Dataset:

Species,week 0,week 1,week 2,week 3,week 4,week 5,week 6,week 7,week 8,week 9,week 10,week 11,week 12,week 13,week 14,week 15,week 16,week 17,week 18
Caesalpinia coriaria,0.0%,24.0%,28.0%,28.0%,32.0%,37.0%,40.0%,46.0%,52.0%,56.0%,63.0%,64.0%,68.0%,71.0%,72.0%,,,,
Coccoloba swartzii,0.0%,0.0%,1.0%,10.0%,19.0%,31.0%,33.0%,39.0%,43.0%,48.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,55.0%,
Cordia dentata,0.0%,5.0%,18.0%,21.0%,24.0%,26.0%,27.0%,30.0%,32.0%,32.0%,32.0%,32.0%,32.0%,32.0%,33.0%,33.0%,33.0%,34.0%,35.0%
Guaiacum officinale,0.0%,0.0%,0.0%,0.0%,4.0%,5.0%,5.0%,5.0%,7.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,,
Randia aculeata,0.0%,0.0%,0.0%,4.0%,13.0%,14.0%,18.0%,19.0%,21.0%,21.0%,21.0%,21.0%,21.0%,22.0%,22.0%,22.0%,22.0%,,
Schoepfia schreberi,0.0%,0.0%,0.0%,0.0%,0.0%,0.0%,1.0%,4.0%,8.0%,11.0%,13.0%,21.0%,21.0%,24.0%,24.0%,25.0%,27.0%,,
Prosopis juliflora,0.0%,7.5%,31.3%,34.2%,,,,,,,,,,,,,,,

1 Answer

Something like this??

# kt is the data frame imported in the question
df <- kt
# get rid of "%" signs
df <- data.frame(sapply(df,function(x)gsub("%","",x,fixed=TRUE)))
# convert cols 2:20 to numeric (empty cells become NA)
df[,2:20] <- sapply(df[,2:20],function(x)as.numeric(as.character(x)))

library(reshape2)
library(ggplot2)
gg <- melt(df,id="Species")
ggplot(gg,aes(x=variable,y=value,color=Species,group=Species)) + 
  geom_line()+
  theme_bw()+
  theme(legend.position="bottom", legend.title=element_blank())

There are lots of problems here.

First, if your dataset really has those % signs, then R reads the values as character and, in R versions before 4.0 (where stringsAsFactors defaults to TRUE), imports them as factors. So first we have to get rid of the % (using gsub(...)), and then we have to convert what's left to numeric. With factors, you have to convert to character first, then numeric, so: as.numeric(as.character(...)). All of this could have been avoided if you had exported the data without the % signs!
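To make the factor pitfall concrete, here is a minimal sketch on a made-up column (the values in pct are hypothetical, chosen to resemble one week of the data). It assumes nothing beyond base R.

```r
# Hypothetical toy column resembling one week of the imported data
pct <- factor(c("0.0%", "24.0%", "7.5%", ""))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the numbers printed in the labels
codes <- as.numeric(pct)

# Safer: go through character, strip the "%", then convert;
# the empty cell becomes NA
vals <- as.numeric(sub("%", "", as.character(pct), fixed = TRUE))
```

Comparing codes with vals shows why the intermediate as.character(...) step matters whenever the column is a factor.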

Plotting multiple curves with different colors is something the ggplot2 package was designed for (among many other things), so we use that. ggplot prefers data in "long" format - all the data in one column, with a second column distinguishing different datasets. Your data is in "wide" format - data in different columns. So we convert to long using melt(...) from the reshape2 package. The result, gg, has three columns: Species, variable and value. value contains the actual data and variable contains the week label taken from the original column name.
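As a rough illustration of the wide-to-long step, the sketch below melts a small made-up subset (the data frame wide and the extra week column are assumptions for illustration, not part of the code above). It also assumes the column names look like week.0, which is what read.csv produces by default from a header such as "week 0".

```r
library(reshape2)

# Hypothetical two-species, three-week subset, already converted to numeric
wide <- data.frame(
  Species = c("Cordia dentata", "Randia aculeata"),
  week.0  = c(0, 0),
  week.1  = c(5, 0),
  week.2  = c(18, 0)
)

# One row per Species/week combination: columns Species, variable, value
long <- melt(wide, id.vars = "Species")

# Optional: turn labels such as "week.2" into the number 2
long$week <- as.numeric(sub("week.", "", as.character(long$variable), fixed = TRUE))
```

Having a numeric week column is optional, but it lets a plot treat the x-axis as a real time scale rather than as a set of labels.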

So now we create a ggplot object, setting the x-axis to the variable column, the y-axis to the value column, with color mapped to Species, and we tell ggplot to plot lines (using geom_line(...)).
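A possible variation, sketched on made-up long-format data: if the week is stored as a number (the week column here is an assumption, not something the code above creates), ggplot can use a continuous x-axis, and dropping NA rows makes each line simply stop where that species' measurements end.

```r
library(ggplot2)

# Hypothetical long-format subset; NA marks a week with no measurement
long <- data.frame(
  Species = rep(c("Cordia dentata", "Prosopis juliflora"), each = 4),
  week    = rep(0:3, times = 2),
  value   = c(0, 5, 18, 21, 0, 7.5, 31.3, NA)
)

# Drop incomplete rows, then draw one line per species
p <- ggplot(na.omit(long), aes(x = week, y = value, color = Species)) +
  geom_line() +
  labs(x = "Week", y = "Value (%)")
print(p)
```

Because color is mapped to a discrete column, ggplot groups the lines by Species on its own here; the explicit group=Species is mainly needed when x is a factor, as in the original code.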

The rest is to position the legend at the bottom, and turn off some of the ggplot default formatting.