My dataframe is as follows

sos     eos     dataset     site                year
171     280     PhenoCam    Pheno_alligator     2016
130     275     PhenoCam    Pheno_alligator     2017
149     277     PhenoCam    Pheno_harvard2      2016
156     259     Landsat     Landsat_alligator   2016
157     247     Landsat     Landsat_alligator   2017
134     294     Landsat     Landsat_harvard2    2016
154     286     MODIS       MODIS_alligator     2016

and the data continues with 1000+ entries. There are four datasets total in dataframe site_type, with three years (2016, 2017, 2018). I want to make a scatterplot in ggplot of PhenoCam vs Landsat, using their sos values but can't figure out how to properly set the aes to get PhenoCam sos as the y-axis values and Landsat sos as the x-axis values. This scatterplot will be used to show RMSE and R, so, for example, sos for Pheno_alligator year 2016 needs to plot against Landsat_alligator year 2016.

I know normally the code would be something like this

ggplot(site_type, aes(Landsat, PhenoCam)) +
geom_point()

but the fact that they are in the same column with multiple things going on is throwing me off. I will be making 6 scatterplots total, (PhenoCam vs each dataset for both sos and eos) but only need guidance on one. Thank you!

I think the issue with your dataframe is that, for a scatterplot of Landsat vs PhenoCam, each point needs both a Landsat value and a PhenoCam value, so you need a way to pair them. Is the purpose of eos to provide an ID? If so, you can reshape your data into a wider format: df %>% pivot_wider(-eos, names_from = dataset, values_from = sos) – dc37
@dc37 I added more detail, is this helpful? The purpose is to see how accurate the sos values of Landsat are compared to the actual sos given by PhenoCam. – Maridee Weber

1 Answer

So your values are paired based on the end of the name of the variable "site" (which in your example is either "alligator" or "harvard2") and the year.

What you can do is to reshape your dataset in order to obtain the following one:

library(tidyr)
library(dplyr)

# Strip the dataset prefix (e.g. "Pheno_") so paired rows share one site name
df %>% mutate(site = sub("^[^_]+_", "", site)) %>%
  select(-eos) %>%
  pivot_wider(names_from = dataset, values_from = sos)

# A tibble: 3 x 5
  site       year PhenoCam Landsat MODIS
  <chr>     <int>    <int>   <int> <int>
1 alligator  2016      171     156   154
2 alligator  2017      130     157    NA
3 harvard2   2016      149     134    NA

With this you can easily get your scatter plot by doing:

library(tidyr)
library(dplyr)
library(ggplot2)

# Strip the dataset prefix (e.g. "Pheno_") so paired rows share one site name
df %>% mutate(site = sub("^[^_]+_", "", site)) %>%
  select(-eos) %>%
  pivot_wider(names_from = dataset, values_from = sos) %>%
  ggplot(aes(x= Landsat, y = PhenoCam, color = site))+
  geom_point()
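
Since the plot is meant to report RMSE and R, here is a minimal sketch of how those could be computed from the same wide table. It assumes R means the Pearson correlation and that RMSE is taken over site/year pairs where both Landsat and PhenoCam have a value. The names wide and stats are only for illustration, and df is rebuilt from the sample rows in the question.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
df <- data.frame(
  sos = c(171, 130, 149, 156, 157, 134, 154),
  eos = c(280, 275, 277, 259, 247, 294, 286),
  dataset = c("PhenoCam", "PhenoCam", "PhenoCam",
              "Landsat", "Landsat", "Landsat", "MODIS"),
  site = c("Pheno_alligator", "Pheno_alligator", "Pheno_harvard2",
           "Landsat_alligator", "Landsat_alligator", "Landsat_harvard2",
           "MODIS_alligator"),
  year = c(2016, 2017, 2016, 2016, 2017, 2016, 2016),
  stringsAsFactors = FALSE
)

# One row per site/year, one sos column per dataset
wide <- df %>%
  mutate(site = sub("^[^_]+_", "", site)) %>%
  select(-eos) %>%
  pivot_wider(names_from = dataset, values_from = sos)

# Keep only complete Landsat/PhenoCam pairs, then summarise
stats <- wide %>%
  filter(!is.na(Landsat), !is.na(PhenoCam)) %>%
  summarise(
    n    = n(),                                  # number of paired points
    rmse = sqrt(mean((Landsat - PhenoCam)^2)),   # root mean squared error
    r    = cor(Landsat, PhenoCam)                # Pearson correlation
  )

print(stats)
```

The values in stats can then be pasted into the plot with annotate() or a subtitle if you want them shown on the figure.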

Does this answer your question?
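
For the six plots mentioned in the question (PhenoCam vs each other dataset, for both sos and eos), the same reshaping could be wrapped in a small function. This is only a sketch: plot_vs_phenocam, metric and other are hypothetical names, not something from your code, and df is again rebuilt from the sample rows in the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample rows copied from the question
df <- data.frame(
  sos = c(171, 130, 149, 156, 157, 134, 154),
  eos = c(280, 275, 277, 259, 247, 294, 286),
  dataset = c("PhenoCam", "PhenoCam", "PhenoCam",
              "Landsat", "Landsat", "Landsat", "MODIS"),
  site = c("Pheno_alligator", "Pheno_alligator", "Pheno_harvard2",
           "Landsat_alligator", "Landsat_alligator", "Landsat_harvard2",
           "MODIS_alligator"),
  year = c(2016, 2017, 2016, 2016, 2017, 2016, 2016),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `metric` is "sos" or "eos", `other` is the name of
# the dataset to put on the x-axis against PhenoCam on the y-axis
plot_vs_phenocam <- function(data, metric, other) {
  data %>%
    mutate(site = sub("^[^_]+_", "", site)) %>%
    # keep the pairing keys plus the chosen metric, renamed to `value`
    select(site, year, dataset, value = all_of(metric)) %>%
    pivot_wider(names_from = dataset, values_from = value) %>%
    # drop site/year combinations missing either dataset
    filter(!is.na(.data[[other]]), !is.na(PhenoCam)) %>%
    ggplot(aes(x = .data[[other]], y = PhenoCam, color = site)) +
    geom_point() +
    labs(x = paste(other, metric), y = paste("PhenoCam", metric))
}

p <- plot_vs_phenocam(df, "eos", "Landsat")
```

Calling it with "sos" or "eos" and with each of the other dataset names gives the remaining plots.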