
While trying to answer this question elsewhere, I hit a wall with ggplot(), geom_smooth() and, I think, environments.

I made a succinct example of the problem with a widely available data set and base functions; I hope it scales to the real problem with the actual dataset. The model specification is just there to make a reprex and is not meaningful in any way.

I want to plot 5 fit curves for 5 subsets of data. All 5 models use the same formula, but the start parameters are specific to each subset: the min() of one variable and the max() of the other. It should be relatively easy with geom_smooth() and aes(..., color = variable_that_subsets). And it is, if I fix the parameters in advance.

1. Works, but it's not what I want.

library(tidyverse)
msleep %>% 
ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls", 
              se = FALSE, 
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = 19.9, 
                                              min_sleep_rem   = 1.88)))
# Produces warnings because of missing values. Not the issue, I think.

I've been trying different approaches to make ggplot() (or stat_smooth(); I really don't know which one calls the shots here, and this could be the issue) calculate the start parameters at execution time, one set for each subset. My first try was using the "internal" names of the variables, as defined in aes(...).

2. Variable names in aes() not working.

msleep %>% 
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls", 
              se = FALSE, 
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),     #Here y = sleep_total
              method.args = list(start = list(max_sleep_total = max(y),    #Here isn't. 
                                              min_sleep_rem   = min(x))))
# Returns:

#Error in geom_smooth(method = "nls", se = FALSE, formula = y ~ max_sleep_total *  : 
#  objeto 'y' no encontrado
# object y not found
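A likely explanation for this error, sketched in base R: the formula is stored unevaluated, so its symbols are only names until the model is fitted, whereas method.args is an ordinary argument that gets evaluated in the environment where the call was written. `fake_smooth()` is a hypothetical stand-in, not ggplot2 code, and the sketch assumes no object named `y` exists in the calling environment.

```r
# Hypothetical stand-in for geom_smooth(); it only mimics how arguments arrive.
fake_smooth <- function(formula, method.args = list()) {
  # The formula is kept as an object; x and y are just symbols at this point.
  vars <- all.vars(formula)
  # Using method.args forces it, and it is evaluated in the caller's environment.
  list(vars = vars, args = method.args)
}

# Works: nothing in the formula has to exist yet.
ok <- fake_smooth(y ~ a * x / (b + x))

# Fails: max(y) is evaluated right away, and there is no `y` to be found.
res <- tryCatch(
  fake_smooth(y ~ a * x / (b + x),
              method.args = list(start = list(a = max(y)))),
  error = function(e) e
)
inherits(res, "error")
```

If this reading is right, the same reasoning would apply to code block 3, since the column names only exist inside the data frame.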

So, maybe if I call the column names in the data.frame it will work.

3. Column names in data =, not working.

msleep %>% 
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nlsLM", 
              se = FALSE, 
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = max(sleep_total), 
                                              min_sleep_rem   = min(sleep_rem))))
#Returns:
#Error in geom_smooth(method = "nlsLM", se = FALSE, formula = y ~ max_sleep_total *  : 
#  objeto 'sleep_total' no encontrado
# object 'sleep_total' not found

4. Variables in the global environment, working but useless.

max_sleep_total <- c(0, 19.9)  # Assign in global environment
min_sleep_rem <- c(1.88, 10)   # Two values to check that min() and max() work.

msleep %>% 
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls", 
              se = FALSE, 
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = max(max_sleep_total), 
                                              min_sleep_rem   = min(min_sleep_rem)))) +
  labs(title = "Code block 4")

Returns the same plot as in 1.

[Figure: the resulting plot, titled "Code block 4"]

Again, not what I'm looking for. The start = parameters are fixed, and I want a specific set of parameters for each subset of data.

The questions.

  • Is there a way to make the plot I want using geom_smooth() which I haven't tried?

I know I can fit each model in advance, predict(), and plot a geom_line() with each estimated value of y. It's cumbersome, because you have to create extra data points to get a smooth line, you have two data sets (points and lines), labeling becomes a manual affair, and it doesn't answer the second question:

  • Is there a way to tell geom_smooth() to look for variables in the parent frame(*) in method.args, as it does with the formula? That is, is there a way to make the second code block work? It seems like the obvious solution.

(*) I assume this is the ggplot() function call, but with NSE I'm not so sure.
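One route that might address the second question, sketched under assumptions: geom_smooth() also accepts a function as `method`, and, as far as I understand, calls it once per group with the formula, that group's data (columns named x and y) and a weights argument. A wrapper can then compute the start values from the data it receives. `nls_auto_start` is a hypothetical name, and convergence for every vore group in msleep is not verified here.

```r
library(ggplot2)

# Hypothetical wrapper around nls(); `weights` is accepted but deliberately ignored.
nls_auto_start <- function(formula, data, weights, ...) {
  # Assumption: `data` holds only the current group, with columns x and y.
  start <- list(max_sleep_total = max(data$y, na.rm = TRUE),
                min_sleep_rem   = min(data$x, na.rm = TRUE))
  nls(formula, data = data, start = start, ...)
}

p <- ggplot(msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = nls_auto_start, se = FALSE, na.rm = TRUE,
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x))
```

Printing `p` triggers the fits. As far as I know, a group where nls() fails is dropped with a warning instead of stopping the whole plot, so a missing curve would point to a bad start value for that subset.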

Background.

These two are nice and helped a lot, but they don't solve the problem of subsetting, nor do they use the color = argument.

Fitting with ggplot2, geom_smooth and nls

how to use method="nlsLM" (in packages minpack.lm) in geom_smooth

Interesting rant about ggplot() and environments, but for a different problem.

Use of ggplot() within another function in R

Tried to throw environment = environment() at the wall and see if it sticks (or, put more kindly, to judiciously wrap my mind around environments and lexical scoping) looking for a solution, to no avail.

Did my answer below work for you? - shbrainard
I think your answer could be a useful workaround for this particular problem. Thanks a lot for that. My main issue here is understanding where ggplot2 is evaluating things and how to get starting values within the ggplot2 call. I think it is a more general problem. Thanks again and all the best to you in 2021. - mpaladino

1 Answer


One possible solution that worked for me is to use the method outlined here:

https://douglas-watson.github.io/post/2018-09_exponential_curve_fitting/

p <- ggplot(data=data, aes(x=relation, y=cor, colour=k)) +
    geom_point() +
    geom_smooth(method = "nls", 
                se = FALSE, 
                data = data,
                formula = y ~ SSasymp(x, yf, y0, log_alpha))

This fits a somewhat different function:

y ~ yf + (y0 - yf) * exp(-alpha * x), where alpha = exp(log_alpha)

But perhaps there is a self-starting function out there that would work for you?
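One candidate, offered as a sketch and not as a tested solution: `SSmicmen()` from the stats package is a self-starting model of the form Vm * x / (K + x), which has the same shape as the formula in the question. Because it estimates its own starting values from the data it is given, each color group should get its own start without method.args. Whether it converges for every vore group in msleep is an assumption I have not checked.

```r
library(ggplot2)

# SSmicmen(input, Vm, K) evaluates Vm * input / (K + input).
SSmicmen(2, Vm = 10, K = 2)   # 10 * 2 / (2 + 2) = 5

p <- ggplot(msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "nls", se = FALSE, na.rm = TRUE,
              # Vm and K play the roles of max_sleep_total and min_sleep_rem.
              formula = y ~ SSmicmen(x, Vm, K))
```

The trade-off is that the starting values come from the self-start routine and not from max(y) and min(x), so this only helps if the exact start values are not important in themselves.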