Trying to answer this question somewhere else, I hit a wall with ggplot(), geom_smooth() and, I think, environments.
I made a succinct example of the problem with a widely available data set and base functions; I hope it scales to the real problem with the actual data set. The model specification is only there to make a reprex and is not meaningful in any way.
I want to plot 5 fit curves for 5 subsets of data. All 5 models use the same formula, but the start parameters are specific to each subset: the min() of one variable and the max() of another.
It should be relatively easy with geom_smooth() and aes(..., color = variable_that_subsets). And it is, if I fix the parameters in advance.
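To be concrete about what I mean by per-subset start values, here is a minimal sketch of how they could be computed outside the plot. It assumes dplyr and ggplot2 are installed (ggplot2 only for the msleep data); the column names max_sleep_total and min_sleep_rem are simply reused from the formula, and dropping incomplete rows is my own choice.

```r
library(dplyr)
library(ggplot2)  # only needed here for the msleep data set

# One row of candidate start values per subset (vore).
# Rows with missing sleep_rem or sleep_total are dropped first (assumption).
start_values <- msleep %>%
  filter(!is.na(sleep_rem), !is.na(sleep_total)) %>%
  group_by(vore) %>%
  summarise(max_sleep_total = max(sleep_total),
            min_sleep_rem   = min(sleep_rem))

start_values
```

These are the values I would like geom_smooth() to work out by itself, one set per color group.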
1. Works, but it's not what I want.
library(tidyverse)
msleep %>%
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls",
              se = FALSE,
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = 19.9,
                                              min_sleep_rem = 1.88)))
# Produces warnings because of missing values. Not the issue, I think.
I've been trying different approaches to make ggplot() (or stat_smooth(); I really don't know who calls the shots here, and this could be the issue) calculate the start parameters at execution time, one set for each subset. My first try was using the "internal" names of the variables, as defined in aes(...).
2. Variable names in aes(), not working.
msleep %>%
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls",
              se = FALSE,
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x), # Here y = sleep_total
              method.args = list(start = list(max_sleep_total = max(y), # Here it isn't.
                                              min_sleep_rem = min(x))))
# Returns:
# Error in geom_smooth(method = "nls", se = FALSE, formula = y ~ max_sleep_total * :
#   object 'y' not found
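My guess at what is going on (an assumption on my part, not checked against the ggplot2 source): the formula is a quoted object, so its x and y are only looked up later, when the model is fitted, whereas list(...) in method.args is ordinary code that gets evaluated where I wrote the call. The sketch below mimics that with a hypothetical stand-in function, fake_smooth(), which is not part of ggplot2 and only stores its arguments.

```r
# Hypothetical stand-in for geom_smooth(): it just stores its arguments.
fake_smooth <- function(formula, method.args = list()) {
  list(formula = formula, method.args = method.args)
}

# A formula is not evaluated, so x and y do not need to exist here.
ok <- fake_smooth(y ~ a * x / (b + x))

# list(...) is evaluated as soon as the argument is used, in the calling
# environment, where there is no y. The error is caught and its message kept
# (the exact wording depends on the R locale).
res <- tryCatch(
  fake_smooth(y ~ a * x / (b + x),
              method.args = list(start = list(a = max(y), b = min(x)))),
  error = function(e) conditionMessage(e)
)

res
```

If that reading is right, it would also explain why only variables that exist in my workspace can be used inside method.args.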
So maybe it will work if I use the column names from the data.frame.
3. Column names in data =, not working.
library(minpack.lm) # provides nlsLM()
msleep %>%
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nlsLM",
              se = FALSE,
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = max(sleep_total),
                                              min_sleep_rem = min(sleep_rem))))
# Returns:
# Error in geom_smooth(method = "nlsLM", se = FALSE, formula = y ~ max_sleep_total * :
#   object 'sleep_total' not found
4. Variables in the global environment, working but useless.
max_sleep_total <- c(0, 19.9) # Assign in global environment
min_sleep_rem <- c(1.88, 10)  # Two values to check that min and max work.
msleep %>%
  ggplot(aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_smooth(method = "nls",
              se = FALSE,
              formula = y ~ max_sleep_total * x / (min_sleep_rem + x),
              method.args = list(start = list(max_sleep_total = max(max_sleep_total),
                                              min_sleep_rem = min(min_sleep_rem)))) +
  labs(title = "Code block 4")
Returns this plot, the same as in block 1.
Again, not what I'm looking for. The start = parameters are fixed, and I want a specific set of parameters for each subset of data.
The questions.
- Is there a way to make the plot I want using geom_smooth() that I haven't tried?
I know I can fit each model in advance, predict(), and plot a geom_line() with each estimated value of y. It's cumbersome, because you have to create extra data points to get a smooth line, you have two data sets (points and lines), labeling becomes a manual affair, and it doesn't answer the second question:
- Is there a way to tell geom_smooth() to look for variables in the parent frame(*) in method.args, as it does with the formula? I.e., make the second code block work. It should be the obvious solution.
(*) I assume it is the ggplot() function call, but with NSE I'm not so sure.
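For completeness, the fit-in-advance workaround I mention above looks roughly like this. It is a sketch, not the solution I am after: the helper fit_one(), the 80-point prediction grid and the "unknown" label for missing vore are my own illustrative choices, and I have not checked that nls() converges for every subset (subsets where it fails are simply skipped).

```r
library(ggplot2)

# Keep complete rows only; a missing vore becomes its own subset (assumption).
d <- msleep[!is.na(msleep$sleep_rem) & !is.na(msleep$sleep_total), ]
d$vore <- ifelse(is.na(d$vore), "unknown", d$vore)

# Fit one model per subset, with start values taken from that subset,
# then predict on a fine grid of x values to get a smooth line.
fit_one <- function(sub) {
  fit <- tryCatch(
    nls(sleep_total ~ max_sleep_total * sleep_rem / (min_sleep_rem + sleep_rem),
        data  = sub,
        start = list(max_sleep_total = max(sub$sleep_total),
                     min_sleep_rem   = min(sub$sleep_rem))),
    error = function(e) NULL)  # nls() may fail for some subsets
  if (is.null(fit)) return(NULL)
  grid <- data.frame(sleep_rem = seq(min(sub$sleep_rem), max(sub$sleep_rem),
                                     length.out = 80))
  grid$sleep_total <- predict(fit, newdata = grid)
  grid$vore <- sub$vore[1]
  grid
}

curves <- do.call(rbind, lapply(split(d, d$vore), fit_one))

# Two data sets: the points (d) and the predicted lines (curves).
p <- ggplot(d, aes(x = sleep_rem, y = sleep_total, color = vore)) +
  geom_point() +
  geom_line(data = curves)
```

It does the job, but it shows the extra bookkeeping I would like to avoid: a second data frame, a hand-made prediction grid and manual handling of failed fits.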
Background.
Those two are nice and helped a lot, but they don't solve the problem of subsetting, nor do they use the color = argument.
Fitting with ggplot2, geom_smooth and nls
how to use method="nlsLM" (in packages minpack.lm) in geom_smooth
Interesting rant about ggplot() and environments, but for a different problem.
Use of ggplot() within another function in R
Tried to throw environment = environment() at the wall to see if it sticks, and then to judiciously wrap my mind around environments and lexical scoping looking for a solution, to no avail.
ggplot2 is evaluating and how to get starter values within the ggplot2 call. I think it is a more general problem. Thanks again and the best for you in 2021. – mpaladino