I'm trying to plot multiple nonlinear least squares regressions in R, each in a different color based on the value of a variable. However, I also display the equation of the last one, and I would like the regression line corresponding to that equation to be black as well.

What I've tried is shown in the stat_smooth() layer: I tried to include an ifelse() statement, but this doesn't work for the reasons described here: Different between colour argument and aes colour in ggplot2?
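To make the failure concrete, here is a minimal base-R sketch of what the ifelse() call hands to ggplot2. The vector `test` below is a stand-in for the grouping column in the example data, and `mapped` is a name introduced only for illustration.

```r
# The five group ids, standing in for the `test` column of the example data
test <- 1:5

# ifelse() returns a single type, so the numeric ids are coerced to character
mapped <- ifelse(test == 5, "black", test)
print(mapped)

# Inside aes() these strings are category labels, not colour names:
# ggplot2 assigns each distinct label a colour from its discrete colour scale,
# so the label "black" is not drawn in black.
print(is.character(mapped))
```

Because the mapping goes through a scale, the literal colour has to be supplied through the scale itself rather than through the aesthetic.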

test <- function() {
  library(ggplot2)
  set.seed(1)

  master <- data.frame(matrix(NA_real_, nrow = 0, ncol = 3))

  for (i in 1:5) {
    df <- data.frame(matrix(NA_real_, nrow = 50, ncol = 3))
    colnames(df) <- c("xdata", "ydata", "test")

    df$xdata <- as.numeric(sample(1:100, size = nrow(df), replace = FALSE))
    df$ydata <- as.numeric(sample(1:3, size = nrow(df), prob = c(.60, .25, .15), replace = TRUE))
    # browser()
    df$test <- i

    master <- rbind(master, df)
  }

  df <- master

  last <- 5

# based on https://stackguides.com/questions/18305852/power-regression-in-r-similar-to-excel

power_eqn = function(df, start = list(a=300,b=1)) {
    m = nls(as.numeric(reorder(xdata,-ydata)) ~ a*ydata^b, start = start, data = df)
    # View(summary(m))
    # browser()
    # eq <- substitute(italic(hat(y)) == a  ~italic(x)^b*","~~italic(r)^2~"="~r2*","~~p~"="~italic(pvalue),
    eq <- substitute(italic(y) == a  ~italic(x)^b*","~~italic('se')~"="~se*","~~italic(p)~"="~pvalue,
                 list(a = format(coef(m)[1], digits = 6), # a
                      b = format(coef(m)[2], digits = 6), # b
                      # r2 = format(summary(m)$r.squared, digits = 3), 
                      se = format(summary(m)$parameters[2,'Std. Error'], digits = 6), # standard error
                      pvalue = format(summary(m)$coefficients[2,'Pr(>|t|)'], digits=6) )) # p value (based on t statistic)
   as.character(as.expression(eq))                 
 }

plot1 <- ggplot(df, aes(x = as.numeric(reorder(xdata,-ydata)), y = ydata ) ) + 
    geom_point(color="black", shape=1 ) + 
    # PROBLEM LINE
    stat_smooth(aes(color=ifelse(test==5, "black", test)), method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a =1,b=1)),se=FALSE, fullrange=TRUE) +
    geom_text(x = quantile(df$xdata)[4], y = max(df$ydata), label = power_eqn(df), parse = TRUE, size=4, color="black") + # make bigger? add border around?
    theme(legend.position = "none", axis.ticks.x = element_blank() ) + #, axis.title.x = "family number", axis.title.y = "number of languages" ) # axis.text.x = element_blank(), 
    labs( x = "xdata", y = "ydata", title="test" )
plot1
}

test()

This is the graph I got.

[Image: the resulting plot]

I would like the line corresponding to the points and equation to be black as well. Does anyone know how to do this?

I do not want to use scale_fill_manual, etc., because my real data has many more lines, unless the palette for scale_fill_manual/etc. can be randomly generated.

1 Answer

You could use scale_color_manual with a custom palette in which your level of interest (in your example, where test equals 5) is set to black. Below I use palettes from RColorBrewer, extend them if necessary to the number of levels needed, and set the last color to black.

library(RColorBrewer) # provides several great palettes

createPalette <- function(n, colors = 'Greens') {
  max_colors <- brewer.pal.info[colors, ]$maxcolors # Get maximum colors in palette
  palette <- brewer.pal(min(max_colors, n), colors) # Get RColorBrewer palette
  if (n > max_colors) {
    palette <- colorRampPalette(palette)(n) # make it longer if n > max_colors
  }

  # assume that  n-th color should be black
  palette[n] <- "#000000"

  # return palette
  palette[1:n]
}

# create a palette with 5 levels using the Spectral palette
# change from 5 to the needed number of levels in your real data.
mypalette <- createPalette(5, 'Spectral') #  palettes from RColorBrewer

We can then use mypalette with scale_color_manual(values=mypalette) to color points and lines according to the test variable.

Please note that I have updated geom_point and stat_smooth so that they use aes(color=as.factor(test)). I have also changed the call to power_eqn to use only the data points where df$test == 5. The black points, line and equation should now be based on the same data.

plot1 <- ggplot(df, aes(x = as.numeric(reorder(xdata,-ydata)), y = ydata )) + 
  geom_point(aes(color=as.factor(test)), shape=1) + 
  stat_smooth(aes(color=as.factor(test)), method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a =1,b=1)),se=FALSE, fullrange=TRUE) +
  geom_text(x = quantile(df$xdata)[4], y = max(df$ydata), label = power_eqn(df[df$test == 5,]), parse = TRUE, size=4, color="black") +
  theme(legend.position = "none", axis.ticks.x = element_blank() ) + 
  labs( x = "xdata", y = "ydata", title="test" ) +
  scale_color_manual(values = mypalette)

plot1
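If a randomly generated palette is preferred for a large number of lines, something along these lines might work. This is only a sketch using base R: `random_palette`, its arguments and the `seed` value are hypothetical names introduced here, not part of RColorBrewer or ggplot2.

```r
# Hypothetical helper: one random colour per level, with a chosen level
# forced to black. Uses only base R (grDevices).
random_palette <- function(levels, highlight, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  levels <- as.character(unique(levels))
  highlight <- as.character(highlight)
  stopifnot(highlight %in% levels)

  # Named R colours, minus greys, black and white, so that only the
  # highlighted level ends up black
  pool <- grep("gr[ae]y|black|white", grDevices::colors(distinct = TRUE),
               value = TRUE, invert = TRUE)

  # Sample with replacement only if there are more levels than pool colours
  pal <- sample(pool, length(levels), replace = length(levels) > length(pool))
  names(pal) <- levels
  pal[highlight] <- "black"
  pal
}

pal <- random_palette(1:5, highlight = 5, seed = 42)
print(pal)
```

In the plot above, `pal` could then take the place of `mypalette` in scale_color_manual(values = pal). Because the vector is named, the colours are matched to the factor levels by name, so the highlighted level does not have to be the last one.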

See the resulting figure here (I don't have enough reputation to embed it).

I hope you find my answer useful.