0
votes
library(tidyverse)

df <- data.frame(hour=c(5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23),
                 total=c(15507,132129,156909,81306,44413,51448,55308,63542,57564,54031,70319,53345,35137,15509,20134,5183,2554,20,203))


plot(df$hour, df$total)

fit1 <- lm(total~hour, data = df)
fit2 <- lm(total~poly(hour,2, raw = TRUE), data = df)
fit3 <- lm(total~poly(hour,3, raw = TRUE), data = df)
fit4 <- lm(total~poly(hour,4, raw = TRUE), data = df)
fit5 <- lm(total~poly(hour,5, raw = TRUE), data = df)

summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared
summary(fit3)$adj.r.squared
summary(fit4)$adj.r.squared
summary(fit5)$adj.r.squared

How do I determine the best fit for regression for my data

How can I calculate for critical points, global maxima and local maxima if any.

Tried using the adjusted r squares as the basis for selection of the best curve but my critical point do not correlate with the curve.

1
Generally R^2 goes up as variance goes up, so if you use a degree of 50 or so you'll get an R^2 close to 1. But that doesn't mean it's a great model! What's good depends on what you're trying to do. If you just want a line that fits nicely, GAMs are nice, or LOESS since this is timeseries, e.g. ggplot(df, aes(hour, total)) + geom_point() + geom_smooth(method = 'gam') is arguably not bad.alistaire
@alistaire, I agree. My answer covers how to do what the OP is asking, not whether it's the best approach or not (note that they are looking at adjusted R^2, not total, so not that bad. But I agree that a GAM might be better ...Ben Bolker

1 Answers

2
votes

"Best fit" is a question with a variety of answers depending on your goals, but:

AIC(fit1,fit2,fit3,fit4,fit5)
     df      AIC
fit1  3 450.4892
fit2  4 451.8506
fit3  5 453.3828
fit4  6 454.5851
fit5  7 446.4370

suggests that fit5 is the best (lowest AIC). bbmle::AICtab() gives a slightly more useful output (perhaps) — displays only AIC relative to best fit, sorts models by goodness of fit.

bbmle::AICtab(fit1,fit2,fit3,fit4,fit5)
     dAIC df
fit5 0.0  7 
fit1 4.1  3 
fit2 5.4  4 
fit3 6.9  5 
fit4 8.1  6 

If your model is

beta0 + beta1*x + beta2*x^2 + beta3*x^3 ...

then the first derivative is

beta1 + 2*beta2*x + 3*beta3*x^2 + ...

Finding the roots of this polynomial should give the critical points.

so e.g.

pp <- polyroot(coef(fit5)[-1]*(1:5))

Should give you the critical points for fit5.

png("tmp.png")
par(las=1, bty="l")
plot(df$hour, df$total)
lines(df$hour, predict(fit5))
abline(v=Re(pp), col =2 )
dev.off()

points and predicted line with critical points marked

A little more experimentation suggests that you haven't reached the optimal complexity yet. Using AICc ("corrected" AIC, which accounts for finite sample size):

bbmle::AICctab(fit5, fit7, fit10)
      dAICc df
fit7   0.0  9 
fit5  19.1  7 
fit10 29.3  12

i.e. order-7 is (considerably) better than order-5, but worse than order-10 ...