2
votes

I'm running a cross-validation algorithm to find the best polynomial fit for data that changes from day to day. I want to find a non-cumbersome method of displaying the fit in a simple plot without having to manually write the whole regression formula and the beta coefficients for the plotting each time. For the regression formula the solve is easy, I create a string using sprintf and use as.formula() on the string .

The problem is plotting the line. I create a string in the same way, but the as.formula() function seems only to work for regression formulas, not formulas on the form "beta + beta*t". I have also tried using eval(parse()) as shown below, but this only creates a vector of NA's.

#Create strings
poly_form = "y ~ t"
beta_form = "beta[1]"
for (i in 1:pmin) {  #pmin is the best polynomial fit, e.g. 4 or 9.
           poly_form <- sprintf("%s + I(t^%s)", poly_form, i)
           beta_form <- sprintf("%s + beta[%s]*t^%s",beta_form, i+1, i)
            }

#Regression
poly.mod = lm(as.formula(poly_form))
beta = coef(poly.mod)

#Plot
plot(t, y, type = 'h')
lines(t, eval(parse(text = beta_form))) #This doesn't work.

So in essence, how can I use the string I've created as part of an input into the lines function in a way that automatically produces the same output as this:

lines(t, beta2[1] + beta2[2]*t + beta2[3]*t^2 + beta2[4]*t^3 + beta2[5]*t^4 + beta2[6]*t^5 + beta2[7]*t^6) 

3
I would suggest using a penalized smoother instead of a polynomial (see package mgcv).Roland

3 Answers

4
votes

That's not how you do this.

First, use the poly function. Second, use predict.

set.seed(42)
y <- rnorm(10)
t <- 1:10

DF <- data.frame(y, t) #important!

pmin <- 3

poly.mod <- lm(y ~ poly(t, degree = pmin, raw = TRUE), data = DF)

plot(t, y, type = 'h')
curve(predict(poly.mod, newdata = data.frame(t = x)), add = TRUE)

resulting plot

curve evaluates the expression passed to its first parameter. x denotes the x-values of the plot. It always has to be x!

1
votes

I think Roland's approach is better here, but it's always nice to get an explanation as to why your own code wasn't working.

Let's make this concrete with some dummy data so we can see where the problem lies:

set.seed(69)
t <- 1:100
y <- 3 + 0.3 * t + 0.01*t^2 + 0.0002*t^3 + 4e-6*t^4 + 
     3e-10*t^5 + 4e-16*t^6 + rnorm(100, 0, 50)

plot(t, y)

enter image description here

Now let's imagine we've decided to fit a degree six polynomial regression:

pmin <- 6
poly_form = "y ~ t"
beta_form = "beta[1]"
for (i in 1:pmin) {  #pmin is the best polynomial fit, e.g. 4 or 9.
           poly_form <- sprintf("%s + I(t^%s)", poly_form, i)
           beta_form <- sprintf("%s + beta[%s]*t^%s",beta_form, i+1, i)
            }

So far, so good. Now let's look at our poly form and beta form:

poly_form
#> [1] "y ~ t + I(t^1) + I(t^2) + I(t^3) + I(t^4) + I(t^5) + I(t^6)"
beta_form
# > [1] "beta[1] + beta[2]*t^1 + beta[3]*t^2 + beta[4]*t^3 + beta[5]*t^4 + 
         beta[6]*t^5 + beta[7]*t^6"

There's a bit of a problem here. We're including terms for t and terms for t^1 in our regression. These are of course the same thing. So if we create poly_mod we get:

poly.mod = lm(as.formula(poly_form))
poly.mod

#> Call:
#> lm(formula = as.formula(poly_form))
#>
#> Coefficients:
#> (Intercept)            t       I(t^1)       I(t^2)       I(t^3)       I(t^4)  
#> -1.910e+00   -2.444e-01           NA   -4.095e-02    5.933e-03   -1.499e-04  
#>      I(t^5)       I(t^6)  
#>   1.611e-06   -5.903e-09  

You can see that we get an NA for I(t^1). However, that means that coef(poly.mod) will now contain an NA:

beta = coef(poly.mod)
beta
#>   (Intercept)             t        I(t^1)        I(t^2)        I(t^3)        I(t^4) 
#>  8.139958e+01 -1.494928e+01            NA  1.037905e+00 -3.454374e-02  6.267641e-04 
#>        I(t^5)        I(t^6) 
#> -5.534399e-06  1.904566e-08 

This means that when we parse beta_form, there will always be an NA in the sum, so it will just produce a vector of NA:

eval(parse(text = beta_form))
#>  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [28] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [55] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
#> [82] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

So what's the solution?

Simply change your original poly_form = "y ~ t" to poly_form = "y ~ ".

Now you run the rest of your code as is, and you get the desired result:

plot(t, y, type = 'h')
lines(t, eval(parse(text = beta_form))) 

enter image description here

0
votes

use poly():

model = lm(y ~ poly(t, 4, raw = TRUE, data = df)
beta = coef(model)
t = t0 ^ (0:4)
sum(beta * t)    

# or
predict(model, newdata)   # dataframe of t