nonlinear least squares in R when data are a function of parameters to be estimated

Question

I'm currently migrating from MATLAB to R and trying to find out whether what I want to do is possible.

I want to estimate a non-linear model in R where the observations are US states. The wrinkle is that one of the independent variables is a state-level index over counties, calculated using a parameter to be estimated, i.e. the model looks like this:

log(Y_s) = log(phi) + log(f(theta, X_cs)) + u_s

where Y_s is a state-level variable and X_cs is a vector containing county-level observations of a variable within the state, and f() returns a scalar value of the index calculated for the state.

So far I've tried using R's nls function while transforming the data as it's passed to the function. Abstracting from the details of the index, a simpler version of the code looks like this:

library(dplyr)

state <- c("AK", "AK", "CA", "CA", "MA", "MA", "NY", "NY")
Y <- c(3, 3, 5, 5, 6, 6, 4, 4)
X <- c(4, 5, 2, 3, 3, 5, 3, 7)
Sample <- data.frame(state, Y, X)

f <- function(data, theta) {
  # Collapse the county rows to one row per state
  data %>%
    group_by(state) %>%
    summarise(index = mean(X^theta),
              Y = mean(Y))
}

model <- nls(Y ~ log(phi) + log(index),
             data = f(Sample, theta),
             start = list(phi = exp(3), theta = 1.052))

This returns an error, telling me that the gradient is singular. My guess is it's because R can't see how the parameter theta should be used in the formula.

Is there a way to do this using nls? I know I could define the criterion function to be minimised manually, i.e. log(Y_s) - log(phi) - log(f(theta, X_cs)), and use a minimisation routine to estimate the parameter values. But I want to use the postestimation features of nls, like having a confidence interval for the parameter estimates. Any help much appreciated.

Roland · Accepted Answer · 2017-03-28T12:20:39

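Sorry, I refuse to install that ginormous meta package, so I use base R. The key change is to call f() inside the formula, so that nls re-evaluates the index for every trial value of theta; in your code the data argument is evaluated only once, before the fit starts. Because the data stay at county level, each state appears once per county, and the weights (1 / number of counties in the state) make every state count exactly once in the sum of squares: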

state <- c("AK", "AK", "CA", "CA", "MA", "MA", "NY", "NY")
Y <- c(3, 3, 5, 5, 6, 6, 4, 4)
X <- c(4, 5, 2, 3, 3, 5, 3, 7)
Sample <- data.frame(state, Y, X)

f <- function(X, state, theta) {
  ave(X, state, FUN = function(x) mean(x^theta))
}

model <- nls(Y ~ log(phi) + log(f(X, state, theta)),
             data = Sample, weights = 1/ave(X, state, FUN = length),
             start = list(phi = exp(3), theta = 1.052))
summary(model)
#Formula: Y ~ log(phi) + log(f(X, state, theta))
#
#Parameters:
#      Estimate Std. Error t value Pr(>|t|)
#phi   2336.867   4521.510   0.517    0.624
#theta   -2.647      1.632  -1.622    0.156
#
#Residual standard error: 0.7791 on 6 degrees of freedom
#
#Number of iterations to convergence: 11 
#Achieved convergence tolerance: 3.722e-06
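
Since the question asks about confidence intervals, here is a minimal sketch of one way to get them. It is an illustration under assumptions, not part of the fit above: `States` and `state_index` are hypothetical helpers introduced here. The model is fitted on one row per state, with the index recomputed from the county rows for each trial theta. The weighted sum of squares above equals the state-level sum of squares, so the point estimates should agree, but the residual degrees of freedom become 2 (4 states minus 2 parameters) instead of 6, and the standard errors should be correspondingly wider. The intervals are simple Wald-type intervals built from the summary table.

```r
# County-level sample from the question (two counties per state).
state <- c("AK", "AK", "CA", "CA", "MA", "MA", "NY", "NY")
Y <- c(3, 3, 5, 5, 6, 6, 4, 4)
X <- c(4, 5, 2, 3, 3, 5, 3, 7)
Sample <- data.frame(state, Y, X, stringsAsFactors = FALSE)

# Hypothetical helper data: one row per state, so nls sees 4 observations.
States <- aggregate(Y ~ state, data = Sample, FUN = mean)

# Hypothetical helper: state index recomputed from the county rows for a given theta.
state_index <- function(st, theta) {
  vapply(as.character(st),
         function(s) mean(Sample$X[Sample$state == s]^theta),
         numeric(1), USE.NAMES = FALSE)
}

fit <- nls(Y ~ log(phi) + log(state_index(state, theta)),
           data = States,
           start = list(phi = exp(3), theta = 1.052))

# Wald-type 95% intervals: estimate +/- t quantile * standard error.
est <- coef(summary(fit))
tq  <- qt(0.975, df.residual(fit))
ci  <- cbind(lower = est[, "Estimate"] - tq * est[, "Std. Error"],
             upper = est[, "Estimate"] + tq * est[, "Std. Error"])
print(ci)
```

With only four states these intervals will be very wide, and the Wald approximation may be poor for a model this nonlinear, so treat them as a rough guide.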
