0
votes

I'm trying to implement my own linear regression likelihood ratio test.

The test is where you take the sum of squares of a reduced model and the sum of squares of a full model and compare it to the F statistic.

However, I am having some trouble implementing the function, especially when dealing with dummy variables.

This is the dataset I am working with and testing the function on.

Here is the code so far: The function inputs are the setup matrix mat, the response matrix which has just one column, the indices (variables) being test, and the alpha value the test is at.

linear_regression_likelihood <- function(mat, response, indices, alpha) {
  mat <- as.matrix(mat)
  reduced <- mat[,c(1, indices)]

  q <- 1 #set q = 1 just to test on data

  p <- dim(mat)[2]
  n <- dim(mat)[1]

  f_stat <- qf(1-alpha, df1 = p-q, df2 = n-(p+1))
  beta_hat_full <- qr.solve(t(mat)%*%mat)%*%t(mat)%*%response
  y_hat_full <- mat%*%beta_hat_full
  SSRes_full <- t(response - y_hat_full)%*%(response-y_hat_full)


  beta_hat_red <- qr.solve(t(reduced)%*%reduced)%*%t(reduced)%*%response
  y_hat_red <- reduced%*%beta_hat_red
  SSRes_red <- t(response - y_hat_red)%*%(response-y_hat_red)



  s_2 <- (t(response - mat%*%beta_hat_full)%*%(response - mat%*%beta_hat_full))/(n-p+1)

  critical_value <- ((SSRes_red - SSRes_full)/(p-q))/s_2
  print(critical_value)
  if (critical_value > f_stat) {
    return ("Reject H0")
  }
  else {
    return ("Fail to Reject H0")
  }
}

Here is the setup code, where I setup the matrix in the correct format. Data is the read in CSV file.

data <- data[, 2:5]

mat <- data[, 2:4]


response <- data[, 1]


library(ade4)
df <-data.frame(mat$x3)
dummy <- acm.disjonctif(df)
dummy
mat <- cbind(1, mat[1:2], dummy)

linear_regression_likelihood(mat, response, 2:3, 0.05)

This is the error I keep getting.

Error in solve.default(as.matrix(c)) : system is computationally singular: reciprocal condition number = 1.63035e-18

I know it has to do with taking the inverse of the matrix after it is multiplied, but the function is unable to do so. I thought it may be due to the dummy variables having too small of values, but I am not sure of any other way to include the dummy variables.

The test I am doing is to check whether the factor variable x3 has any affect on the response y. The actual answer which I verified using the built in functions states that we fail to reject the null hypothesis.

1

1 Answers

1
votes

The error originates from line

beta_hat_full <- qr.solve(t(mat)%*%mat)%*%t(mat)%*%response

If you go through your function step-by-step you will see an error

Error in qr.solve(t(mat) %*% mat) : singular matrix 'a' in solve

The problem here is that your model matrix does not have full column rank, which translates to your regression coefficients not being unique. This is a result of the way you "dummyfied" x3. In order to ensure full rank, you need to remove one dummy column (or manually remove the intercept).

In the following example I remove the A column from dummy which means that resulting x3 coefficients measure the effect of a unit-change in B, C, and D against A.

# Read data
data <- read.csv("data_hw5.csv")
data <- data[, 2:5]

# Extract predictor and response data    
mat <- data[, 2:4]
response <- data[, 1]

# Dummify categorical predictor x3
library(ade4)
df <-data.frame(mat$x3)
dummy <- acm.disjonctif(df)
dummy <- dummy[, -1]    # Remove A to have A as baseline
mat <- cbind(1, mat[1:2], dummy)

# Apply linear_regression_likelihood
linear_regression_likelihood(mat, response, 2:3, 0.05);
#          [,1]
#[1,] 8.291975
#[1] "Reject H0"

A note

The error could have been avoided if you had used base R's function model.matrix which ensures full rank when "dummyfying" categorical variables (model.matrix is also implicitly called in lm and glm to deal with categorical, i.e. factor variables).

Take a look at

mm <- model.matrix(y ~ x1 + x2 + x3, data = data)  

which by default omits the first level of factor variable x3. mm is identical to mat after (correct) "dummification".