0
votes

I try to get the mean of the product (interaction) of two variables from a model fitted with lm().

N <- 1000
u <- rnorm(N)
x1 <- rnorm(N)
x2 <- 1 + x1 + rnorm(N)
y <- 1 + x1 + x2 + u
df <- data.frame(y,x1,x2)
fit <- lm(y ~ x1 * x2, data = df)

I can calculate the mean of a single variable for a coefficient accessing $model.

mean(fit$model[,2])
# verify result
mean(df[,2])

But how can I get the mean of the interaction without going back into the data.

# Result should be
mean(df$x1*df$x2)
2

2 Answers

2
votes

I'm not sure why you want this, but it is trivial to get from fit. First, it is best not to delve into fitted objects like this with $. Instead learn to use extractor functions. In this case, the equivalent of mean(fit$model[,2]) would be, for all columns of the data at once:

> colMeans(model.frame(fit))
        y        x1        x2 
2.0783225 0.0283555 1.0481141

The model frame is just a copy of the data. What you want is the design matrix, or as R calls it the model matrix, which, unsurprisingly is obtained using the model.matrix() function.

> head(model.matrix(fit))
  (Intercept)          x1          x2       x1:x2
1           1 -0.33406119  1.95054087 -0.65160001
2           1 -1.41848058  0.35429591 -0.50256186
3           1 -1.32877702 -0.00783884  0.01041607
4           1  0.54054637  1.34637056  0.72777572
5           1 -0.75686319 -0.36476471  0.27607699
6           1  0.04514449  1.62928315  0.07355316

Note that the response data aren't in the design matrix, but the interaction term is, in the final column. Use colMeans() again to get the mean of each column of this design matrix:

> colMeans(model.matrix(fit))
(Intercept)          x1          x2       x1:x2 
  1.0000000   0.0283555   1.0481141   1.0820110

For completeness I should show this is correct for my random data set:

> colMeans(transform(df[,-1], interaction = x1 * x2))
         x1          x2 interaction 
  0.0283555   1.0481141   1.0820110
0
votes
mean(x1 * x2)
#[1] 0.9009494

mean(do.call("*", fit$model[, c("x1", "x2")]))
#[1] 0.9009494

fit <- lm(y ~ x1 * x2, data = df, x=TRUE)
mean(fit$x[,"x1:x2"])
#[1] 0.9009494