
I have fitted a series of multiple linear regression models and am producing diagnostic plots using the method and code from this post: http://www.r-bloggers.com/checking-glm-model-assumptions-in-r/

No model has more than 53 data points, yet some of the outliers in the diagnostic plots carry labels above 53, ranging from 58 to 107. Do the labels of outliers or influential points in these plots correspond to the individual data points? If not, what do the labels mean, and how do I know which of my data points are the outliers? I have counted the points in my plots and none of them has more than 53.

I have attached a screenshot of my regression plot output. There are 53 points in this plot, yet two of the notable points are labeled 90 and 106.

[Image: regression plot example]


When asking questions, it helps to include reproducible examples so we can verify what's wrong rather than guess. Did you subset your data at some point before fitting the model? The labels may be the row names of the data.frame and not the row indexes. - MrFlick

1 Answer


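plot.lm labels the points with the corresponding row names of the data, not with their positions 1 to n. Row names survive subsetting, which is why a label can exceed the number of points. A small example with letters as row names: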

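set.seed(42)
# Five points on a line plus noise, with letters as row names
DF <- data.frame(x = 1:5, y = 2 + 3 * 1:5 + rnorm(5))
rownames(DF) <- letters[1:5]
DF$y[3] <- 1e3  # turn row "c" into an obvious outlier

mod <- lm(y ~ x, data = DF)
par(mfrow = c(2, 2))    # 2 x 2 grid for the four diagnostic plots
plot(mod, which = 1:4)  # extreme points are labeled with letters, not 1..5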

[Image: resulting plot]
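To connect this to the situation in the question, here is a minimal sketch of how subsetting can produce labels larger than the number of points, and how to map a label back to a data point. The data, the names `full`, `sub`, `flagged` and `pos`, and the choice of keeping every second row are all assumptions made for illustration; the asker's actual subsetting step is unknown.

```r
# Hypothetical data: 107 rows in the full data set
set.seed(1)
full <- data.frame(x = 1:107, y = 2 + 3 * (1:107) + rnorm(107))

# Keep every second row: 53 points, but the row names stay "2", "4", ..., "106"
sub <- full[seq(2, 106, by = 2), ]
nrow(sub)
tail(rownames(sub))

mod <- lm(y ~ x, data = sub)

# The residuals carry the row names, and these are what plot.lm uses as labels
head(names(residuals(mod)))

# Suppose a diagnostic plot flags the label "90": look it up by row name
flagged <- sub["90", ]
pos <- which(rownames(sub) == "90")  # position of that point within `sub`

# Optionally reset the row names so later labels run from 1 to nrow(sub)
rownames(sub) <- NULL
mod2 <- lm(y ~ x, data = sub)
head(names(residuals(mod2)))
```

So a label such as 90 or 106 identifies the row named "90" or "106" in the data passed to lm, and indexing the data.frame with that name as a character string retrieves the point. If position-based labels are preferred without touching the row names, passing `labels.id = seq_len(nrow(sub))` to `plot()` should have the same effect.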