I have a problem with confidence intervals and predictions.
I have a data set (called 'data') consisting of 158 observations of 2 variables, S and N, although N is missing for some observations. I have been able to plot a regression line and 95% confidence intervals using qplot. So far so good. Now I have a second, completely different data set (called 'data2') with 127 observations of N, and I would like to know which S values these correspond to and what the confidence intervals for those S values are. I can't seem to predict these values. Could someone help me out here?
This is what I tried:
data.lm = lm(data$S~data$N)
newdata = data.frame(data2$N)
predict(data.lm, newdata, interval=c("confidence"))
This gives me a warning:
Warning message:
'data2' had 127 rows but variables found have 158 rows
It also returns 158 rows of fit, lwr and upr values, which obviously don't correspond to my data2 N values:
fit lwr upr
1 37.88919 37.66022 38.11816
2 38.38123 38.23795 38.52451
3 NA NA NA
4 37.59720 37.26820 37.92621
5 38.09655 37.92488 38.26823
6 37.77301 37.50590 38.04012
...
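A likely explanation, shown as a minimal sketch on made-up toy data (`train`, `new`, `fit_dollar` and `msg` are hypothetical names, not from the question): writing the formula as `data$S ~ data$N` makes the model's predictor the literal expression `data$N`. `predict()` evaluates that expression with `newdata` as the lookup table, finds no `data` column there, falls back to the global `data`, and so returns one row per original observation along with a row-count warning. Separately, `data.frame(data2$N)` names its column `data2.N`, not `N`.

```r
# Toy data, made up for illustration (not the poster's data set)
train <- data.frame(S = c(36.8, 37.3, 37.9, 38.4, 38.8, 39.6),
                    N = c(6.6, 7.7, 8.6, 9.7, 11.4, 12.7))
new <- data.frame(N = c(7.0, 10.6, 12.3))

# Formula written with `$`: the predictor term is literally `train$N`
fit_dollar <- lm(train$S ~ train$N)
attr(terms(fit_dollar), "term.labels")   # "train$N", not "N"

# predict() cannot find `train` inside `new`, so it uses the global `train`;
# capture the resulting warning instead of letting it print
msg <- NULL
pred_dollar <- withCallingHandlers(
  predict(fit_dollar, new, interval = "confidence"),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
nrow(pred_dollar)   # one row per row of `train`, not of `new`
msg                 # the row-mismatch warning text

# An unnamed column built from `new$N` is renamed by make.names()
names(data.frame(new$N))   # "new.N"
```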
I get the same problem when I try a specific value, such as
data.lm = lm(data$S~data$N)
newdata = data.frame(N=5)
predict(data.lm, newdata, interval=c("confidence"))
It gives me a warning and exactly the same output.
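A sketch of the usual fix for the single-value case, again on made-up toy data (`train`, `fit` and `one` are hypothetical names): pass the data frame through `lm()`'s `data` argument and use bare column names in the formula. The model's predictor is then called `N`, and `predict()` looks for a column named `N` in `newdata`.

```r
# Toy data, made up for illustration (not the poster's data set)
train <- data.frame(S = c(36.8, 37.3, 37.9, 38.4, 38.8, 39.6),
                    N = c(6.6, 7.7, 8.6, 9.7, 11.4, 12.7))

# Bare column names plus `data =`: the predictor term is now `N`
fit <- lm(S ~ N, data = train)

# newdata must contain a column with exactly that name
one <- predict(fit, newdata = data.frame(N = 5), interval = "confidence")
one   # a 1 x 3 matrix with columns fit, lwr, upr
```

Note that N = 5 lies outside the toy N range, so this is an extrapolation and its interval is correspondingly wide.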
I'm probably being stupid here, but I found a lot of similar questions, and the solution always seemed to be exactly what I tried. Why does predict not give me one row of fit, lwr and upr, but instead seem to return values for the data the lm was fitted to?
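The same idea applied to a whole second data set, sketched with small made-up stand-ins for `data` and `data2` (the values are invented, not the posted ones; `fit`, `pred` and `result` are hypothetical names). `lm()` drops the row with a missing N under the default `na.action`, and the new data frame is given a column explicitly named `N`:

```r
# Made-up stand-ins: `data` has one missing N, as in the question
data <- data.frame(S = c(36.8, 37.3, 37.9, 38.4, 38.8, 39.6, 38.4),
                   N = c(6.6, 7.7, 8.6, 9.7, 11.4, 12.7, NA))
data2 <- data.frame(N = c(7.01, 8.02, 9.82, 10.6))

# The row with NA in N is dropped when fitting (default na.omit)
fit <- lm(S ~ N, data = data)

# data.frame(data2$N) would call its column "data2.N"; name it N explicitly
pred <- predict(fit, newdata = data.frame(N = data2$N), interval = "confidence")

# One row per new N: N, fit, lwr, upr
result <- cbind(data2, pred)
result
```

Since `data2` already has a column called `N`, passing `newdata = data2` directly should work just as well.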
Thank you very much in advance
EDIT:
The data I used:
structure(list(S = c(36.7735, 36.7735, 36.7735, 36.7735, 36.7735,
36.7735, 36.7735, 36.7735, 36.7735, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307,
37.307, 37.307, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525,
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.766,
38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639,
39.639), N = c(7.740086957, 9.716043478, NA, 6.567521739, 8.572826087,
7.273521739, 8.689478261, NA, 8.112565217, 9.370289089, 8.429912766,
9.178733143, 8.136725442, 9.127494831, 7.91849608, 8.775866462,
8.733992185, 8.47272603, 8.700879331, 9.57630994, 9.184129237,
9.501760687, 10.04023077, 9.887214462, 7.947499285, 8.681177515,
10.14076961, 8.990465816, 10.35920222, 8.793812067, 8.962143225,
NA, 10.89773618, 9.646558574, NA, 8.708896587, 8.482467842, 9.490473018,
9.724324492, 9.185016805, 9.367232547, 9.447726264, 10.49359078,
9.086775124, 8.951230645, 8.438922723, 7.612619197, 8.961837755,
NA, 8.473436422, 9.487274967, 8.839257463, 8.019280063, 8.829296324,
9.089621228, 12.66471665, NA, 7.93418751, 8.442549778, 12.43150655,
12.78812747, 9.499177641, 8.88329767, 12.06733547, 8.694287059,
8.733657869, 8.976294071, 11.61797642, NA, 9.223855496, 12.14555242,
9.177782834, 10.50860256, 8.830982089, 9.338875366, 11.10966871,
9.009297476, 9.114841643, 9.145197506, 7.508668256, 8.49838577,
11.70012856, 8.859038138, 9.984367135, 11.18147471, 8.504456058,
9.30440283, 8.491741245, 9.154016228, 7.969788358, 8.890420803,
9.391405036, 8.023003384, 12.06142165, 10.0134321, 7.829115845,
8.619827639, 7.965320738, 9.718533292, 9.642541995, 9.221551363,
9.638749044, 8.728496275, 7.882667305, 8.059467865, 10.88596514,
11.52200146, 8.465388516, 10.89040717, 8.652714649, 8.570009902,
9.575021118, 10.20114206, 8.030898045, 9.325947744, 9.383493864,
NA, 10.98718012, 13.58808295, 9.987675873, 11.59305101, 8.559274188,
10.87432015, 9.530456451, NA, 13.39915598, 14.50068995, 11.4377845,
9.874845508, 8.419345084, 9.833591752, 8.734194935, NA, 8.751516192,
10.74365351, 10.94957982, 11.43931675, 9.26461008, 10.88196331,
10.01986719, 8.521178027, 8.346310841, 9.116175981, 12.55888826,
11.55922318, 11.62731629, 9.974676715, 8.659476016, 9.714302784,
11.69627731, 9.404085345, 8.417580572, 10.26841052, 8.0505316,
14.56194307, 8.496000239, 8.36501204, 9.105109509)), .Names = c("S",
"N"), class = "data.frame", row.names = c(NA, -158L))
And the new data set to which I would like to predict S values:
structure(list(N = c(7.01, 8.02, 9.82, 7.83, 7.49, 8.41, 7.92,
9.7, 7.097, 8, 8.29, 8.34, 7.71, 7.87, 8.782, 8.17, 7.86, 7.665,
7.715, 10.6, 8.06, 7.53, 8.75, 8.29, 7.89, 8.94, 9.58, 9.26,
9.91, 11.6, 9.666, 10.96, 8.809, 9.142, 7.193, 8.616, 9.035,
9.123, 8.102, 8.137, 8.966, 8.333, 6.678, 8.856, 10.96, 8.401,
9.729, 8.755, 8.199, 9.004, 7.94, 8.84, 8.55, 8.26, 7.93, 9.03,
10.3, 10.1, 9.23, 8.41, 7.595, 7.351, 7.251, 8.606, 9.35, 7.786,
7.445, 9.441, 8.844, 8.411, 9.086, 8.609, 7.975, 7.203, 11.88,
6.786, 8.36, 11.1, 11.5, 11.57, 8.755, 12.64, 7.07, 10.58, 8.47,
8.13, 8.45, 9.21, 9.36, 10, 10.4, 12.5, 10.1, 10.2, 9.54, 7.78,
9.12, 8.41, 8.94, 9.22, 12.3, 9.75, 9.13, 10.4, 8.22, 8.4, 10.2,
9.95, 11.1, 10.6, 9.84, 10.1, 12.7, 8.2, 8.55, 11.6, 10.5, 8.09,
9.42, 11.2, 12.3, 7.776, 7.007, 7.306, 7.475, 7.469, 9.593)), .Names = "N",
class = "data.frame", row.names = c(NA,
-127L))
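If the aim is a range for individual S values at each new N, rather than for the mean S, `predict()` on an `lm` fit also accepts `interval = "prediction"`. A sketch on made-up toy data (`train`, `new`, `conf_int` and `pred_int` are hypothetical names) comparing the two: both share the same fitted values, and the prediction interval is wider because it also includes the residual variance.

```r
# Toy data, made up for illustration (not the poster's data set)
train <- data.frame(S = c(36.8, 37.3, 37.9, 38.4, 38.8, 39.6),
                    N = c(6.6, 7.7, 8.6, 9.7, 11.4, 12.7))
fit <- lm(S ~ N, data = train)
new <- data.frame(N = c(7.0, 10.6))

# Interval for the mean S at each new N
conf_int <- predict(fit, new, interval = "confidence", level = 0.95)

# Interval for a single new S observation at each new N
pred_int <- predict(fit, new, interval = "prediction", level = 0.95)

cbind(new, conf_int, pred_lwr = pred_int[, "lwr"], pred_upr = pred_int[, "upr"])
```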
dput(data) and dput(newdata)? – RLave