I would like to estimate the length of individuals based on their age and sex, using OLS in R. To that purpose I built the following model: m1 <- lm(length ~ age + sex, data = data.frame)
. Next, I created some simple residual plots by running:
op <- par(mfrow = c(2,2))
plot(resid(m1) ~ fitted(m1))
plot(resid(m1) ~ data.frame$age)
plot(resid(m1) ~ data.frame$sex)
qqnorm(resid(m1)); qqline(resid(m1))
par(op)
yielding this plot:
Strange enough, the fitted values do not seem to have the range [165,180]
, but rather [165,170] ∪ [175,180]
(top left plot). I do not understand why this is happening.
Below some sample data producing the plots above:
structure(list(length = c(173, 170, 172, 160, 162.5, 180, 179.5,
175, 168, 186.5, 163.5, 170.5, 160, 175.5, 186.5, 176.5, 168,
180.5, 179, 167, 183, 188.5, 176, 165, 170, 171, 176, 172, 187,
189, 180, 175.5, 162.5, 187, 164, 177, 170.5, 159.5, 161.5, 167,
178, 180.5, 168.5, 162, 171, 173, 171.5, 174.5, 177, 158, 175,
170, 183.5, 166, 174.5, 174, 176, 165, 163.5, 171.5, 161, 173,
165, 186, 171, 164.5, 182.5, 156.5, 156, 175, 168.5, 195, 164,
167.5, 168, 165.5, 172.5, 167, 175, 190, 170.5, 166, 155, 179.5,
175, 185, 174, 158.5, 172.5, 172.5, 173, 177, 161.5, 173.5, 159,
181, 176, 181.5, 167.5, 170.5), age = c(31.0965092402464, 67.7481177275838,
60.9062286105407, 54.776180698152, 57.8316221765914, 42.0287474332649,
47.1786447638604, 51.315537303217, 68.0876112251882, 32.3613963039014,
52.1259411362081, 50.7652292950034, 53.6947296372348, 64.6242299794661,
66.9733059548255, 66.8829568788501, 63.668720054757, 73.533196440794,
57.7659137577002, 43.7262149212868, 51.2416153319644, 30.7953456536619,
73.0403832991102, 52.2984257357974, 46.2614647501711, 35.7618069815195,
74.0670773442847, 35.6878850102669, 43.3894592744695, 65.0458590006845,
55.9671457905544, 71.2306639288159, 58.5653661875428, 40.0520191649555,
39.9698836413415, 44.0109514031485, 34.4722792607803, 47.5400410677618,
51.8822724161533, 46.9596167008898, 39.0143737166324, 49.0349075975359,
39.3812457221081, 48.2518822724162, 37.0376454483231, 30.425735797399,
31.5838466803559, 74.9459274469541, 46.3353867214237, 56.0602327173169,
54.4476386036961, 58.4120465434634, 47.64681724846, 39.047227926078,
45.2183436002738, 48.0246406570842, 41.5140314852841, 61.0732375085558,
52.2600958247776, 62.9760438056126, 70.715947980835, 70.5735797399042,
40.2436687200548, 35.0198494182067, 41.1772758384668, 57.2210814510609,
64.2710472279261, 59.6221765913758, 63.0088980150582, 48.5366187542779,
30.0369609856263, 48.8898015058179, 49.7741273100616, 54.7624914442163,
61.284052019165, 37.0102669404517, 58.4695414099932, 55.3483915126626,
39.4579055441478, 49.3333333333333, 37.9712525667351, 57.388090349076,
70.8199863107461, 37.0212183436003, 51.3675564681725, 48.3860369609856,
35.895961670089, 39.5208761122519, 37.4209445585216, 46.8692676249144,
65.3826146475017, 51.9425051334702, 33.2594113620808, 55.1156741957563,
33.9493497604381, 33.2895277207392, 42.0369609856263, 29.4976043805613,
54.9514031485284, 36.2327173169062), sex = structure(c(1L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
1L, 2L), .Label = c("0", "1"), class = "factor")), row.names = c(1L,
2L, 3L, 4L, 5L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 29L, 30L, 32L,
33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
47L, 48L, 49L, 50L, 51L, 53L, 54L, 56L, 57L, 58L, 59L, 60L, 62L,
63L, 64L, 65L, 66L, 67L, 68L, 70L, 73L, 74L, 75L, 76L, 77L, 78L,
79L, 81L, 82L, 85L, 86L, 87L, 88L, 90L, 92L, 94L, 95L, 96L, 97L,
98L, 99L, 100L, 102L, 103L, 104L, 105L, 106L, 107L, 109L, 110L,
111L, 112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L), class = "data.frame")
Does anyone know the flaw?
age:sex
in your model. – Rolandage:sex
did not solve the problem. – Dionlibrary(ggplot2); ggplot(DF, aes(x = age, y = length, color = factor(sex))) + geom_point() + geom_smooth(method = "lm")
There is no "flaw", the fitted lines just predict a perfect separation of length in the given age range. – Roland