I have a dataframe with 6 variables:
Depr is a factor with 6 levels ("0", "1", "2", "3", "4", "5")
Sex is a factor with 3 levels ("Both sexes", "Female", "Male")
Age is a factor with 19 levels ("00-04", "05-09", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85+", "Total")
GL is a factor (geographical level) with 5 levels ("HPE","KFLA","LGL","ON","Regional")
YR is an integer (year), there are only two - 2011 and 2016 (census years)
And Pop is population count, an integer.
The dataframe is set up in long format where I have population counts for all factor combinations for each of the two years.
Depr  Sex         Age    GL  YR    Pop
0     Both sexes  00-04  ON  2011  395
0     Both sexes  00-04  ON  2016  5550
...
1     Both sexes  00-04  ON  2011  495
1     Both sexes  00-04  ON  2016  3923
I want to interpolate for the years in between 2011 and 2016 (2012, 2013, 2014, 2015) for each row in my dataframe so that I get something like this:
Depr  Sex         Age    GL  YR    Pop
0     Both sexes  00-04  ON  2011  395
0     Both sexes  00-04  ON  2012  456
0     Both sexes  00-04  ON  2013  689
0     Both sexes  00-04  ON  2014  2354
0     Both sexes  00-04  ON  2015  3446
0     Both sexes  00-04  ON  2016  5550
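To make the target concrete, here is a minimal sketch of strict linear interpolation for that single group, using base R's approx with the 2011 and 2016 counts from the sample rows. The names yr, pop, and one_group are only illustrative. Note that a linear fit gives evenly spaced values, unlike the made-up in-between numbers in the table above.

```r
# Interpolate one group (Depr 0, Both sexes, 00-04, ON) between the census years.
yr  <- c(2011, 2016)
pop <- c(395, 5550)

# approx() returns a list with $x (the requested years) and $y (interpolated counts).
pts <- approx(x = yr, y = pop, xout = 2011:2016, method = "linear")

# Round because population counts are whole numbers.
one_group <- data.frame(YR = pts$x, Pop = round(pts$y))
print(one_group)
```

Each intermediate year adds the same increment, (5550 - 395) / 5 = 1031.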
I have set up nested loops and am using approx to do the linear interpolation.
#create an empty dataframe to combine the results
fdepr <- data.frame(Depr = factor(levels = c("0", "1", "2", "3", "4", "5")),
                    Sex = factor(levels = c("Both sexes", "Female", "Male")),
                    Age = factor(levels = c("00-04", "05-09", "10-14",
                                            "15-19", "20-24", "25-29", "30-34", "35-39", "40-44",
                                            "45-49", "50-54", "55-59", "60-64", "65-69", "70-74",
                                            "75-79", "80-84", "85+", "Total")),
                    GL = factor(levels = c("HPE", "KFLA", "LGL", "ON", "Regional")),
                    YR = integer(),
                    Pop = integer())
#loops to subset Pop by grouping categories (depr is my original df)
for (i in unique(depr$Depr)) {
  for (j in unique(depr$Sex)) {
    for (k in unique(depr$Age)) {
      for (l in unique(depr$GL)) {
        temp <- subset(depr, subset = (Depr == i & Sex == j & Age == k & GL == l), select = c(YR, Pop))
        x <- temp$YR
        y <- temp$Pop
        t <- c(2011, 2012, 2013, 2014, 2015, 2016)
        points <- approx(x, y, method = 'linear', xout = t)
        results <- data.frame(Depr = rep(i, 6), Sex = rep(j, 6), Age = rep(k, 6), GL = rep(l, 6), YR = points$x, Pop = points$y)
        fdepr <- rbind(fdepr, results)
      }
    }
  }
}
It seems to go through and do the first round fine and populate results and fdepr as expected, but then I get
Error in approx(x, y, method = "linear", xout = t) :
need at least two non-NA values to interpolate
temp is empty and so are x and y. I'm not sure if it's something in the way fdepr is defined or if it's the nested loops that are the problem...
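For reference, the same error can be reproduced in isolation by handing approx two empty vectors, which is what x and y become when subset() matches no rows. This is only a sketch of the failure mode, not a diagnosis of the real data; msg is an illustrative name.

```r
# Empty inputs, as produced when subset() returns zero rows.
x <- integer(0)
y <- integer(0)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    approx(x, y, method = "linear", xout = 2011:2016)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```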
I'm not a data scientist, so complex logic and programming are not intuitive to me. Any insight is appreciated.
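One possible direction, sketched under assumptions: the nested loops visit every combination of the unique values, including combinations that may never occur together in the data, and an empty temp means no rows matched that combination. The sketch below instead splits the data by the combinations that are actually present and skips any group with fewer than two usable years. The toy depr here uses hypothetical values (its last group has only one census year on purpose), and interpolate_group, years, groups, and pieces are illustrative names.

```r
# Toy stand-in for depr (hypothetical values); Depr "2" has only one census year.
depr <- data.frame(
  Depr = factor(c("0", "0", "1", "1", "2")),
  Sex  = factor("Both sexes"),
  Age  = factor("00-04"),
  GL   = factor("ON"),
  YR   = c(2011L, 2016L, 2011L, 2016L, 2011L),
  Pop  = c(395L, 5550L, 495L, 3923L, 100L)
)

years <- 2011:2016

# Interpolate one group; return NULL when approx() would not have two points.
interpolate_group <- function(g) {
  ok <- !is.na(g$YR) & !is.na(g$Pop)
  if (length(unique(g$YR[ok])) < 2) return(NULL)
  pts <- approx(g$YR[ok], g$Pop[ok], xout = years, method = "linear")
  data.frame(Depr = g$Depr[1], Sex = g$Sex[1], Age = g$Age[1], GL = g$GL[1],
             YR = as.integer(pts$x), Pop = pts$y)
}

# drop = TRUE keeps only the combinations that actually occur in the data.
groups <- split(depr, depr[c("Depr", "Sex", "Age", "GL")], drop = TRUE)

# Discard skipped groups, then stack the rest into one long data frame.
pieces <- Filter(Negate(is.null), lapply(groups, interpolate_group))
fdepr  <- do.call(rbind, pieces)
rownames(fdepr) <- NULL
print(fdepr)
```

Building the pieces in a list and binding once also avoids growing fdepr with rbind on every iteration. Whether skipping incomplete groups is appropriate depends on why they are incomplete in the real data.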
Could you share depr or some other example input, as well as the desired output? - IceCreamToucan