I know you explicitly expressed a desire to avoid nested ifelse statements, so forgive me for using one here. But in my defense, we have a solution that essentially reuses the same ifelse
to create both the Range
and Avg
columns, using an apply function. We also get to take averages quickly with dplyr
.
This solution also assumes you only have Value
and Year
to begin with, since I'm not really sure how you got to the place that you are, e.g., creating the Range
column. So I started from scratch.
First write a function to define the Range
column
library(dplyr)
get_range <- function(number){ #<-- takes in percentile
ans <-
if ( (number >= 1900) & ( number <= 1910) ) {
"1900-1910"
} else if ( (number > 1910 ) & ( number <= 1920) ) {
"1911-1920"
} else if ( (number > 1920 ) & ( number <= 1930) ) {
"1921-1930"
} else if ( (number > 1930 ) & ( number <= 1940) ) {
"1931-1940"
} else if ( (number > 1940 ) & ( number <= 1950) ) {
"1941-1950"
} else if ( (number > 1950 ) & ( number <= 1960) ) {
"1951-1960"
} else if ( (number > 1960 ) & ( number <= 1970) ) {
"1961-1970"
} else if ( (number > 1970 ) & ( number <= 1980) ) {
"1971-1980"
} else if ( (number > 1980 ) & ( number <= 1990) ) {
"1981-1990"
} else if ( (number > 1990 ) & ( number <= 2000) ) {
"1991-2000"
} else {
"NA"
}
return(ans)
}
Then, apply it:
df$Range <- sapply(df$Year, function(x) get_range(x))
Next, use dplyr
to get your aggregate values and store in a separate dataframe.
df_Avg <- df %>%
group_by(Range) %>%
summarise(Avg = mean(Value))
Modify the same function above to take care of the averages
get_avg <- function(number){
ans <-
if ( (number >= 1900) & ( number <= 1910) ) {
df_Avg$Avg[1]
} else if ( (number > 1910 ) & ( number <= 1920) ) {
df_Avg$Avg[2]
} else if ( (number > 1920 ) & ( number <= 1930) ) {
df_Avg$Avg[3]
} else if ( (number > 1930 ) & ( number <= 1940) ) {
df_Avg$Avg[4]
} else if ( (number > 1940 ) & ( number <= 1950) ) {
df_Avg$Avg[5]
} else if ( (number > 1950 ) & ( number <= 1960) ) {
df_Avg$Avg[6]
} else if ( (number > 1960 ) & ( number <= 1970) ) {
df_Avg$Avg[7]
} else if ( (number > 1970 ) & ( number <= 1980) ) {
df_Avg$Avg[8]
} else if ( (number > 1980 ) & ( number <= 1990) ) {
df_Avg$Avg[9]
} else if ( (number > 1990 ) & ( number <= 2000) ) {
df_Avg$Avg[1]
} else {
"NA"
}
return(ans)
}
...and finally, apply the function.
df$Avg <- sapply(df$Year, function(x) get_avg(x))
Should give you this:
> df
Value Year Range Avg
1 12 1906 1900-1910 15.333333
2 15 1909 1900-1910 15.333333
3 7 1911 1911-1920 6.666667
4 22 1950 1941-1950 12.500000
5 4 1917 1911-1920 6.666667
6 9 1917 1911-1920 6.666667
7 19 1902 1900-1910 15.333333
8 1 1921 1921-1930 1.000000
9 2 1931 1931-1940 2.000000
10 3 1941 1941-1950 12.500000
11 4 1951 1951-1960 4.000000
12 5 1961 1961-1970 5.000000
13 6 1971 1971-1980 6.000000
14 7 1981 1981-1990 7.000000
15 8 1991 1991-2000 15.333333
I'm sure you could do this quicker and with less code, but in exchange, this is fairly readable and you could hand it off to your coworker with having to do tons of commenting.