Suppose I have the following data frame (in reality my data frame has thousands of rows):
year<-c(2010,2010,2010,2011,2011,2011,2012,2012,2013,2013)
a1<-rnorm(10)
a2<-rnorm(10)
b1<-rnorm(10)
b2<-rnorm(10)
c1<-rnorm(10)
c2<-rnorm(10)
df<-data.frame(year,a1,a2,b1,b2,c1,c2)
I used the following code to split the original data frame by year into a list of data frames, one per year.
#split datasets into years
df.list<-split(df, df$year)
#Name each dataset "df" plus its year (str_c comes from the stringr package)
library(stringr)
dfnames <- str_c("df", names(df.list))
names(df.list)<-dfnames
I want to apply the following loop to all data frames of the list:
#df_target is a new data frame that stores the results and j is the indicator for it:
df_target <- NULL
j <- 1
for(i in seq(2, 7, 2)) {
  df_target[[j]] <- (df[i]*df[i+1])/(sum(df[i+1]))
  j <- j+1
}
The code works fine for one data frame. However, I want to split the data frame into multiple data frames grouped by year and then loop through the columns of each one.
Thus, I use the following function to apply the loop mentioned above to all data frames in the list:
df_target <- NULL
j <- 1
fnc <- function(x){
for(i in seq(2, 7, 2)) {
df_target[[j]] <- (x[i]*x[i+1])/(sum(x[i+1]))
j <- j+1
}
}
sapply(df.list, fnc)
With this code, I don't get any error messages; however, both data frames from the list are NULL. What exactly am I doing wrong?
df_target should be a data frame containing the columns a_new = (a1*a2)/sum(a2), b_new = (b1*b2)/sum(b2) and c_new = (c1*c2)/sum(c2), but computed for each year separately.
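For reference, here is a minimal sketch of what that desired result might look like in base R. Two things are worth noting about the function above: assignments made inside a function only change a local copy of df_target, and a function whose last expression is a for loop returns NULL, so sapply has nothing to collect. The sketch below builds the result locally and returns it explicitly. The helper name weighted_cols and the use of set.seed are assumptions for illustration, not part of the original code.

```r
# Hypothetical example data with the same layout as in the question
set.seed(1)
df <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013),
  a1 = rnorm(10), a2 = rnorm(10),
  b1 = rnorm(10), b2 = rnorm(10),
  c1 = rnorm(10), c2 = rnorm(10)
)

# Split by year and name each piece "df" plus the year
df.list <- split(df, df$year)
names(df.list) <- paste0("df", names(df.list))

# weighted_cols is a hypothetical helper: for each column pair (2,3), (4,5), (6,7)
# it computes (first * second) / sum(second) and returns the result as a data frame
weighted_cols <- function(x) {
  out <- list()
  for (i in seq(2, 7, 2)) {
    out[[length(out) + 1]] <- (x[[i]] * x[[i + 1]]) / sum(x[[i + 1]])
  }
  names(out) <- c("a_new", "b_new", "c_new")
  as.data.frame(out)  # the last expression is the function's return value
}

# lapply keeps one data frame per year instead of trying to simplify the result
result.list <- lapply(df.list, weighted_cols)
```

Under these assumptions, result.list$df2010 would hold a_new, b_new and c_new for the 2010 rows only, and similarly for the other years.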
df has only 7 columns, but your i index tries to select columns 8 and 10, which don't exist, so I can't run your code. Also, can you explain what you're trying to achieve? You want df_target to be ...? – Ricardo Semião e Castro
dplyr::group_by and dplyr::mutate – Richard Telford
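A rough sketch of how the dplyr::group_by and dplyr::mutate suggestion could be applied here, assuming the dplyr package is installed and the column names from the question. This skips the split into a list entirely and keeps everything in one data frame; the set.seed call and the final select are additions for illustration only.

```r
library(dplyr)

# Hypothetical example data with the same layout as in the question
set.seed(1)
df <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013),
  a1 = rnorm(10), a2 = rnorm(10),
  b1 = rnorm(10), b2 = rnorm(10),
  c1 = rnorm(10), c2 = rnorm(10)
)

# Inside a grouped mutate, sum() is evaluated per year rather than over the whole column
df_target <- df %>%
  group_by(year) %>%
  mutate(
    a_new = a1 * a2 / sum(a2),
    b_new = b1 * b2 / sum(b2),
    c_new = c1 * c2 / sum(c2)
  ) %>%
  ungroup() %>%
  select(year, a_new, b_new, c_new)
```

If separate data frames per year are still needed afterwards, split(df_target, df_target$year) should give a list comparable to df.list.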