R loop through columns in list of data frames

Question

Suppose the following data frame (in reality my data frame has thousands of rows):

year<-c(2010,2010,2010,2011,2011,2011,2012,2012,2013,2013)
a1<-rnorm(10)
a2<-rnorm(10)
b1<-rnorm(10)
b2<-rnorm(10)
c1<-rnorm(10)
c2<-rnorm(10)
df<-data.frame(year, a1, a2, b1, b2, c1, c2)

I used the following code to create a list consisting of multiple data frames, which splits the original data frame into subsets by year.

library(stringr)  # for str_c

#split datasets into years
df.list<-split(df, df$year)

#Name of datasets df plus year
dfnames <- str_c("df", names(df.list))
names(df.list)<-dfnames
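A minimal base-R sketch of the same setup, assuming `df` is built from the vectors above. Here `paste0()` stands in for `stringr::str_c()` so no package is needed, and `set.seed(1)` is added only to make the example reproducible.

```r
set.seed(1)  # added only for reproducibility
year <- c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013)
df <- data.frame(year,
                 a1 = rnorm(10), a2 = rnorm(10),
                 b1 = rnorm(10), b2 = rnorm(10),
                 c1 = rnorm(10), c2 = rnorm(10))

# split() groups the rows by year; the list elements are named "2010", "2011", ...
df.list <- split(df, df$year)
names(df.list) <- paste0("df", names(df.list))

sapply(df.list, nrow)
```

With the `year` vector above, `sapply(df.list, nrow)` reports 3, 3, 2 and 2 rows for df2010 to df2013.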

I want to apply the following loop to all data frames of the list:

#df_target stores the results and j is its index:
df_target <- NULL
j <- 1

for(i in seq(2, 7, 2)) {
  df_target[[j]] <- (df[i]*df[i+1])/(sum(df[i+1]))
  j <- j+1
}

The code works fine for one data frame. However, I want to split the data frame into multiple data frames grouped by year and then loop through the columns of each one.

So I wrapped the loop above in the following function and applied it to all data frames in the list:

df_target <- NULL
j <- 1

fnc <- function(x){
  for(i in seq(2, 7, 2)) {
    df_target[[j]] <- (x[i]*x[i+1])/(sum(x[i+1]))
    j <- j+1
  }
}

sapply(df.list, fnc)

With this code, I don't get any error messages, but every element of the result is NULL. What exactly am I doing wrong?

df_target should be a data frame containing the columns a_new = (a1*a2)/sum(a2), b_new = (b1*b2)/sum(b2) and c_new = (c1*c2)/sum(c2), computed for each year separately.

df has only 7 columns, but your i index tries to select columns 8 and 10, which don't exist, so I can't run your code. Also, can you explain what you're trying to achieve? You want df_target to be ...? — Ricardo Semião e Castro
@RicardoSemiãoeCastro sorry, I have edited my question. It should work now. — ZayzayR
Splitting the data into multiple small data.frames is rarely a good strategy; it is better to reshape the data, then use dplyr::group_by and dplyr::mutate — Richard Telford
@RichardTelford can I then use the loop? I do not want to select the columns by name or index because sometimes I have datasets with 90 columns. That's why I was using the loop and would like to apply it to the list of data frames. — ZayzayR
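The comment above suggests `dplyr::group_by()` plus `dplyr::mutate()`. As a dependency-free sketch of the same idea (a swapped-in technique, not the commenter's code), base R's `ave()` can compute the per-year sum without splitting the data at all. It assumes, as in the question, that column 1 is `year` and the value columns come in adjacent pairs; the small data set and the `_new` column names are made up for illustration.

```r
# Small deterministic example data (assumed layout: year, then x1/x2 pairs)
df <- data.frame(year = c(2010, 2010, 2011, 2011),
                 a1 = c(1, 2, 3, 4), a2 = c(1, 3, 2, 2),
                 b1 = c(5, 6, 7, 8), b2 = c(2, 2, 1, 3))

first <- seq(2, ncol(df) - 1, by = 2)  # positions of a1, b1, ... (computed once)
for (i in first) {
  # ave(..., FUN = sum) returns each row's year total, aligned with the rows
  group_total <- ave(df[[i + 1]], df$year, FUN = sum)
  new_name <- paste0(sub("1$", "", names(df)[i]), "_new")  # "a1" -> "a_new"
  df[[new_name]] <- df[[i]] * df[[i + 1]] / group_total
}
df
```

Because the pairs are found from `ncol(df)`, the same loop works unchanged for a data set with 90 value columns.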

Ricardo Semião e Castro · Accepted Answer · 2020-10-26T13:31:42

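You need to define j and df_target inside the function and specify what it should return. As written, the function computes df_target but never returns it: assignments inside a function only modify a local copy, and the last expression evaluated is the for loop, which returns NULL.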

fnc <- function(x){
  df_target <- NULL
  j <- 1
  for(i in seq(2, 7, 2)) {
    df_target[[j]] <- (x[i]*x[i+1])/(sum(x[i+1]))
    j <- j+1
  }
  return(df_target)
}
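Both reasons for the `NULL` can be checked in isolation: a `for` loop itself evaluates to `NULL`, and `<-` inside a function creates a local copy instead of changing the global `df_target`. The function names `f_loop_last` and `f_return` are hypothetical, chosen only for this illustration.

```r
df_target <- NULL

f_loop_last <- function() {
  for (i in 1:3) df_target[[i]] <- i  # modifies a local copy only
}
f_return <- function() {
  for (i in 1:3) df_target[[i]] <- i
  df_target                           # explicitly return the local result
}

res_loop <- f_loop_last()   # value of the for loop
res_ret  <- f_return()

is.null(res_loop)   # TRUE
is.null(df_target)  # TRUE: the global object was never touched
res_ret             # 1 2 3
```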

But keep in mind that this will output a matrix of lists: for each element of df.list that sapply processes, the function returns a 3-element list (df_target), so the output will look like this in the console:

> sapply(df.list, fnc)
     df2010 df2011 df2012 df2013
[1,] List,1 List,1 List,1 List,1
[2,] List,1 List,1 List,1 List,1
[3,] List,1 List,1 List,1 List,1
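The matrix appears because `sapply()` simplifies results that all have the same length (here, three) into a matrix. A small sketch with a made-up helper, `make_pair()`, that returns two data frames per "year" shows that `lapply()`, or `sapply(..., simplify = FALSE)`, keeps one list element per year instead.

```r
# Hypothetical stand-in: each "year" returns a list of two data frames
make_pair <- function(n) list(data.frame(v = seq_len(n)), data.frame(v = -seq_len(n)))
inputs <- list(df2010 = 3, df2011 = 2)

simplified <- sapply(inputs, make_pair)                 # 2 x 2 matrix of list cells
kept       <- lapply(inputs, make_pair)                 # named list of lists
kept2      <- sapply(inputs, make_pair, simplify = FALSE)

dim(simplified)          # 2 2
names(kept)              # "df2010" "df2011"
identical(kept, kept2)   # TRUE
```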


To get a cleaner output, we can make df_target a data frame with one column per pair of variables, for each year:

fnc <- function(x){
  df_target <- as.data.frame(matrix(nrow=nrow(x), ncol=3))
  for(i in seq(2, 7, 2)) {
    df_target[,i/2] <- (x[i]*x[i+1])/(sum(x[i+1]))
  }
  return(df_target)
}
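The version above hard-codes three output columns and `seq(2, 7, 2)`. Since the question mentions data sets with around 90 columns, a sketch that derives the pairs from `ncol(x)` may help. It assumes, as in the question, that column 1 is `year` and the remaining columns come in adjacent pairs whose first member ends in "1"; the function name `pair_ratios` and the `_new` naming are hypothetical. Combined with `lapply()`, it returns one data frame per year without an explicit outer loop.

```r
pair_ratios <- function(x) {
  first <- seq(2, ncol(x) - 1, by = 2)   # a1, b1, c1, ... (column 1 is year)
  out <- lapply(first, function(i) x[[i]] * x[[i + 1]] / sum(x[[i + 1]]))
  # "a1" -> "a_new"; assumes the first column of each pair ends in "1"
  names(out) <- paste0(sub("1$", "", names(x)[first]), "_new")
  as.data.frame(out)
}

df <- data.frame(year = c(2010, 2010, 2011),
                 a1 = c(1, 2, 3), a2 = c(1, 3, 2),
                 b1 = c(4, 5, 6), b2 = c(2, 2, 5))
df.list <- split(df, df$year)
names(df.list) <- paste0("df", names(df.list))

res <- lapply(df.list, pair_ratios)
res
```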

This returns one data frame per year, but if we call it with sapply we'll again get a matrix of lists, so it's better to define the function so that it already loops through every year:

fnc <- function(y){
  df_target.list <- list()
  k <- 1
  for(j in y){
    df_target <- as.data.frame(matrix(nrow=nrow(j), ncol=3))
    for(i in seq(2, 7, 2)) {
      df_target[,i/2] <- (j[i]*j[i+1])/(sum(j[i+1]))
    }
    df_target.list[[names(y)[k]]] <- df_target
    k <- k+1
  }
  return(df_target.list)
}

Output:

> fnc(df.list)
$df2010
           V1         V2          V3
1 -0.10971160 0.01688244 -0.16339367
2  0.05440564 0.57554210 -0.06803244
3  0.03185178 0.90598561 -0.68692401

$df2011
           V1           V2         V3
1 -0.43090055  0.007152131  0.3930606
2  0.15050644  0.329092942 -0.1367295
3  0.07336839 -0.423631930 -0.1504056

$df2012
         V1         V2         V3
1 0.5540294  0.4561862 0.09169914
2 0.1153931 -1.1311450 0.81853691

$df2013
          V1        V2        V3
1  0.4322934 0.5286973 0.2136495
2 -0.2412705 0.1316942 0.1455196
