I have three data frames, each with one column but a different number of rows: 100, 100, and 1000 for df1, df2, and df3 respectively. I want to rbind them iteratively and compute measures such as the mean on small chunks, taking 10% of each data frame at a time. That is, the first iteration should use 10 rows from df1, 10 from df2, and 100 from df3; I need the mean of that set, and the process should repeat 10 times. I also want to plot the mean (y-axis) against the iteration and get an overall mean from this procedure. Any suggestions?

df1<- data.frame(A=c(1:100))
df2<- data.frame(A=c(1:100))
df3<- data.frame(A=c(1:1000))

library(dplyr)
# Attempt: binds all rows on every pass, so it does not yet take 10% chunks
for (i in 1:10) {
  df <- bind_rows(df1, df2, df3)
  m <- mean(df$A)
}

1 Answer

You're making things complicated by trying to keep separate data frames. Add a "group" column (call it "iteration" if you prefer) and get your data in one data frame:

df1$group = rep(1:10, each = nrow(df1) / 10)
df2$group = rep(1:10, each = nrow(df2) / 10)
df3$group = rep(1:10, each = nrow(df3) / 10)
df = rbind(df1, df2, df3)

means = group_by(df, group) %>% summarize(means = mean(A))
means
#  Source: local data frame [10 x 2]
#
#     group means
#  1      1    43
#  2      2   128
#  3      3   213
#  4      4   298
#  5      5   383
#  6      6   468
#  7      7   553
#  8      8   638
#  9      9   723
# 10     10   808

Your overall mean is mean(df$A). To plot the per-group means, use with(means, plot(group, means)).
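As a cross-check that does not depend on dplyr, the same per-group means, overall mean, and plot can be sketched in base R. The names group_means and overall_mean, the axis labels, and the dashed reference line are illustrative choices, not part of the original answer.

```r
# Rebuild the example data and the group column as described above
df1 <- data.frame(A = 1:100)
df2 <- data.frame(A = 1:100)
df3 <- data.frame(A = 1:1000)
df1$group <- rep(1:10, each = nrow(df1) / 10)
df2$group <- rep(1:10, each = nrow(df2) / 10)
df3$group <- rep(1:10, each = nrow(df3) / 10)
df <- rbind(df1, df2, df3)

# Per-group means using base R instead of group_by/summarize
group_means <- aggregate(A ~ group, data = df, FUN = mean)

# Overall mean across all rows
overall_mean <- mean(df$A)

# Mean per iteration, with the overall mean as a dashed horizontal line
plot(group_means$group, group_means$A, type = "b",
     xlab = "Iteration (group)", ylab = "Mean of A")
abline(h = overall_mean, lty = 2)
```

Because every group here has the same number of rows (120), the average of the ten group means equals the overall mean. With unequal group sizes the two would generally differ.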

Edits:

If the row counts don't divide evenly into 10 groups, here's how I'd assign the group column. Make sure your dplyr is up to date: this uses the .id argument of bind_rows(), which was new this month in version 0.4.3.

library(dplyr)
# requires dplyr >= 0.4.3 (for the .id argument)

df = bind_rows(df1, df2, df3, .id = "id")
df = df %>% group_by(id) %>%
    mutate(group = (0:(n() - 1)) %/% (n() / 10) + 1)

The id column tells you which data frame the row came from, and the group column splits it into 10 groups. The rest of the code from above should work just fine.
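To see how the integer-division formula behaves when row counts are not multiples of 10, here is a minimal base R sketch. The helper assign_groups, its parameter k, and the sizes 105 and 95 are hypothetical, chosen only for illustration; ave() stands in for the group_by/mutate step.

```r
# Same formula as in the mutate() call, wrapped in an illustrative helper:
# row positions 0..n-1 are cut into k roughly equal consecutive groups
assign_groups <- function(n, k = 10) {
  (0:(n - 1)) %/% (n / k) + 1
}

# Hypothetical data frames whose sizes are not multiples of 10
df1 <- data.frame(A = 1:105)
df2 <- data.frame(A = 1:95)

# Tag each row with its source data frame, then stack
df <- rbind(cbind(id = "1", df1), cbind(id = "2", df2))

# Apply the group formula separately within each id
df$group <- ave(seq_len(nrow(df)), df$id,
                FUN = function(i) assign_groups(length(i)))

# Rows per id and group: sizes differ by at most one within an id
table(df$id, df$group)
```

Each source data frame still contributes to all 10 groups, so the per-group summary works unchanged; the groups are simply no longer exactly equal in size.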