2
votes

I have about 6000 data frames with data like so:

           over10 over20 over50 over100
2014-01-02      1      1      0       0

Each data frame has hundreds of rows.

I need to sum the four columns across all data frames, by index (the date). The result would be one row per date which contains the summed values. So after 6000 data frames, the result might be:

           over10 over20 over50 over100
2014-01-02   3121   2551   1670     985

This would continue with one row per date.

I can easily put the 6000 data frames into a list, or I can rbind into a single big data frame. I have tried various approaches but can't quite get there.

2
So is the date value a rowname or an actual column?thelatemail
@thelatemail Row name, these are natively xtsctrlbrk
I should also mention the nrows of the data frames are not all equalctrlbrk

2 Answers

3
votes

A dplyr approach would be:

Create Sample-Data

set.seed(42)
all <- data.frame(date=sample(seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = "quarter"),
                              100,replace=TRUE),
                over10 = rbinom(100,50,0.5),
                over20 = rbinom(100,50,0.5),
                over50 = rbinom(100,50,0.5),
                over100 = rbinom(100,50,0.5))

group <- sample(1:5,100,replace=TRUE)
all_split <- split(all, group)

So all_split is a list containing all your data.frames. This is necessary to combine them with dplyr::rbind_all (see later). Once they are combined you can easily use dplyr to summarize the data: In your case sum them up.

Actual code

require(dplyr)

dat <- rbind_all(all_split) # This is a big data.frame with all data
dat %>%
  group_by(date) %>%
  summarise_each(funs(sum))

Result:

Source: local data frame [13 x 5]

         date over10 over20 over50 over100
1  2000-01-01    220    187    202     205
2  2000-04-01    175    164    173     159
3  2000-07-01    159    171    185     185
4  2000-10-01    168    176    154     182
5  2001-01-01    145    138    150     160
1
votes

Using the dataset provided by @Floo0

aggregate(all[,2:5], by=list(all$date), sum)

or

aggregate(all[,names(all)!="date"], by=list(all$date), sum

      Group.1 over10 over20 over50 over100
1  2000-01-01    220    187    202     205
2  2000-04-01    175    164    173     159
3  2000-07-01    159    171    185     185
4  2000-10-01    168    176    154     182
5  2001-01-01    145    138    150     160
6  2001-04-01    179    164    152     178
7  2001-07-01    207    200    207     191
8  2001-10-01    133    131    142     112
9  2002-01-01    262    242    255     236
10 2002-04-01    242    251    262     251
11 2002-07-01    177    166    168     169
12 2002-10-01    156    147    163     125
13 2003-01-01    286    299    293     302