2
votes

I have a table with variables: OrderPostingYear,OrderPostingMonth,ProductsFamily,Sales,QTY. Now I want to create a dataframe to show a table with rows as each ProductFamily (grouping) and columns as each OrderPostingYear&OrderPostingMonth(grouping), values are sum of Sales. How would I do it?

>ProductTable
 OrderPostingYear OrderPostingMonth ProductsFamily Sales QTY
2008               1                 R1            5234   1
2008               1                 R2            223    2
2009               1                 R3            34     1 
2008               2                 R1            1634   3
2010               4                 R3            224    1 

The result should look like:

>PFTable
      2008-1 2008-2 2009-1 2010-4
R1     5234    1634    0     0
R2     223     0       0     0
R3     0       0       34    224

I am thinking to use group_by and summarise_each in dplyr but not successes. need help, please. Thank you!

PFTable<-data.frame(ProductTable%>%
                   group_by(ProductFamily) %>%                                summarise_each(.,funs(sum(SalesVolume,na.rm=TRUE)),group_by(OrderPostingYear,OrderPostingMonth)))
1

1 Answers

1
votes

We can use acast to reshape from 'long' to 'wide' format.

library(reshape2)
acast(ProductTable, ProductsFamily~OrderPostingYear+OrderPostingMonth, 
             value.var='Sales', fill=0) 
#   2008_1 2008_2 2009_1 2010_4
#R1   5234   1634      0      0
#R2    223      0      0      0
#R3      0      0     34    224

If we want to use dplyr/tidyr, then unite the 'OrderPostingYear' and 'OrderPostingMonth', remove the 'QTY' and spread to reshape from 'long' to 'wide'.

library(dplyr)
library(tidyr)
unite(df1, OrderMonth, OrderPostingYear, OrderPostingMonth, sep="-") %>%
               select(-QTY) %>%
               spread(OrderMonth, Sales)