1
votes

I have this data.frame:

df_test = structure(list(`MAE %` = c(-0.0647202646339709, -0.126867775585001, 
-1.81159420289855, -1.03092783505155, -2.0375491194877, -0.160783192796913, 
-0.585827216261999, -0.052988554472234, -0.703351261894911, -0.902996305924203, 
-0.767676767676768, -0.0101091791346543, -0.0134480903711673, 
-0.229357798165138, -0.176407935028625, -0.627062706270627, -1.75706139769261, 
-1.23024009524439, -0.257391763463569, -0.878347259688137, -0.123613523987705, 
-1.65711947626841, -2.11718534838887, -0.256285931980328, -1.87152777777778, 
-0.0552333609500138, -0.943983402489627, -0.541095890410959, 
-0.118607409474639, -0.840453845076341), Profit = c(7260, 2160, 
-7080, 3600, -8700, 6300, -540, 10680, -1880, -3560, -720, 5400, 
5280, 1800, 11040, -240, -2320, 2520, 10300, -2520, 8400, -9240, 
-5190, 7350, -6790, 3600, -3240, 8640, 7150, -2400)), .Names = c("MAE %", 
"Profit"), row.names = c(NA, 30L), class = "data.frame")

Now I want some summary statistics like:

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(GroupCnt  = n(),
              TopMAE    = filter(`MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(`MAE %` <= -1) %>% sum(Profit))

So the data is grouped by whether Profit > 0 or Profit <= 0. Then, within each group, I want sum() of Profit for rows with MAE % <= -1 and for rows with MAE % > -1. The grouping must be used for the TopMAE and BottomMAE calculations.

Expected result is like:

#  win.g GroupCnt TopMAE BottomMAE
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120

But my R code does not work. I get an error:

Error: no applicable method for 'filter_' applied to an object of class "logical"

I changed my code in response to the error:

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(UnderStop = n(),
              TopMAE    = filter(., `MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(., `MAE %` <= -1) %>% sum(Profit))

But that gives no result either. I get another error:

Error: incorrect length (14), expecting: 16

I tried to understand the grouping behavior and how to use piping inside summarise() after grouping, but I did not succeed. I spent a whole day on it.

How can I get my expected result table? Please help me understand dplyr's logic when grouping and applying functions to those groups.

2 Answers

3
votes

Is this what you are looking for? (Only asking because I get different results than your output):

library(dplyr)

df_test %>% 
    group_by(win.g = Profit > 0) %>% 
    summarise(GroupCnt  = n(),
              TopMAE    = sum(Profit[`MAE %` > -1]), 
              BottomMAE = sum(Profit[`MAE %` <= -1]))

#Source: local data frame [2 x 4]

#  win.g GroupCnt TopMAE BottomMAE
#  (lgl)    (int)  (dbl)     (dbl)
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120
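
To illustrate why indexing works here where filter() did not: inside summarise(), a bare column name evaluates to that column's values for the current group only, so a logical index subsets within the group. filter(), by contrast, expects a data frame as its first argument, which is why passing it a logical vector fails. Here is a minimal sketch on a small made-up data frame (toy, mae, profit and the result names are hypothetical, not taken from the question):

```r
library(dplyr)

# Hypothetical toy data, only for illustration (not the asker's df_test)
toy <- data.frame(mae    = c(-0.5, -1.5, -0.2, -2.0, -1.2, -0.3),
                  profit = c( 100,  -50,   30,  -70,   40,  -10))

res <- toy %>%
  group_by(win = profit > 0) %>%
  summarise(n      = n(),
            # `profit` and `mae` are the current group's vectors here,
            # so the logical index picks rows within that group only
            top    = sum(profit[mae > -1]),
            bottom = sum(profit[mae <= -1]))

print(res)
```

If a group has no rows matching the condition, sum() of the empty vector returns 0, so no special handling should be needed.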
2
votes

Personally, I prefer to work a problem like this with the recognition that you are performing your grouped operations on two dimensions, but your code only uses one dimension. Here's an example performing the same work over two dimensions. It's a bit more code than @Sotos provided, but provides the same result he got.

library(dplyr)
library(tidyr)

df_test %>%
  #* Group on two dimensions
  group_by(win.g = Profit > 0,
           top = ifelse(`MAE %` > -1, "TopMAE", "BottomMAE")) %>%
  summarise(GroupCnt = n(),
            SumProfit = sum(Profit)) %>%
  ungroup() %>%

  #* Collapse the GroupCnt
  group_by(win.g) %>%
  mutate(GroupCnt = sum(GroupCnt)) %>%
  ungroup() %>%

  #* From long to wide
  spread(top, SumProfit)
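
For readers on newer package versions: spread() still works but has been superseded in tidyr by pivot_wider(). Below is a sketch of the same two-dimensional approach using it, assuming tidyr >= 1.0 and dplyr >= 1.0 (for the .groups argument). The toy data frame and its column names are made up for illustration and are not the asker's df_test.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, only for illustration
toy <- data.frame(mae    = c(-0.5, -1.5, -0.2, -2.0, -1.2, -0.3),
                  profit = c( 100,  -50,   30,  -70,   40,  -10))

wide <- toy %>%
  # Group on two dimensions: win/loss and MAE band
  group_by(win  = profit > 0,
           band = ifelse(mae > -1, "TopMAE", "BottomMAE")) %>%
  summarise(GroupCnt  = n(),
            SumProfit = sum(profit),
            .groups   = "drop") %>%
  # Collapse the per-band counts into one count per win/loss group
  group_by(win) %>%
  mutate(GroupCnt = sum(GroupCnt)) %>%
  ungroup() %>%
  # From long to wide: one column per MAE band
  pivot_wider(names_from = band, values_from = SumProfit)

print(wide)
```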