2
votes

I am trying to use a data.frame twice in a dplyr chain. Here is a simple example that gives an error:

library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

df %>%
  group_by(Type) %>%
  summarize(X = n()) %>%
  mutate(df %>%
           filter(Value > 2) %>%
           group_by(Type) %>%
           summarize(Y = sum(Value)))

Error: cannot handle

The idea is that a data.frame is first created with two columns: Value, which is just some data, and Type, which indicates which group each value belongs to.

I then use summarize to get the number of rows in each group, and then mutate, using the data.frame a second time, to get the sum of the values after the data has been filtered. However, I get "Error: cannot handle". Any ideas what is happening here?

Desired Output:

Type X Y
  A  5 24
  B  5 28
2
@DavidArenburg Desired output added. At least dplyr is being honest when it is overwhelmed. - John Paul

2 Answers

6
votes

You could try the following:

df %>% 
  group_by(Type) %>% 
  summarise(X = n(), Y = sum(Value[Value > 2]))

# Source: local data frame [2 x 3]
# 
#   Type X  Y
# 1    A 5 24
# 2    B 5 28

The idea is to filter only Value by the desired condition, instead of filtering the whole data set.
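If you really do want to use the data.frame twice, as in the question, one possible sketch is to build the two summaries separately and join them on Type. The names counts, sums and result are just illustrative, and this assumes a dplyr version that provides left_join().

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# Count all rows per group.
counts <- df %>%
  group_by(Type) %>%
  summarise(X = n())

# Sum only the values above 2 per group.
sums <- df %>%
  filter(Value > 2) %>%
  group_by(Type) %>%
  summarise(Y = sum(Value))

# Combine the two summaries on the grouping column.
result <- left_join(counts, sums, by = "Type")
result
```

One difference to keep in mind: a group with no values above 2 would get NA for Y here, whereas sum(Value[Value > 2]) returns 0 for such a group.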


And a bonus solution using data.table:

library(data.table)
setDT(df)[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
#    Type X  Y
# 1:    A 5 24
# 2:    B 5 28
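Note that setDT() converts df to a data.table by reference, so df itself is changed. If that is not wanted, a small variation (a sketch; dt and res are illustrative names) is to work on a copy made with as.data.table():

```r
library(data.table)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

# as.data.table() returns a copy, so df stays a plain data.frame.
dt <- as.data.table(df)

# Same grouped summary as above, computed on the copy.
res <- dt[, .(X = .N, Y = sum(Value[Value > 2])), by = Type]
res
```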

I was going to suggest the data.table approach to @nongkrong, but he deleted his answer. With base R we could also do:

aggregate(Value ~ Type, df, function(x) c(length(x), sum(x[x>2])))
#   Type Value.1 Value.2
# 1    A       5      24
# 2    B       5      28
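One thing to watch with the aggregate() approach: the result has only two columns, Type and a two-column matrix called Value, even though it prints as three. A possible way to flatten it into ordinary columns is do.call(data.frame, ...). This is only a sketch; the names agg and flat and the X/Y labels are my own choice.

```r
df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

agg <- aggregate(Value ~ Type, df,
                 function(x) c(X = length(x), Y = sum(x[x > 2])))

# agg holds Type plus a single matrix column named Value.
ncol(agg)

# do.call(data.frame, ...) expands the matrix into separate columns.
flat <- do.call(data.frame, agg)
names(flat) <- c("Type", "X", "Y")
flat
```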
3
votes

This is also pretty easy to do with ifelse():

df %>%
  group_by(Type) %>%
  summarize(X = n(), y = sum(ifelse(Value > 2, Value, 0)))

outputs:

Source: local data frame [2 x 3]

  Type X  y
1    A 5 24
2    B 5 28
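The ifelse() version and the subsetting version agree for sum(), because the replaced values contribute 0. They would not agree for a summary like mean(), where the zeros still count as observations. A quick comparison (the column names here are just illustrative):

```r
library(dplyr)

df <- data.frame(Value = 1:10, Type = rep(c("A", "B"), 5))

cmp <- df %>%
  group_by(Type) %>%
  summarise(
    sum_subset  = sum(Value[Value > 2]),            # drops values <= 2
    sum_ifelse  = sum(ifelse(Value > 2, Value, 0)), # replaces them with 0
    mean_subset = mean(Value[Value > 2]),           # averages over 4 values
    mean_ifelse = mean(ifelse(Value > 2, Value, 0)) # averages over all 5
  )
cmp
```

So for anything other than a sum, subsetting with Value[Value > 2] is probably the safer habit.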