I'm a newcomer to dplyr and have the following question. My data.frame has one column serving as a grouping variable. Some rows don't belong to a group; for those rows the grouping column is NA.

I need to add some columns to the data.frame using the dplyr function mutate. I'd prefer that dplyr ignore all rows where the grouping column is NA. I'll illustrate with an example:

library(dplyr)

set.seed(2)

# Setting up some dummy data
df <- data.frame(
  Group = factor(c(rep("A",3),rep(NA,3),rep("B",5),rep(NA,2))),
  Value = abs(as.integer(rnorm(13)*10))
)

# Using mutate to calculate differences between values within the rows of a group
df <- df %>%
  group_by(Group) %>%
  mutate(Diff = Value-lead(Value))

df
# Source: local data frame [13 x 3]
# Groups: Group [3]
# 
#     Group Value  Diff
#    (fctr) (int) (int)
# 1       A     8     7
# 2       A     1   -14
# 3       A    15    NA
# 4      NA    11    11
# 5      NA     0    -1
# 6      NA     1    -8
# 7       B     7     5
# 8       B     2   -17
# 9       B    19    18
# 10      B     1    -3
# 11      B     4    NA
# 12     NA     9     6
# 13     NA     3    NA

Calculating the differences between rows without a group makes no sense and corrupts the data. I need to discard these values and have done so like this:

df$Diff[is.na(df$Group)]  <- NA

Is there a way to include the above command in the dplyr chain using %>%? Something along the lines of:

df <- df %>%
  group_by(Group) %>%
  mutate(Diff = Value-lead(Value)) %>%
  filter(!is.na(Group))

But where the rows without a group are not removed altogether? Or even better, is there a way to make dplyr ignore rows without a group?

The desired outcome would be:

# Source: local data frame [13 x 3]
# Groups: Group [3]
# 
#     Group Value  Diff
#    (fctr) (int) (int)
# 1       A     8     7
# 2       A     1   -14
# 3       A    15    NA
# 4      NA    11    NA
# 5      NA     0    NA
# 6      NA     1    NA
# 7       B     7     5
# 8       B     2   -17
# 9       B    19    18
# 10      B     1    -3
# 11      B     4    NA
# 12     NA     9    NA
# 13     NA     3    NA

1 Answer

Simply use an ifelse condition for the variable that you are trying to create:

library(dplyr)
set.seed(2)

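df <- data.frame(
  Group = factor(c(rep("A",3), rep(NA,3), rep("B",5), rep(NA,2))),
  Value = abs(as.integer(rnorm(13)*10))
) %>% 
  group_by(Group) %>%
  # Rows whose Group is NA get an integer NA; all other rows get the
  # within-group difference to the next value
  mutate(Diff = ifelse(is.na(Group), as.integer(NA), Value-lead(Value)))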
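The same idea can also be sketched with dplyr's stricter if_else() and base R's typed NA constant NA_integer_, assuming a dplyr version that provides if_else(). This is a variant, not part of the original answer. The values are hard-coded from the output printed in the question so the result does not depend on the random seed, and the name res and the trailing ungroup() are additions for illustration only.

```r
library(dplyr)

# Fixed values copied from the question's printed output, so the result
# does not depend on the random number generator.
df <- data.frame(
  Group = factor(c(rep("A", 3), rep(NA, 3), rep("B", 5), rep(NA, 2))),
  Value = c(8L, 1L, 15L, 11L, 0L, 1L, 7L, 2L, 19L, 1L, 4L, 9L, 3L)
)

res <- df %>%
  group_by(Group) %>%
  # if_else() requires both branches to share a type, hence NA_integer_
  mutate(Diff = if_else(is.na(Group), NA_integer_, Value - lead(Value))) %>%
  # Drop the grouping so later steps operate on the whole data frame
  ungroup()

res
```

Both versions keep all 13 rows; only the Diff values of the rows without a group are set to NA.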