I have been trying to use dplyr for some rather involved data manipulation and have run into an issue I cannot find an answer to. This is my first question on StackOverflow. It may be more of a grouped data.table issue than a dplyr one, but here goes.

library(dplyr)
data(iris)
df <- iris %>% group_by(Species) %>%
  do((function(x) {
    print(names(x))
    print(class(x))
    # Species is constant within a group, so its first element identifies the group
    if (x$Species[1] == 'setosa') x$Petal.Length <- x$Petal.Length + 1
    return(x)
  })(.))

Here I can still access the grouping variable because the data stays a data.frame, and the subgroup inside the do is also a data.frame. The whole x$Species column holds a single value within the do, so this workaround presents itself. It seems to me that this 'current group' would be a useful value to be able to access directly. After converting to a data.table, however:

dt <- iris %>% tbl_dt() %>%
  mutate(Species2 = Species) %>%  # keep a copy of the grouping column
  group_by(Species) %>%
  do((function(x) {
    print(names(x))
    print(class(x))
    print(attr(x, 'vars'))
    print(groups(x))
    if (x$Species2[1] == 'setosa') x$Petal.Length <- x$Petal.Length + 1
    return(x)
  })(.))

the subgroup x is a grouped data.table, and the grouping variable is dropped from the subgroup. I've worked around this by adding a copy of the grouping column and referencing that copy from within the do, but I feel there should be a more elegant way to refer to the specific group that the do is currently working on.

1 Answer

In data.table, the grouping variable is available within each group as a single length-one value rather than a full-length column, and the rest of the data for that group is kept in a data.table called .SD. I'm not quite sure what you're trying to do, but here are some examples that might get you going:

library(data.table)
data(iris)
setDT(iris)  # convert to data.table in place

iris[, if (Species == 'setosa') {
         Petal.Length + 1
       } else {
         Petal.Length
       }
     , by = Species]
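
The current group's value is also exposed explicitly through data.table's special symbol .BY, a list with one length-one element per grouping column. A minimal sketch of the same conditional using it (the names dt, res and bump are illustrative, and the data is copied from datasets::iris so the original is left untouched):

```r
library(data.table)

# Work on a copy so the built-in dataset is not modified
dt <- as.data.table(datasets::iris)

res <- dt[, {
  # .BY$Species is the length-one value identifying the current group
  bump <- if (.BY$Species == "setosa") 1 else 0
  list(Petal.Length = Petal.Length + bump)
}, by = Species]
```

Using .BY rather than the bare column name makes it explicit that the test is on the group key and not on a data column.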

# modify in place
iris[Species == 'setosa', Petal.Length := Petal.Length + 1]

# more complicated modification - modify the first Petal.Width by Species
iris[iris[, .I[1], by = Species]$V1, Petal.Width := Petal.Width + 4]
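
If you would rather stay in dplyr, here is a rough sketch of one alternative to do, assuming a dplyr version that provides group_modify (0.8.1 or later, which may not match what you have installed). The function receives the group's rows as its first argument and a one-row tibble holding the current group's key as its second. The argument names x and key are my own choice:

```r
library(dplyr)

res <- datasets::iris %>%
  group_by(Species) %>%
  group_modify(function(x, key) {
    # key is a one-row tibble with the grouping value(s) for this group;
    # x holds the group's rows without the grouping column
    if (key$Species == "setosa") x$Petal.Length <- x$Petal.Length + 1
    x
  }) %>%
  ungroup()
```

This avoids carrying a duplicate of the grouping column just to find out which group is being processed.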