
It seems that some dplyr functions, including mutate_if, mutate_all and mutate_at, coerce data.table inputs to data.frame. That seems like strange behaviour, even though it is documented in ?mutate_all: under 'Value' it says 'data.frame', yet tibbles are not coerced to data.frames.

require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
class(mutate_if(dt, is.numeric, as.numeric))
#[1] "data.frame"

However, this does not happen with tibbles:

tb <- as_tibble(iris)
class(tb)
#[1] "tbl_df"     "tbl"        "data.frame"
class(mutate_if(tb, is.numeric, as.numeric))
#[1] "tbl_df"     "tbl"        "data.frame"

Is there some way to keep the data.table, or do I need to coerce with as.data.table every time I use one of the scoped mutate functions?
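
To see which scoped verbs are affected in a particular setup, a quick check along these lines may help. The arguments passed to each verb are only illustrative choices, and which verbs drop the "data.table" class depends on the installed dplyr version, so no particular result is claimed here.

```r
library(dplyr)
library(data.table)

dt <- as.data.table(iris)

# Apply each scoped verb to the same data.table input.
scoped_results <- list(
  mutate_all    = mutate_all(dt, as.character),
  mutate_if     = mutate_if(dt, is.numeric, as.numeric),
  mutate_at     = mutate_at(dt, vars(Sepal.Length), as.numeric),
  transmute_all = transmute_all(dt, as.character),
  transmute_if  = transmute_if(dt, is.numeric, as.numeric),
  transmute_at  = transmute_at(dt, vars(Sepal.Length), as.numeric)
)

# TRUE where the verb kept the data.table class, FALSE where it was dropped.
vapply(scoped_results, function(res) inherits(res, "data.table"), logical(1))
```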

According to the documentation, dplyr functions try to return the same type of object as the input. Since data.table is not a base class, it returns data.frame. - Rohit
tbl_df is not a base class either. - BjaRule
dplyr is meant to work with tibbles. - Rohit
I've never tested this, but perhaps check this package (dtplyr). - David Arenburg
Thanks @DavidArenburg. The dtplyr package seems to need a little tender loving care: I tried installing it and rerunning my reprex, but all I get are errors. I actually thought data.table as a backend was built into dplyr. - BjaRule

3 Answers


If you'd like to try an alternative, I recently released the table.express package, which uses many dplyr and custom verbs to build data.table expressions.

The linked vignette provides detailed explanations, but some examples:

library(data.table)
library(table.express)

data("iris")
DT <- as.data.table(iris)

# mutate_all (modification by reference does not print)
DT %>%
  mutate_sd(everything(), as.integer)

# mutate_if
DT %>%
  mutate_sd(~ is.numeric(.x), as.integer)

# mutate_at
DT %>%
  mutate_sd(contains("."), ~ .x * 1.5)

# transmute_all
DT %>%
  transmute_sd(everything(), as.integer)

# transmute_if
DT %>%
  transmute_sd(~ is.numeric(.x), as.integer)

# transmute_at
DT %>%
  transmute_sd(contains("."), as.integer)

Note that mutate_sd modifies by reference by default, so re-define DT between examples if you want each one to start from the original data.
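
For comparison, roughly the same operations can be written by hand in plain data.table with .SD, .SDcols and :=. This is only a sketch of the usual data.table idiom, not necessarily the exact expressions table.express generates. The copy() calls give each example its own copy, because := would otherwise modify DT by reference.

```r
library(data.table)

DT <- as.data.table(iris)

# mutate_if(is.numeric, as.integer): select columns by predicate, then
# overwrite them in place with := (DT1 is modified by reference).
DT1 <- copy(DT)
num_cols <- names(DT1)[vapply(DT1, is.numeric, logical(1))]
DT1[, (num_cols) := lapply(.SD, as.integer), .SDcols = num_cols]

# mutate_at(contains("."), ~ .x * 1.5): select columns whose names contain
# a literal "." (fixed = TRUE, so "." is not treated as a regex wildcard).
DT2 <- copy(DT)
dot_cols <- grep(".", names(DT2), fixed = TRUE, value = TRUE)
DT2[, (dot_cols) := lapply(.SD, function(x) x * 1.5), .SDcols = dot_cols]

# transmute_if(is.numeric, as.integer): without :=, a new data.table holding
# only the selected columns is returned and DT is left unchanged.
DT3 <- DT[, lapply(.SD, as.integer), .SDcols = num_cols]
```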

Also, as of version 0.3.0, you won't be able to load both table.express and dtplyr at the same time, since they define the same data.table methods for many dplyr generics.


There may be no satisfying answer to your question, but these wrapper functions would save you from converting back to a data.table every time.

And if you didn't want to include these in each script or project, and you didn't want to put them in your .Rprofile, you could even make an itty-bitty package out of them. It's surprisingly easy.

library(dplyr)
library(data.table)

# The wrappers reuse dplyr's names, so call the dplyr versions explicitly
# (dplyr::); otherwise each wrapper would call itself recursively.
mutate_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_all(...)
  }
}
mutate_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_if(...)
  }
}
mutate_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::mutate_at(...)
  }
}
transmute_all <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_all(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_all(...)
  }
}
transmute_if <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_if(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_if(...)
  }
}
transmute_at <- function(.tbl, ...) {
  if (inherits(.tbl, "data.table")) {
    .tbl %>% dplyr::transmute_at(...) %>% as.data.table()
  } else {
    .tbl %>% dplyr::transmute_at(...)
  }
}
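
Since these six wrappers differ only in the verb they call, a small function factory could generate them instead. The helper name keep_dt is hypothetical (it is not part of dplyr or data.table). As with hand-written wrappers, assigning these names in your session masks the dplyr functions of the same name.

```r
library(dplyr)
library(data.table)

# Hypothetical helper: wrap a dplyr verb so that a data.table input yields a
# data.table output, while any other input is returned exactly as dplyr gives it.
keep_dt <- function(verb) {
  force(verb)
  function(.tbl, ...) {
    res <- verb(.tbl, ...)
    if (inherits(.tbl, "data.table") && !inherits(res, "data.table")) {
      res <- as.data.table(res)
    }
    res
  }
}

# Namespace-qualified so the generated wrappers never call themselves.
mutate_all    <- keep_dt(dplyr::mutate_all)
mutate_if     <- keep_dt(dplyr::mutate_if)
mutate_at     <- keep_dt(dplyr::mutate_at)
transmute_all <- keep_dt(dplyr::transmute_all)
transmute_if  <- keep_dt(dplyr::transmute_if)
transmute_at  <- keep_dt(dplyr::transmute_at)

dt <- as.data.table(iris)
class(mutate_if(dt, is.numeric, as.numeric))
#[1] "data.table" "data.frame"
```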

Have you tried using

df %>%
  mutate_if(your_predicate, your_function) %>%
  data.table()

The result will then be both a data.table and a data.frame.

Following your example:

require(dplyr)
require(data.table)
data("iris")
dt <- as.data.table(iris)
class(dt)
#[1] "data.table" "data.frame"
dt <- mutate_if(dt, is.numeric, as.numeric) %>% data.table()
class(dt)
#[1] "data.table" "data.frame"
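
As a variation not in the original answer, data.table's setDT() converts a data.frame to a data.table by reference, so the result is not copied a second time the way it is by data.table(). This is a sketch under the same setup as the example above.

```r
library(dplyr)
library(data.table)

dt <- as.data.table(iris)

# mutate_if() may hand back a plain data.frame; setDT() converts it
# in place instead of building a new copy.
res <- mutate_if(dt, is.numeric, as.numeric)
setDT(res)
class(res)
#[1] "data.table" "data.frame"
```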