0
votes

As I am new to R, I am struggling to find a solution to avoid using dplyr, as I know data.table usually has a better performance. To avoid having to convert each data.frame to data.table after using mutate_all, etc. I have found a workaround in another post (note: I have no reputation points to comment and ask directly there). However this throws the following error:

"evaluation nested too deeply: infinite recursion / options(expressions=)? "

I think this is because I have functions embedded into the mutate_all, mutate_at, etc. and I do not know enough about functions to try to change the wrappers. Any ideas on how I can adapt the wrapper functions?

I have several transformations like this using mutate_all, mutate_at with different funs. We usually import data from Excel or CSV.

The wrapper functions I used where a solution to a similar problem here answered by user:BenjaminWolfe. I think the main difference is that they did not have a funs within the mutate_if :mutate_if, summarize_at etc coerce data.table to data.frame

This seems to be an great workaround as I would only need to include these wrapper functions in the code.

Below is an example of the data and one of the code blocks:

The first one is without the wrapper function workaround

library(dplyr)
library(data.table)


# Example of data to clean without wrapper functions
========
DT = data.table(date=as.character(c(43131:43140)),numbers=c("1000000000","1000000001","1000000002"))

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")


# Change data types where they should be numbers and dates
DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)

This one is with the wrapper function workaround

# Example of data to clean with wrapper functions
========
# take out # to clean environment:
# rm(list=ls())
# data table example
DT = data.table(date=as.character(c(43131:43140)),numbers=c("1000000000","1000000001","1000000002"))

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")

# wrapper function from https://stackguides.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame
    mutate_at <- function(.tbl, ...) {
  if ("data.table" %in% class(.tbl)) {
    .tbl %>% mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% mutate_at(...)
  }
}

DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)

I expected the output to be:

          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09

is.data.table(DT)
[1] TRUE

But the actual output is:

 DT <- DT %>%
+   mutate_at(vars(DateNumberColumns),
+             funs(as.numeric)) %>%
+   mutate_at(vars(DateColumns),
+             # due to an error in Excel's dates, the origin that gives the correct dates is as below
+             funs(as.Date(., origin = "1899-12-30")))
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> DT
     date    numbers
 1: 43131 1000000000
 2: 43132 1000000001
 3: 43133 1000000002
 4: 43134 1000000000
 5: 43135 1000000001
 6: 43136 1000000002
 7: 43137 1000000000
 8: 43138 1000000001
 9: 43139 1000000002
10: 43140 1000000000
> is.data.table(DT)
[1] TRUE
1
data.table way of doing mutate_at is generally to use lapply, .SD, and .SDcols, perhaps the new vignette can help rdatatable.gitlab.io/data.table/library/data.table/doc/…MichaelChirico
Thanks, it works perfectly for the as.numeric transformation. However for the as.Date I am guetting an error "charToDate(x) : character string is not in a standard unambiguous format". Tried several combinations answers in other posts but nothing seems to work. ``` DT[, (DateNumberColumns) := lapply(.SD,as.integer),.SDcols = DateNumberColumns] DT[, (DateColumns) := lapply(.SD,as.Date((DateColumns), origin = "1899-12-30")),.SDcols = DateNumberColumns] ```V Inacio
inspect sort(unique(DT$date)), maybe there's a data quality issueMichaelChirico
All of them are numbers (i.e. the numeric equivalent in excel of the dates between 2018-01-31 and 2018-02-09)V Inacio

1 Answers

0
votes

If you're interested in an alternative, I recently released the table.express package, which could help you in these situations when you want to use syntax similar to dplyr. Your example could be done like this:

library(data.table)
library(table.express)

DT = data.table(date=as.character(c(43131:43140)),numbers=c("1000000000","1000000001","1000000002"))

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")

DT <- DT %>%
  start_expr %>%
  mutate_sd(as.numeric, .SDcols = DateNumberColumns) %>%
  mutate_sd(as.Date, origin = "1899-12-30", .SDcols = DateColumns) %>%
  end_expr %T>%
  print
          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09