As I am new to R, I am struggling to find a solution to avoid using dplyr, as I know data.table usually has a better performance. To avoid having to convert each data.frame to data.table after using mutate_all, etc. I have found a workaround in another post (note: I have no reputation points to comment and ask directly there). However this throws the following error:
"evaluation nested too deeply: infinite recursion / options(expressions=)? "
I think this is because I have functions embedded into the mutate_all, mutate_at, etc. and I do not know enough about functions to try to change the wrappers. Any ideas on how I can adapt the wrapper functions?
I have several transformations like this using mutate_all, mutate_at with different funs. We usually import data from Excel or CSV.
The wrapper functions I used where a solution to a similar problem here answered by user:BenjaminWolfe. I think the main difference is that they did not have a funs within the mutate_if :mutate_if, summarize_at etc coerce data.table to data.frame
This seems to be an great workaround as I would only need to include these wrapper functions in the code.
Below is an example of the data and one of the code blocks:
The first one is without the wrapper function workaround
library(dplyr)
library(data.table)
# Example of data to clean without wrapper functions
========
DT = data.table(date=as.character(c(43131:43140)),numbers=c("1000000000","1000000001","1000000002"))
# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")
# Change data types where they should be numbers and dates
DT <- DT %>%
mutate_at(vars(DateNumberColumns),
funs(as.numeric)) %>%
mutate_at(vars(DateColumns),
# due to an error in Excel's dates, the origin that gives the correct dates is as below
funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)
This one is with the wrapper function workaround
# Example of data to clean with wrapper functions
========
# take out # to clean environment:
# rm(list=ls())
# data table example
DT = data.table(date=as.character(c(43131:43140)),numbers=c("1000000000","1000000001","1000000002"))
# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")
# wrapper function from https://stackguides.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame
mutate_at <- function(.tbl, ...) {
if ("data.table" %in% class(.tbl)) {
.tbl %>% mutate_at(...) %>% as.data.table()
} else {
.tbl %>% mutate_at(...)
}
}
DT <- DT %>%
mutate_at(vars(DateNumberColumns),
funs(as.numeric)) %>%
mutate_at(vars(DateColumns),
# due to an error in Excel's dates, the origin that gives the correct dates is as below
funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)
I expected the output to be:
date numbers
1: 2018-01-31 1e+09
2: 2018-02-01 1e+09
3: 2018-02-02 1e+09
4: 2018-02-03 1e+09
5: 2018-02-04 1e+09
6: 2018-02-05 1e+09
7: 2018-02-06 1e+09
8: 2018-02-07 1e+09
9: 2018-02-08 1e+09
10: 2018-02-09 1e+09
is.data.table(DT)
[1] TRUE
But the actual output is:
DT <- DT %>%
+ mutate_at(vars(DateNumberColumns),
+ funs(as.numeric)) %>%
+ mutate_at(vars(DateColumns),
+ # due to an error in Excel's dates, the origin that gives the correct dates is as below
+ funs(as.Date(., origin = "1899-12-30")))
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> DT
date numbers
1: 43131 1000000000
2: 43132 1000000001
3: 43133 1000000002
4: 43134 1000000000
5: 43135 1000000001
6: 43136 1000000002
7: 43137 1000000000
8: 43138 1000000001
9: 43139 1000000002
10: 43140 1000000000
> is.data.table(DT)
[1] TRUE
data.table
way of doingmutate_at
is generally to uselapply
,.SD
, and.SDcols
, perhaps the new vignette can help rdatatable.gitlab.io/data.table/library/data.table/doc/… – MichaelChirico