0
votes

I am trying to convert multiple date columns (with different format) in data.table. There are few method already available. One of the link Efficiently convert a date column in data.table. I was trying using mapply. But got the following error:

Error in [.data.table(df, , :=((paste0(dtVar, "")), mapply(function(x, : Supplied 12 items to be assigned to 6 items of column 'X1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

library(data.table)

# sample data
df <- data.table(
  X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
  X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
  stringsAsFactors = FALSE)


# convert date columns
dtVar <- c("X1", "X2")
inDtFmt <- c("%Y-%m-%d","%d/%m/%Y")

df[,(dtVar) := mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)]

## Further investigation
mm <- mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)

str(mm)
List of 2
# $ X1: POSIXlt[1:6], format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: POSIXlt[1:6], format: "1996-01-02" "1996-01-03" "1996-01-04" "1996-01-05" ...

Can someone tell me why I am getting this error?

4

4 Answers

2
votes

mapply usually tries to simplify result into a vector, you should use Map instead. strptime returns object of class POSIXlt, here you need only date so use as.Date.

Moreover, if you use lubridate::parse_date_time you can do this with lapply.

library(data.table)

df[, (dtVar) := lapply(.SD, lubridate::parse_date_time, inDtFmt), .SDcols = dtVar]
df

#           X1         X2
#1: 1996-01-04 1996-01-02
#2: 1996-01-05 1996-01-03
#3: 1996-01-08 1996-01-04
#4: 1996-01-09 1996-01-05
#5: 1996-01-10 1996-01-08
#6: 1996-01-11 1996-01-09
1
votes

You can do using as.IDate:

df <- df[,as.list(Map(function(x,y){as.IDate(.SD[[x]], format = y)}, dtVar, inDtFmt))]
print(df)

           X1         X2
1: 1996-01-04 1996-01-02
2: 1996-01-05 1996-01-03
3: 1996-01-08 1996-01-04
4: 1996-01-09 1996-01-05
5: 1996-01-10 1996-01-08
6: 1996-01-11 1996-01-09
1
votes

We could use anydate from anytime which can automatically pick up the format and change to Date class

library(data.table)
library(anytime)
df[, (dtVar) := lapply(.SD, anydate), .SDcols = dtVar]

str(df)
#Classes ‘data.table’ and 'data.frame': 6 obs. of  2 variables:
# $ X1: Date, format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: Date, format: "1996-02-01" "1996-03-01" "1996-04-01" "1996-05-01" ...
0
votes

Here is one way to do it:


library(data.table)

# sample data
df <- data.table(
  X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
  X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
  stringsAsFactors = FALSE)

str(df)

dtFmt <- list(X1 = "%Y-%m-%d", X2 = "%d/%m/%Y")

for (col in names(df)) {
  df[[col]] <- as.Date(df[[col]],  dtFmt[[col]]) 
}

str(df)