2
votes

Say I have a data table.

# Load package
library(data.table)

# Dummy data table
dt <- data.table(foo = letters[1:10], 
                 bar = LETTERS[11:20], 
                 val = 100:109)

I'd like to run a function on a column of this data table that returns a data table with multiple columns. Let's say it looks like this:

# Function that returns data table
f <- function(x){
  data.table(stuff = paste0(x, "_stuff"),
             things = paste0("things_", x))
}

If I run it on foo from the data table, it returns this:

# Returns only columns from function 
dt[, f(foo)]
#>       stuff   things
#>  1: a_stuff things_a
#>  2: b_stuff things_b
#>  3: c_stuff things_c
#>  4: d_stuff things_d
#>  5: e_stuff things_e
#>  6: f_stuff things_f
#>  7: g_stuff things_g
#>  8: h_stuff things_h
#>  9: i_stuff things_i
#> 10: j_stuff things_j

Great! Now, I'd like the returned data table to be appended to my original data table. If I wanted to append it to just a few columns from the original data table, I could just name the columns I want to retain like so:

# Returns named column and columns from function
dt[, .(foo, f(foo))]
#>     foo   stuff   things
#>  1:   a a_stuff things_a
#>  2:   b b_stuff things_b
#>  3:   c c_stuff things_c
#>  4:   d d_stuff things_d
#>  5:   e e_stuff things_e
#>  6:   f f_stuff things_f
#>  7:   g g_stuff things_g
#>  8:   h h_stuff things_h
#>  9:   i i_stuff things_i
#> 10:   j j_stuff things_j

But, I want to keep all columns without having to name them individually. One way would be to use by:

# Retains all columns
dt[, f(foo), by = names(dt)]
#>     foo bar val   stuff   things
#>  1:   a   K 100 a_stuff things_a
#>  2:   b   L 101 b_stuff things_b
#>  3:   c   M 102 c_stuff things_c
#>  4:   d   N 103 d_stuff things_d
#>  5:   e   O 104 e_stuff things_e
#>  6:   f   P 105 f_stuff things_f
#>  7:   g   Q 106 g_stuff things_g
#>  8:   h   R 107 h_stuff things_h
#>  9:   i   S 108 i_stuff things_i
#> 10:   j   T 109 j_stuff things_j

Created on 2020-02-18 by the reprex package (v0.3.0)

This gives me the desired result for this test case, but clearly this only makes sense if rows are unique.

I've tried to use something like dt[, .(names(dt), f(foo))], but this doesn't work as names returns a vector of strings which is then added as a column.

The obvious solution is to use := like so: dt[, c("One", "Two") := f(foo)]. This gives the desired result but I have to name the added columns myself, whereas I'd like to retain the column names returned by the function.

Another solution might be cbind(dt, dt[, foo(f)]), but this seems cludgy.

What is the correct way to achieve this result?

2
Why don't you wrap cbind(dt, dt[, f(foo)]) in a function?Rui Barradas
Sure, I can do that, but I was wondering if there's a canonical data table way of doing this, as it must be a fairly common task.Lyngbakr
there are things like dt[, names(f(dt$foo)) := f(foo)][]that would work but it isn't really what your looking forGainz
@Gainz Yeah, that would require that f is called twice, which isn't optimal.Lyngbakr
If you find a better solution please @myname. data.table is one of the package I use the most, I would be please to learn something new!Gainz

2 Answers

2
votes

Some suggestions:

1) Modify data.table in the function:

f2 <- function(dt){
    cols <- c("stuff", "things")
    dt[, (cols) := lapply(paste0("_", cols), function(x) paste0(foo, x))]
}
f2(dt)

2) using NSE:

eval(substitute(dt[, (LHS) := RHS], list(RHS={a <- dt[, f(foo)]}, LHS=names(a))))

3) Or assign result to a variable and then update by reference similar to what Gainz mentioned in the comments

a <- dt[, f(foo)]
dt[, names(a) := a]

I prefer option (3).

0
votes

Another option, if you don't need to do anything with i or by= in your processing would be to make dt an argument to your function and then use set() inside your function, as shown below.


f <- function(x, dt){
  set(dt, j = "stuff", value = paste0(dt[[x]], "_stuff"))
  set(dt, j = "things", value = paste0("things_", dt[[x]]))
}

f("foo", dt)

Output:

> dt
    foo bar val   stuff   things
 1:   a   K 100 a_stuff things_a
 2:   b   L 101 b_stuff things_b
 3:   c   M 102 c_stuff things_c
 4:   d   N 103 d_stuff things_d
 5:   e   O 104 e_stuff things_e
 6:   f   P 105 f_stuff things_f
 7:   g   Q 106 g_stuff things_g
 8:   h   R 107 h_stuff things_h
 9:   i   S 108 i_stuff things_i
10:   j   T 109 j_stuff things_j