Say I have a data table.
# Load package
library(data.table)
# Dummy data table
dt <- data.table(foo = letters[1:10],
bar = LETTERS[11:20],
val = 100:109)
I'd like to run a function on a column of this data table that returns a data table with multiple columns. Let's say it looks like this:
# Function that returns data table
f <- function(x){
data.table(stuff = paste0(x, "_stuff"),
things = paste0("things_", x))
}
If I run it on foo
from the data table, it returns this:
# Returns only columns from function
dt[, f(foo)]
#> stuff things
#> 1: a_stuff things_a
#> 2: b_stuff things_b
#> 3: c_stuff things_c
#> 4: d_stuff things_d
#> 5: e_stuff things_e
#> 6: f_stuff things_f
#> 7: g_stuff things_g
#> 8: h_stuff things_h
#> 9: i_stuff things_i
#> 10: j_stuff things_j
Great! Now, I'd like the returned data table to be appended to my original data table. If I wanted to append it to just a few columns from the original data table, I could just name the columns I want to retain like so:
# Returns named column and columns from function
dt[, .(foo, f(foo))]
#> foo stuff things
#> 1: a a_stuff things_a
#> 2: b b_stuff things_b
#> 3: c c_stuff things_c
#> 4: d d_stuff things_d
#> 5: e e_stuff things_e
#> 6: f f_stuff things_f
#> 7: g g_stuff things_g
#> 8: h h_stuff things_h
#> 9: i i_stuff things_i
#> 10: j j_stuff things_j
But, I want to keep all columns without having to name them individually. One way would be to use by
:
# Retains all columns
dt[, f(foo), by = names(dt)]
#> foo bar val stuff things
#> 1: a K 100 a_stuff things_a
#> 2: b L 101 b_stuff things_b
#> 3: c M 102 c_stuff things_c
#> 4: d N 103 d_stuff things_d
#> 5: e O 104 e_stuff things_e
#> 6: f P 105 f_stuff things_f
#> 7: g Q 106 g_stuff things_g
#> 8: h R 107 h_stuff things_h
#> 9: i S 108 i_stuff things_i
#> 10: j T 109 j_stuff things_j
Created on 2020-02-18 by the reprex package (v0.3.0)
This gives me the desired result for this test case, but clearly this only makes sense if rows are unique.
I've tried to use something like dt[, .(names(dt), f(foo))]
, but this doesn't work as names
returns a vector of strings which is then added as a column.
The obvious solution is to use :=
like so: dt[, c("One", "Two") := f(foo)]
. This gives the desired result but I have to name the added columns myself, whereas I'd like to retain the column names returned by the function.
Another solution might be cbind(dt, dt[, foo(f)])
, but this seems cludgy.
What is the correct way to achieve this result?
cbind(dt, dt[, f(foo)])
in a function? – Rui Barradasdt[, names(f(dt$foo)) := f(foo)][]
that would work but it isn't really what your looking for – Gainzf
is called twice, which isn't optimal. – Lyngbakrdata.table
is one of the package I use the most, I would be please to learn something new! – Gainz