When exploring aggregations in data.table interactively, I may run dozens or hundreds of experiments. The default column names after an aggregation (V1, V2, etc.) aren't very informative when I've written only the minimal code needed to generate the aggregations. Often I'd be happier if the default column name for a simple aggregation of one column, like a mean or sum, were just the name of the underlying variable.
All the extra column name typing gets tiring, and I want to avoid this.
Is there any easy way to do this in data.table?
A simplified example, in case it's not clear:
library(data.table)
DT = data.table(x = rep(c("b", "a", "c"), each = 3), y_a_long_name = c(1, 3, 6), v_a_long_name = 1:9)
DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]
#    x V1       V2
# 1: b  6 3.333333
# 2: a 15 3.333333
# 3: c 24 3.333333
When you start working with several columns and different aggregation functions, the V1, V2 labeling above isn't helpful.
Repeating the names by hand is tedious, but I'd like to see something like this:
DT[, .(v_a_long_name = sum(v_a_long_name), y_a_long_name = mean(y_a_long_name)), by = x]
#    x v_a_long_name y_a_long_name
# 1: b             6      3.333333
# 2: a            15      3.333333
# 3: c            24      3.333333
while typing as little as possible. For example, it would be ideal if
DT[, .(sum(v_a_long_name), mean(y_a_long_name)), by = x]
printed this by default:
#    x v_a_long_name y_a_long_name
# 1: b             6      3.333333
# 2: a            15      3.333333
# 3: c            24      3.333333
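One possible way to get close to this is a small wrapper that fills in the missing names before handing the expression to data.table. The sketch below is only an illustration under some assumptions: agg, j_call and by_cols are hypothetical names (nothing here is built into data.table), each unnamed expression is named after the first variable it references, and the name falls back to the V1, V2 style when an expression references no variable.

```r
library(data.table)

# Hypothetical helper, not part of data.table: aggregate `dt`, naming each
# unnamed expression after the first variable it references.
agg <- function(dt, ..., by = NULL) {
  exprs <- as.list(substitute(list(...)))[-1L]  # unevaluated expressions
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))
  for (i in which(!nzchar(nms))) {
    vars <- all.vars(exprs[[i]])
    # Fall back to a V<i> style name when no variable is referenced
    nms[i] <- if (length(vars)) vars[1L] else paste0("V", i)
  }
  names(exprs) <- nms
  # Rebuild the call as list(name1 = expr1, name2 = expr2, ...)
  j_call <- as.call(c(list(quote(list)), exprs))
  if (is.null(by)) return(dt[, eval(j_call)])
  by_cols <- by  # character vector of grouping columns
  dt[, eval(j_call), by = by_cols]
}

DT <- data.table(x = rep(c("b", "a", "c"), each = 3),
                 y_a_long_name = c(1, 3, 6),
                 v_a_long_name = 1:9)
res <- agg(DT, sum(v_a_long_name), mean(y_a_long_name), by = "x")
names(res)
```

With this sketch, explicit names still win, so agg(DT, total = sum(v_a_long_name), by = "x") keeps the name total. Note that two aggregations of the same column (say, a sum and a mean) would end up with the same name, so those cases still need names typed out.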
GForce from optimizing your queries. Does your IDE not have auto-complete? If you're just playing around, I don't see much penalty to using names like V1, V5. If this is not ephemeral, writing explicit code will help others (including future-you) better understand your code. Choosing to eliminate sum and mean from the name of the aggregated variable strikes me as troublesome / begging for confusion/errors down the line. – MichaelChirico
v_a_long_name_sum and y_a_long_name_mean) – MichaelChirico