This is a basic question, but I'm stumped:
I have the following R data.table:
library(data.table)
DT <- fread('unique_point biased data_points team groupID
up1 FALSE 3 A xy28352
up1 TRUE 4 A xy28352
up2 FALSE 1 A xy28352
up2 TRUE 0 X xy28352
up3 FALSE 12 Y xy28352
up3 TRUE 35 Z xy28352')
which prints out as
> DT
unique_point biased data_points team groupID
1: up1 FALSE 3 A xy28352
2: up1 TRUE 4 A xy28352
3: up2 FALSE 1 A xy28352
4: up2 TRUE 0 X xy28352
5: up3 FALSE 12 Y xy28352
6: up3 TRUE 35 Z xy28352
The values for the column team
are letters A to Z, 26 possibilities. At the moment. If I count the row values with this code:
DT[, counts := .N, by=c("team")]
which gives
> DT
unique_point biased data_points team groupID counts
1: up1 FALSE 3 A xy28352 3
2: up1 TRUE 4 A xy28352 3
3: up2 FALSE 1 A xy28352 3
4: up2 TRUE 0 X xy28352 1
5: up3 FALSE 12 Y xy28352 1
6: up3 TRUE 35 Z xy28352 1
I would like to create 26 new columns in DT
which gives the size of each team
, A
, B
, C
, etc.
The resulting data.table would look like:
> DT
unique_point biased data_points team groupID A B C ... Z
1: up1 FALSE 3 A xy28352 3 0 0 ... 1
2: up1 TRUE 4 A xy28352 3 0 0 ... 1
3: up2 FALSE 1 A xy28352 3 0 0 ... 1
4: up2 TRUE 0 X xy28352 3 0 0 ... 1
5: up3 FALSE 12 Y xy28352 3 0 0 ... 1
6: up3 TRUE 35 Z xy28352 3 0 0 ... 1
I'm not sure how one does this with data.table
syntax..
EDIT: I'm happy to do this with base R and dplyr as well.
counts
column you've created? – David Arenburgmclapply
through many data.tables to aggregate one data.table that will have each of these counts. At the moment, when I usecounts
from individual data.tables, there may not be a value for certain letters (e.g. in the above example, C, D, E, ...). So, I'm tried to create these sums for each individual data.table, and pass these values to the summary table after mclapply. Make sense? – ShanZhengYang