I have the following data.table.
library(data.table)
dat <- data.table(
  kmers = c("TTTTTTTTTTTT", "TCCATTCCATTC", "TTCCATTCCATT",
            "CCATTCCATTCC", "ATTCCATTCCAT", "CATTCCATTCCA", "TTTTATTATTTT",
            "AAAATTATAAAA", "AAGACAATTTCT", "AAAGACAATTTC"),
  counts = c(16361L, 10090L, 9599L, 9021L, 8516L, 8325L, 5739L, 5642L,
             5378L, 5326L)
)
This is the table:
           kmers counts
 1: TTTTTTTTTTTT  16361
 2: TCCATTCCATTC  10090
 3: TTCCATTCCATT   9599
 4: CCATTCCATTCC   9021
 5: ATTCCATTCCAT   8516
 6: CATTCCATTCCA   8325
 7: TTTTATTATTTT   5739
 8: AAAATTATAAAA   5642
 9: AAGACAATTTCT   5378
10: AAAGACAATTTC   5326
I would like to divide the counts column by the sum of all counts. For a data frame I would do:
total=sum(dat$counts)
freq <- dat$counts/total
How can I do that for a data.table? Each kmer is unique, so I do not expect duplicated values in the kmers column.
For example, for the first row it would be 16361/sum(dat$counts).
dat[, freq := counts / sum(counts)]
– David Arenburg
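To spell out the one-liner above, here is a minimal sketch using the data from the question. It assumes the data.table package is installed; the column name freq is just a label chosen for illustration. `:=` adds the column by reference, so there is no need to assign the result back to dat.

```r
library(data.table)

# Same data as in the question, rebuilt so the example is self-contained.
dat <- data.table(
  kmers = c("TTTTTTTTTTTT", "TCCATTCCATTC", "TTCCATTCCATT",
            "CCATTCCATTCC", "ATTCCATTCCAT", "CATTCCATTCCA", "TTTTATTATTTT",
            "AAAATTATAAAA", "AAGACAATTTCT", "AAAGACAATTTC"),
  counts = c(16361L, 10090L, 9599L, 9021L, 8516L, 8325L, 5739L, 5642L,
             5378L, 5326L)
)

# Add a freq column in place: each count divided by the sum of all counts.
dat[, freq := counts / sum(counts)]

# Alternative that leaves dat untouched and returns a plain numeric vector.
freq_vec <- dat[, counts / sum(counts)]

print(dat)
```

If you only need the values and not a new column, the second form is probably closer to the data frame code in the question. The frequencies should sum to 1, which is an easy sanity check.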