I have a data.table qq
. Following is the dput
structure(list(Sepal.Length = c(NA, NA, 4.7, NA, 5), Sepal.Width = c(3.5,
NA, 3.2, 3.1, 3.6), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4),
Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2), Species = c("a",
"a", "b", "b", "b")), .Names = c("Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA,
-5L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000011c90788>)
> qq
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: NA 3.5 1.4 0.2 a
2: NA NA 1.4 0.2 a
3: 4.7 3.2 1.3 0.2 b
4: NA 3.1 1.5 0.2 b
5: 5.0 3.6 1.4 0.2 b
I want to get number of rows for each category of column Species
such that column Sepal.Length
is not NA. Final result will look like following (see new column no_rows
)
> qq
Sepal.Length Sepal.Width Petal.Length Petal.Width Species no_rows
1: NA 3.5 1.4 0.2 a 0
2: NA NA 1.4 0.2 a 0
3: 4.7 3.2 1.3 0.2 b 2
4: NA 3.1 1.5 0.2 b 2
5: 5.0 3.6 1.4 0.2 b 2
I did the following
qq[, no_rows:= nrow(qq[is.na(Sepal.Length) == F, ]), .(Species)]
But it is not working. Can somebody give me an efficient solution for a data.table?