get number of rows where certain column is not NA for different categories of another column

Question

I have a data.table qq. Following is the dput

    structure(list(Sepal.Length = c(NA, NA, 4.7, NA, 5), Sepal.Width = c(3.5, 
NA, 3.2, 3.1, 3.6), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4), 
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2), Species = c("a", 
    "a", "b", "b", "b")), .Names = c("Sepal.Length", "Sepal.Width", 
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA, 
-5L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000011c90788>)

> qq
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:           NA         3.5          1.4         0.2       a
2:           NA          NA          1.4         0.2       a
3:          4.7         3.2          1.3         0.2       b
4:           NA         3.1          1.5         0.2       b
5:          5.0         3.6          1.4         0.2       b

I want to get number of rows for each category of column Species such that column Sepal.Length is not NA. Final result will look like following (see new column no_rows)

> qq

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species    no_rows
    1:           NA         3.5          1.4         0.2       a      0
    2:           NA          NA          1.4         0.2       a      0
    3:          4.7         3.2          1.3         0.2       b      2 
    4:           NA         3.1          1.5         0.2       b      2
    5:          5.0         3.6          1.4         0.2       b      2

I did the following

qq[, no_rows:= nrow(qq[is.na(Sepal.Length) == F, ]), .(Species)]

But it is not working. Can somebody give me an efficient solution for a data.table?

akrun akrun · Accepted Answer · 2015-11-20T11:52:31

We group by 'Species', get the sum of logical index (!is.na(Sepal.Length)) and assign (:=) the output to create the 'no_rows'.

qq[, no_rows := sum(!is.na(Sepal.Length)) , by = Species]
qq
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species no_rows
#1:           NA         3.5          1.4         0.2       a       0
#2:           NA          NA          1.4         0.2       a       0
#3:          4.7         3.2          1.3         0.2       b       2
#4:           NA         3.1          1.5         0.2       b       2
#5:          5.0         3.6          1.4         0.2       b       2

get number of rows where certain column is not NA for different categories of another column

1 Answers