3
votes

I have a data frame with 5 time columns (first year, second year, etc.), and in some rows there are NAs in between non-missing values. Sample below:

df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA, 
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA, 
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA, 
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, 
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -20L))

I would like to count, using dplyr, the number of rows that have missing values in between non-missing values. Rows 13, 14, and 16 are examples of this.

How can I achieve this? I have a feeling this has to do with pasting the entire row and looking to the left and to the right of the NA, but I am not clear on how to proceed.

1
So you want your output to be 13, 14 and 16? - Ronak Shah
@Ronak Shah Or just 3 (the count) - Omry Atia

1 Answer

5
votes
sum(grepl("[[:digit:]]+[NA]+[[:digit:]]", apply(df, 1, paste, collapse = "")))

[1] 3

In short:

  1. concatenate the rows

  2. check which rows return TRUE for matching the pattern "number-NA-number" (in this case, rows 13, 14 and 16)

  3. sum the rows which evaluate to TRUE
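The three steps above can be traced one at a time on a small example. This is only a sketch: `toy` and its columns `y1`, `y2`, `y3` are made-up names for illustration, not the question's `df`.

```r
# Hypothetical toy data (not the question's df): three yearly columns
toy <- data.frame(
  y1 = c(1, NA, 2, 3),
  y2 = c(NA, 2, NA, 3),
  y3 = c(2, 3, NA, NA)
)

# Step 1: concatenate each row into a single string
rows <- apply(toy, 1, paste, collapse = "")
print(rows)

# Step 2: flag rows where N/A characters sit between two digits
hit <- grepl("[[:digit:]]+[NA]+[[:digit:]]", rows)
print(which(hit))  # positions of the rows with an NA between values

# Step 3: count the flagged rows
n_gaps <- sum(hit)
print(n_gaps)
```

Only the first toy row has an NA between two non-missing values, so leading or trailing NAs on the other rows are not counted.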

Based on @Sotos's comment, this is even better since it avoids using apply with MARGIN = 1:

sum(grepl("[[:digit:]]+[NA]+[[:digit:]]", do.call(paste0, df)))
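To see why `do.call(paste0, df)` gives the same strings, note that a data frame is a list of columns, so `do.call` passes each column to `paste0` as a separate argument and the pasting is vectorised across rows. The sketch below uses a made-up `toy` data frame. It also swaps in a slightly stricter pattern, `(NA)+`, which matches the literal text "NA" rather than any run of the characters N and A. That pattern is a variation for illustration, not part of the answer above.

```r
# Hypothetical toy data (not the question's df)
toy <- data.frame(
  y1 = c(1, NA, 2, 3),
  y2 = c(NA, 2, NA, 3),
  y3 = c(2, 3, NA, NA)
)

# Equivalent to paste0(toy$y1, toy$y2, toy$y3): one string per row
rows <- do.call(paste0, toy)
print(rows)

# Variation (an assumption, not the original pattern): require the literal
# string "NA", one or more times, with a digit on each side
strict <- "[[:digit:]](NA)+[[:digit:]]"
n_gaps <- sum(grepl(strict, rows))
print(n_gaps)
```

On data that contains only numbers and NA, both patterns should flag the same rows.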