count NA's appearing in between non-missing values

Question

I have a data frame with 5 time columns (first year, second year, etc.), and in some rows there are NA's in between non-missing values. Sample below:

df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 
3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA, 
3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 
2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 
1, NA, NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1), FourthYStage = c(NA, 
NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4, 2, NA, NA, NA, 
2, 1), FifthYStage = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, 
NA, 3.2, NA, 2, 3.2, NA, NA, 2, 1)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -20L))

I would like to count, using dplyr, the number of rows that have missing values in between non-missing values. Rows 13, 14, and 16 are examples of this.

How can I achieve this? I have a feeling this has to do with pasting the entire row together and looking to the left and to the right of each NA, but it is not clear to me how to proceed.

Lennyy · Accepted Answer · 2019-06-07T08:48:41

sum(grepl("[[:digit:]]+(NA)+[[:digit:]]", apply(df, 1, paste, collapse = "")))

[1] 3

In short:

concatenate the values of each row into a single string
check which rows match the pattern "number-NA-number" (in this case, rows 13, 14 and 16)
sum the rows that evaluate to TRUE
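The three steps can be traced on a small example. The sketch below is only an illustration: `small` is a hypothetical data frame holding copies of rows 13, 14 and 15 from the sample, and the pattern writes NA as a group, `(NA)+`, so that it matches the literal text "NA".

```r
# Hypothetical mini data set: rows 13, 14 and 15 of the sample
small <- data.frame(
  FirstYStage  = c(2,   NA,  NA),
  SecondYStage = c(3.1, 3.2, 2),
  ThirdYStage  = c(3.2, NA,  2),
  FourthYStage = c(NA,  4,   2),
  FifthYStage  = c(3.2, NA,  2)
)

# Step 1: concatenate each row into one string; NA becomes the text "NA"
row_strings <- apply(small, 1, paste, collapse = "")

# Step 2: flag rows where digits, then one or more "NA", then a digit occur
has_gap <- grepl("[[:digit:]]+(NA)+[[:digit:]]", row_strings)

# Step 3: count the flagged rows
n_gap <- sum(has_gap)
```

Here the third row starts with an NA but has none between two numbers, so it is not flagged.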

Based on @Sotos's comment, this is even better, since it avoids calling apply with MARGIN = 1:

sum(grepl("[[:digit:]]+(NA)+[[:digit:]]", do.call(paste0, df)))
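If you would rather not depend on how the values print as text, the same count can be obtained without a regular expression. This is a sketch of an alternative, not part of the answer above: `has_inner_na` is a hypothetical helper, and the sample is rebuilt here as a plain data.frame so the block runs on its own.

```r
# The sample data from the question, rebuilt as a plain data.frame
df <- data.frame(
  FirstYStage  = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 3.1, 1, 2, 5, 2, NA,
                   NA, NA, NA, 2, 3.1, 1),
  SecondYStage = c(NA, 3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2,
                   2, 3.1, NA, 2, 3.1, 1),
  ThirdYStage  = c(NA, NA, 3.1, NA, NA, 3.2, 1, 4, NA, 1, NA, NA, 3.2, NA,
                   2, 3.2, NA, NA, 2, 1),
  FourthYStage = c(NA, NA, 3.1, NA, NA, NA, 1, 4, NA, 1, NA, NA, NA, 4,
                   2, NA, NA, NA, 2, 1),
  FifthYStage  = c(NA, NA, 2, NA, NA, NA, 1, 5, NA, NA, NA, NA, 3.2, NA,
                   2, 3.2, NA, NA, 2, 1)
)

# Hypothetical helper: TRUE when an NA lies strictly between the first
# and last non-missing values of a row
has_inner_na <- function(x) {
  present <- which(!is.na(x))
  # Fewer than two non-missing values: nothing can be "in between"
  if (length(present) < 2) return(FALSE)
  # Non-consecutive positions mean at least one NA sits between them
  any(diff(present) > 1)
}

flags <- apply(df, 1, has_inner_na)
n_rows_with_gap <- sum(flags)
gap_rows <- unname(which(flags))
```

On the sample data this flags the same rows as the regex approach (13, 14 and 16). It still uses apply over rows, so it trades the speed of the do.call(paste0, df) version for independence from the string representation.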
