
I am very confused. I have tried to drop NAs from my data.frame/data.table in several ways: na.omit, complete.cases, and dropNA() (a function I found on Stack Overflow).

dropNA():

dropNA <- function(dat) {
  dat %>% filter(rowSums(is.na(.)) != ncol(.))
}
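Note that dropNA() only removes rows in which every column is NA, because it compares each row's NA count against ncol(.). complete.cases() and na.omit() instead drop any row that contains at least one NA. Below is a minimal base-R sketch of the difference, using a small made-up data frame. The helper name drop_all_na_rows is hypothetical and mirrors dropNA() without dplyr.

```r
# Base-R equivalent of the question's dropNA() (hypothetical helper name):
# keeps a row unless *every* column in that row is NA
drop_all_na_rows <- function(dat) {
  dat[rowSums(is.na(dat)) != ncol(dat), , drop = FALSE]
}

# Small hypothetical data frame: row 2 is partly NA, row 3 is entirely NA
toy <- data.frame(x = c(1, NA, NA), y = c("a", "b", NA))

nrow(drop_all_na_rows(toy))        # 2: only the all-NA row is removed
nrow(toy[complete.cases(toy), ])   # 1: any row with an NA is removed
nrow(na.omit(toy))                 # 1: same rule as complete.cases()
```

So applying complete.cases() or na.omit() after dropNA() is what actually removes partially missing rows.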

I tried all of these methods to remove the NAs, but as you can see in the tibble below, rows of NAs still appear in the result.

> # drop NAs:
> design_mat4 <- na.omit(design_mat4)
> design_mat4 <- dropNA(design_mat4)
> design_mat4 <- design_mat4[complete.cases(design_mat4), ]
> target_n <- sum(design_mat4$label == 0)
> a <- design_mat4[which(design_mat4$label == 1), ]
> positive_samp = a[sample(x       = nrow(design_mat4),
+                          size    = target_n, 
+                          replace = TRUE), ]
> positive_samp
# A tibble: 50,447 x 14
   email_status score email_is_blacklis~ email_domain_is_bla~ email_domain_blackl~ email_domain_pa~
   <fct>        <int> <fct>              <fct>                <fct>                <fct>           
 1 verified        85 0                  0                    ""                   not_parked      
 2 verified        85 1                  0                    ""                   not_parked      
 3 verified        85 0                  0                    ""                   not_parked      
 4 NA              NA NA                 NA                   NA                   NA              
 5 verified        57 1                  0                    ""                   not_parked      
 6 verified        85 0                  0                    ""                   no_website_cont~
 7 verified        57 1                  0                    ""                   not_parked      
 8 verified        85 0                  0                    ""                   not_parked      
 9 NA              NA NA                 NA                   NA                   NA              
10 verified        85 0                  0                    ""                   not_parked      
# ... with 50,437 more rows, and 8 more variables: email_domain_lawsite <fct>, . . ., label <fct>

Is it because the tibble produces summary statistics on the original state of the data?

In the end, I want the NAs removed. Please help!
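A minimal reproduction with a small made-up data frame standing in for design_mat4 points to one likely cause worth checking. The sampling line draws row indices from 1:nrow(design_mat4) but uses them to index a, which only holds the label == 1 rows. For a data frame, any positive index larger than nrow(a) makes `[` return a row of NAs, so these NAs would be created after the cleaning step rather than surviving it. This is a sketch on hypothetical data, not the original data set.

```r
# Hypothetical stand-in for design_mat4, with no missing values at all
design <- data.frame(score = c(85, 57, 85, 70, 60),
                     label = c(0, 1, 0, 0, 1))
stopifnot(!anyNA(design))

target_n <- sum(design$label == 0)        # 3 rows with label 0
a <- design[which(design$label == 1), ]   # only 2 rows with label 1

# Indexing `a` with row numbers up to nrow(design) (5) instead of nrow(a) (2):
# rows 3 to 5 of the result are all NA
too_far <- a[1:nrow(design), ]
anyNA(too_far)         # TRUE

# Drawing indices from the rows of `a` itself keeps every index in range
set.seed(42)
positive_samp <- a[sample.int(nrow(a), size = target_n, replace = TRUE), ]
anyNA(positive_samp)   # FALSE
nrow(positive_samp)    # 3
```

sample.int(n, ...) always samples from 1:n, which makes the intended range explicit.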

Have you tried df %>% na.omit? - Tony Hellmuth
@TonyHellmuth Yes. - user2205916
Is it possible that your NA values are actually strings? - Marcus Campbell
And you must have tried df %>% filter(complete.cases(.)) too? - Tony Hellmuth
Yes, this could be true for a couple of reasons, but we would probably need to reproduce the result; maybe give us a sample? - Tony Hellmuth
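To illustrate Marcus Campbell's point: if missing values were read in as the literal text "NA", is.na() does not treat them as missing, so na.omit() keeps those rows. A minimal sketch on a hypothetical factor column:

```r
# Hypothetical data where a missing status was stored as the string "NA"
df <- data.frame(status = factor(c("verified", "NA", "verified")),
                 score  = c(85, 57, 85))

is.na(df$status)    # FALSE FALSE FALSE: "NA" is just an ordinary level
nrow(na.omit(df))   # 3: nothing is dropped

# Convert the literal string to a real NA, then drop incomplete rows
df$status[df$status == "NA"] <- NA
nrow(na.omit(df))   # 2
```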

1 Answer


Maybe there is a conflict with another package already loaded in your R session. Try prefixing the function with its package name, as below:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(a = c(1, NA, 5, 99), b = c(20, -1, NA, NA))
df %>%
  stats::na.omit()

# A tibble: 1 x 2
      a     b
  <dbl> <dbl>
1     1    20
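To check whether such a conflict is actually present, base R can list where a name is defined on the search path. A short sketch using utils::find() and base::conflicts(); in a session where nothing masks it, find() should report only "package:stats".

```r
# Every attached search-path entry that defines na.omit, in search order;
# the first entry is the one a bare na.omit() call will use
find("na.omit")

# All objects that are masked on the search path, grouped by location
conflicts(detail = TRUE)
```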