I am very confused. I have tried to drop NAs from my data.frame/data.table in multiple ways: na.omit(), complete.cases(), and dropNA(), a function I found on Stack Overflow:
library(dplyr)  # provides %>% and filter()

# Drops only the rows in which every column is NA
dropNA <- function(dat) {
  dat %>% filter(rowSums(is.na(.)) != ncol(.))
}
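For reference, the three approaches do not apply the same rule. na.omit() and complete.cases() drop every row that contains at least one NA, whereas the dropNA() condition rowSums(is.na(.)) != ncol(.) drops only rows in which every column is NA. Below is a minimal base-R sketch on a hypothetical toy data frame df. The dropNA() condition is inlined so that the example runs without dplyr:

```r
# Hypothetical toy data: row 2 is entirely NA, row 3 is only partly NA
df <- data.frame(x = c(1, NA, 3), y = c("a", NA, NA))

# na.omit() and complete.cases() both drop any row containing an NA
kept_na_omit  <- na.omit(df)
kept_complete <- df[complete.cases(df), ]

# Base-R equivalent of the dropNA() condition: drop only all-NA rows
kept_all_na   <- df[rowSums(is.na(df)) != ncol(df), ]

nrow(kept_na_omit)   # 1
nrow(kept_complete)  # 1
nrow(kept_all_na)    # 2
```

Whichever rule is used, an all-NA row like row 2 should not survive any of the three.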
I applied all of these methods, but as the tibble below shows, rows of NAs still appear in the result.
> # drop NAs:
> design_mat4 <- na.omit(design_mat4)
> design_mat4 <- dropNA(design_mat4)
> design_mat4 <- design_mat4[complete.cases(design_mat4), ]
> target_n <- sum(design_mat4$label == 0)
> a <- design_mat4[which(design_mat4$label == 1), ]
> positive_samp = a[sample(x = nrow(design_mat4),
+ size = target_n,
+ replace = TRUE), ]
> positive_samp
# A tibble: 50,447 x 14
   email_status score email_is_blacklis~ email_domain_is_bla~ email_domain_blackl~ email_domain_pa~
   <fct>        <int> <fct>              <fct>                <fct>                <fct>
 1 verified        85 0                  0                    ""                   not_parked
 2 verified        85 1                  0                    ""                   not_parked
 3 verified        85 0                  0                    ""                   not_parked
 4 NA              NA NA                 NA                   NA                   NA
 5 verified        57 1                  0                    ""                   not_parked
 6 verified        85 0                  0                    ""                   no_website_cont~
 7 verified        57 1                  0                    ""                   not_parked
 8 verified        85 0                  0                    ""                   not_parked
 9 NA              NA NA                 NA                   NA                   NA
10 verified        85 0                  0                    ""                   not_parked
# ... with 50,437 more rows, and 8 more variables: email_domain_lawsite <fct>, . . ., label <fct>
Is it because the printed tibble reflects the original state of the data rather than the result after NA removal?
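One way to test this is to query the object itself instead of relying on its printed form. Running the same checks on design_mat4, a and positive_samp in turn should show at which step the NA rows appear. Here is a sketch on a hypothetical toy data frame x:

```r
# Hypothetical toy data standing in for an object such as positive_samp
x <- data.frame(score = c(85L, NA, 57L),
                label = factor(c("1", NA, "1")))

anyNA(x)                   # TRUE if any cell is NA
sum(!complete.cases(x))    # number of rows with at least one NA
which(!complete.cases(x))  # positions of those rows
```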
In the end, I want the NAs removed. Please help!