I would like to delete incomplete cases (rows containing at least one NA) from each data frame of a nested tibble. I tried to use the map function (purrr package), but I received the following error message: "Error in parent.env(x) : argument is not an environment". I do not understand what the problem is.
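For reference, the row-wise operation described above can be sketched on a small nested tibble. This is a minimal sketch that assumes the tibble, tidyr, dplyr and purrr packages are installed. The toy data and its columns (grp, x, y) are hypothetical and only serve as illustration. The key point is that .f receives a function (here a ~ formula) rather than an already-evaluated call:

```r
library(tibble)
library(tidyr)
library(dplyr)
library(purrr)

# Hypothetical toy data: two groups, some rows containing NA
toy <- tibble(
  grp = c("a", "a", "a", "b", "b"),
  x   = c(1, NA, 3, 4, 5),
  y   = c(10, 20, NA, 40, 50)
)

# One row per group, with the remaining columns stored in the list-column "data"
toy_nested <- toy %>%
  group_by(grp) %>%
  nest() %>%
  ungroup()

# .f must be a function or a ~ formula; complete.cases() is TRUE for rows with no NA
toy_clean <- toy_nested %>%
  mutate(data2 = purrr::map(data, ~ .x[complete.cases(.x), ]))

# Rows left in each cleaned data frame
purrr::map_int(toy_clean$data2, nrow)
```

Group "a" keeps only its first row and group "b" keeps both rows. tidyr::drop_na also drops every row containing an NA, so purrr::map(data, tidyr::drop_na) should give the same result here.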
Here is a reproducible example.
library(tidyverse)

gapminder_orig <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv")
gapminder_orig <- gapminder_orig %>%
  dplyr::select(continent, country, year, pop, lifeExp, gdpPercap)

# Replace roughly 20% of the values in pop, lifeExp and gdpPercap with NA
data_with_NA <- map_df(gapminder_orig[, 4:6], function(x) {
  x[sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(x), replace = TRUE)]
})
gapminder_orig_with_NA <- gapminder_orig %>%
  mutate(pop = data_with_NA$pop, lifeExp = data_with_NA$lifeExp, gdpPercap = data_with_NA$gdpPercap)

gapminder_nested <- gapminder_orig_with_NA %>%
  mutate(dummy_var = sample(1:3, nrow(gapminder_orig_with_NA), replace = TRUE)) %>%
  group_by(continent) %>%
  nest() %>%
  add_column(Type = c("Full", "Full", "Subset", "Subset", "Subset")) %>%
  add_column(Sector = c("Agriculture", "Banking", "Agriculture", "Banking", "Agriculture"))
gapminder_nested
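
# Keep only the rows of x that contain no NA
remove_NA <- function(x) {
  y <- x[complete.cases(x), ]
  return(y)
}

# Same as remove_NA; the second argument z is ignored
remove_NAz <- function(x, z) {
  y <- x[complete.cases(x), ]
  return(y)
}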

test <- gapminder_nested %>%
  #mutate(data2 = map(.x = data, .f = filter(complete.cases(.x)))) #Does not work
  #mutate(data2 = map(.x = data, .f = na.omit)) #Does not work
  #mutate(data2 = map(data, ~ map_dfc(., na.omit))) #Does not work
  #mutate(data2 = map(data, function(.x) remove_NA(.x))) #Does not work
  mutate(data2 = map2(data, Type, function(.x, .z) remove_NAz(.x, .z))) #Works but not elegant
Any idea what is going wrong with the calls to the map function? Why does it work with map2?
Thanks!
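Two of the attempts are wrong regardless of environment. In .f = filter(complete.cases(.x)), the filter() call is evaluated as a single expression instead of being passed to map as a function. map_dfc(., na.omit) drops NA values column by column, so the columns no longer line up row by row. By contrast, map(data, na.omit) and map(data, function(.x) remove_NA(.x)) are valid purrr calls. Because map2 succeeds while they fail, one plausible explanation is that the unqualified map is being found in another attached package (for example maps::map) rather than in purrr. This is an assumption, since the session's loaded packages are not shown. The following is a minimal sketch for checking that and for calling purrr's map explicitly. The data frame df is hypothetical:

```r
library(purrr)

# Every attached package on the search path that defines an object named "map";
# if more than one is listed, the one attached last masks the others
find("map")

# Namespace that the unqualified `map` actually resolves to ("purrr" if not masked)
environmentName(environment(map))

# Qualifying the call with purrr:: sidesteps any masking
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
res <- purrr::map(list(df, df), ~ .x[complete.cases(.x), , drop = FALSE])
purrr::map_int(res, nrow)
```

If the qualified call works, then gapminder_nested %>% mutate(data2 = purrr::map(data, na.omit)) should be the direct equivalent of the map2 workaround.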