3
votes

I have a dataframe in which one column contains numeric vectors. I want to filter rows based on a condition involving that column. This is a simplified example.

df <- data.frame(id = LETTERS[1:3], name=c("Alice", "Bob", "Carol"))
mylist=list(c(1,2,3), c(4,5), c(1,3,4))  
df$numvecs <- mylist
df
#   id  name   numvecs
# 1  A  Alice  1, 2, 3
# 2  B  Bob    4, 5
# 3  C  Carol  1, 3, 4

I can use something like mapply, e.g.:

mapply(function(x,y) x=="B" & 4 %in% y, df$id, df$numvecs)

which correctly returns TRUE for the second row, and FALSE for rows 1 and 3.

However, I have reasons why I'd like to use dplyr filter instead of mapply, but I can't get dplyr filter to operate correctly on the numvecs column. Instead of returning two rows, the following returns no rows.

filter(df, 4 %in% numvecs)
# [1] id      name    numvecs
#    <0 rows> (or 0-length row.names)

What am I missing here? How can I filter on a conditional expression involving the numvecs column?

Ideally I'd also like to use the standard-evaluation version, filter_, so I can pass the filter condition as an argument. Any help appreciated. Thanks.

2
You can check the map from library(purrr) - akrun
FYI dplyr can work with data.frames as is, but if you're dealing with large data, it's worthwhile to convert them to a tbl_df. - smci
I'll check it out! - Garry

2 Answers

2
votes

We can still use mapply with filter

filter(df, mapply(function(x,y) x == "B" & 4 %in% y, id, numvecs))
#  id name numvecs
#1  B  Bob    4, 5

Or use map from purrr

library(purrr)
filter(df, unlist(map(numvecs, ~4 %in% .x)))
#  id  name numvecs
#1  B   Bob    4, 5
#2  C Carol 1, 3, 4
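
As a small variation on the map call above, purrr's map_lgl returns a logical vector directly, so the unlist step is not needed. This is only a sketch on the question's sample data; the object names has_four and res are mine, not from the question.

```r
library(dplyr)
library(purrr)

# Sample data from the question
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# map_lgl applies the test to each vector and returns one logical per row
has_four <- map_lgl(df$numvecs, ~ 4 %in% .x)

# Same idea inline, with numvecs looked up inside the data frame
res <- filter(df, map_lgl(numvecs, ~ 4 %in% .x))
res
```

map_lgl also throws an error if any element does not yield a single TRUE/FALSE, which can catch surprises that unlist would silently flatten.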

Or we can also do this in a chain

df %>%
    .$numvecs %>% 
     map( ~ 4 %in% .x) %>%
     unlist %>% 
     df[.,]
#  id  name numvecs
#2  B   Bob    4, 5
#3  C Carol 1, 3, 4
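
As for why the original filter(df, 4 %in% numvecs) returns no rows, here is a short base R sketch of what I believe is going on. %in% is built on match(), which treats each list element as a whole rather than looking inside it, so the expression yields a single FALSE for the entire column, and filter recycles that to every row. The names whole_column and per_row are just for illustration.

```r
# The list column from the question, taken out of the data frame
numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# One answer for the whole column: none of the three list elements,
# taken as a whole, is equal to the single value 4
whole_column <- 4 %in% numvecs
whole_column

# One answer per row: test each vector separately
per_row <- vapply(numvecs, function(v) 4 %in% v, logical(1))
per_row
```

That is why every working approach here wraps the test in mapply, map or sapply: filter needs one TRUE/FALSE per row.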
1
votes

You can use sapply on the numvecs column to create a logical vector for subsetting:

library(dplyr)
filter(df, sapply(numvecs, function(vec) 4 %in% vec), id == "B")
#   id name numvecs
# 1  B  Bob    4, 5

filter(df, sapply(numvecs, function(vec) 4 %in% vec))
#   id  name numvecs
# 1  B   Bob    4, 5
# 2  C Carol 1, 3, 4
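
On passing the condition as an argument: one option that avoids filter_ altogether is to pass a predicate function and apply it to each vector. This is only a sketch; filter_numvecs and pred are hypothetical names I made up, and the column name numvecs is hard-coded as in the question.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(id = LETTERS[1:3], name = c("Alice", "Bob", "Carol"))
df$numvecs <- list(c(1, 2, 3), c(4, 5), c(1, 3, 4))

# Hypothetical helper: keep the rows whose numvecs entry satisfies pred.
# pred must return a single TRUE/FALSE for each vector.
filter_numvecs <- function(data, pred) {
  filter(data, vapply(numvecs, pred, logical(1)))
}

res <- filter_numvecs(df, function(v) 4 %in% v)
res
```

I used vapply rather than sapply here so that a predicate returning something other than one logical value fails loudly instead of producing an odd subset.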