I have a large tibble with 7000 rows and 10000 columns. I want to keep every row that has at least one nonzero element across all columns. I wrote the following code, which works fine on a small tibble, but as soon as I increase the number of columns it breaks.

Can anyone tell me where the glitch is?

Thanks.

library(tidyverse)
ncol = 10000
nrow = 7000

rr = sample(c(0,1) , nrow * ncol , replace = TRUE) %>% 
  matrix(ncol = ncol) %>% 
  as.data.frame() %>% 
  as_tibble()

rr %>% dplyr::filter_if(is.numeric , .vars_predicate = any_vars(. != 0 ))
#> Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Created on 2019-07-19 by the reprex package (v0.3.0)

Not sure what the issue is with filter_if, but the base R alternative works fine: rr[rowSums(rr != 0) > 0, ] - Ronak Shah
@RonakShah that's also 50 times faster than the dplyr version. - Spacedman
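
The base R approach suggested in the comment above can be sketched on a small stand-in data frame. The name `small`, its dimensions, and the forced all-zero row are assumptions made for illustration only, and the sketch assumes the data contain no NA values (otherwise `rowSums` would need `na.rm = TRUE`).

```r
# Small stand-in for the 7000 x 10000 tibble; the sizes here are illustrative.
set.seed(1)
small <- as.data.frame(matrix(sample(c(0, 1), 5 * 4, replace = TRUE), ncol = 4))
small[2, ] <- 0  # force one all-zero row so the filter has something to drop

# small != 0 yields a logical matrix; rowSums counts the TRUE values per row.
keep <- rowSums(small != 0) > 0

# Keep only the rows with at least one nonzero entry.
filtered <- small[keep, , drop = FALSE]
```

Because the comparison and the row sums are computed on the whole table at once rather than by building one expression per column, this route does not depend on the number of columns in the same way.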

1 Answer

The code fails once the tibble has 4953 columns, regardless of the number of rows (even a handful is enough). The error message itself mentions options(expressions=), which limits how deeply expressions can be nested and is 5000 by default. Setting it to a larger number fixes the problem. I don't know where the other 47 nested expressions come from, but:

With default setting:

> options(expressions=5000)

and a data set of 4953 columns, 10 rows:

> dim(rrf)
[1]   10 4953

it fails:

> rrf %>% dplyr::filter_if(is.numeric , .vars_predicate = any_vars(. != 0 ))
Error in filter_impl(.data, quo) : 
  Evaluation error: evaluation nested too deeply: infinite recursion / options(expressions=)?.

So, as clearly stated in the error message, try setting the expressions option:

> options(expressions=6000)

And shazam, it works:

> rrf %>% dplyr::filter_if(is.numeric , .vars_predicate = any_vars(. != 0 )) %>% nrow
[1] 10

(I've just piped this into nrow to get something that works and doesn't print all the columns).
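
Since options(expressions=) is a global setting, one possible refinement is to raise it only for the duration of the call and restore the previous value afterwards. The helper below, `with_expression_limit`, is a hypothetical name and not part of dplyr or base R; it is a minimal sketch that relies on `options()` returning the previous settings and on R's lazy evaluation of the `code` argument.

```r
# Hypothetical helper: evaluate `code` with a temporarily raised nesting limit.
with_expression_limit <- function(limit, code) {
  old <- options(expressions = limit)  # options() returns the previous values
  on.exit(options(old), add = TRUE)    # restore them even if `code` errors
  force(code)                          # `code` is evaluated here, under the new limit
}

# Intended usage with the call from the answer (assumes dplyr is attached
# and `rrf` exists):
# with_expression_limit(6000,
#   rrf %>% dplyr::filter_if(is.numeric, .vars_predicate = any_vars(. != 0)) %>% nrow)
```

The limit of 6000 is simply the value used above; a wider table may need a larger number.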