
I have a large dataset of around 35000 observations and 24 variables (one of which is a time-series), but I can summarise what I want to achieve using iris.

library(tidyverse)

iris.new <- iris %>%
  arrange(Species, Sepal.Length, Sepal.Width) %>%
  group_by(Species)

unwanted <- iris.new %>%
  filter(Sepal.Length > 5 & Sepal.Width==min(Sepal.Width))

while(nrow(unwanted)!=0) {
  iris.new <- iris.new %>%
    arrange(Species, Sepal.Length, Sepal.Width) %>%
    group_by(Species) %>%
    filter(!(Sepal.Length > 5 & Sepal.Width == min(Sepal.Width)))
  unwanted <- iris.new %>%
    filter(Sepal.Length > 5 & Sepal.Width==min(Sepal.Width))
}

I want to remove only the rows with Sepal.Length > 5 that also have the minimum Sepal.Width within their Species (setosa and versicolor have none). After removing the first such row, I repeated the filter to see whether any more had appeared, and finally used a 'while' loop to do that for me.

Is there a way to filter them without using a loop?

Your question is not clear. Why the while loop? Why not just do filter(Sepal.Length <= 5)? - Onyambu
@Onyambu, because once the data is filtered, min(Sepal.Width) changes, so I have to repeat the filter against the new min(Sepal.Width). It is easier to see when repeating the step manually outside of the while loop: at first 1 row is removed, then 3 more. - M. Saka
Well, you will filter all the mins until 5 is done, then you will stop filtering. So why not just filter on the 5? - Onyambu
The original data that I am working on has a date column, a value column that begins at 0, and a name column with 1200 different names (similar to Species). The reason I repeat the filter is that I am trying to trim the leading zero values (in terms of date). I hope that makes more sense. - M. Saka
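
For the underlying goal described in the last comment (dropping the leading zero values for each name), a loop may not be needed either. The following is only a minimal sketch on made-up data: the tibble `toy` and its columns `name`, `date` and `value` are hypothetical stand-ins for the real dataset, which is not shown in the question. It relies on `dplyr::cumany()`, which becomes TRUE at the first non-zero value in each group and stays TRUE afterwards.

```r
library(dplyr)

# Hypothetical stand-in for the real data (assumed columns: name, date, value)
toy <- tibble(
  name  = rep(c("a", "b"), each = 5),
  date  = rep(as.Date("2020-01-01") + 0:4, times = 2),
  value = c(0, 0, 3, 0, 2,
            0, 1, 0, 0, 4)
)

trimmed <- toy %>%
  arrange(name, date) %>%           # leading zeros are defined by date order
  group_by(name) %>%
  filter(cumany(value != 0)) %>%    # drop rows before the first non-zero value
  ungroup()

trimmed
```

Zeros that occur after the first non-zero value are kept, so only the leading run of zeros in each name is trimmed.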

1 Answer


I think this does the trick:

# assumes dplyr is loaded (e.g. via library(tidyverse), as in the question)
# per-species minimum Sepal.Width among rows with Sepal.Length <= 5
iris_min <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length <= 5) %>%
  summarize(min_sep_width = min(Sepal.Width))

# keep rows whose Sepal.Width is above that minimum,
#   or equal to it with Sepal.Length <= 5
iris_new <- iris %>% 
  left_join(iris_min, by = c('Species')) %>%
  filter(min_sep_width < Sepal.Width | 
           (min_sep_width == Sepal.Width & Sepal.Length <= 5)) %>%
  select(-min_sep_width)
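
The same idea can probably be written without the join by computing the threshold inside each group. This is a sketch rather than a tested replacement for the code above; the names `threshold` and `iris_single_pass` are introduced here for illustration only. It assumes every Species has at least one row with Sepal.Length <= 5 (true for iris); otherwise `min()` is called on an empty vector, returns Inf with a warning, and every row of that Species is dropped.

```r
library(dplyr)

iris_single_pass <- iris %>%
  group_by(Species) %>%
  # threshold: smallest Sepal.Width among rows with Sepal.Length <= 5 in this Species
  mutate(threshold = min(Sepal.Width[Sepal.Length <= 5])) %>%
  # keep rows above the threshold, or at the threshold with Sepal.Length <= 5
  filter(Sepal.Width > threshold |
           (Sepal.Width == threshold & Sepal.Length <= 5)) %>%
  ungroup() %>%
  select(-threshold)

nrow(iris_single_pass)
```

The reasoning is that the loop only stops once the smallest remaining Sepal.Width in a Species belongs to a row with Sepal.Length <= 5, so that value can be computed up front instead of being approached one iteration at a time.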