How to select (four) specific rows (multiple times) based on a column value in R?

Question

I only want to select the ID's that are in my dataframe for all years, from 2013 untill 2016 (so four times). In that case ID's with only four rows are left (panel data, each ID has 1 row for each year). I already made sure my dataframe only covers the years I need (2013, 2014, 2015, and 2016), but I want to exclude the ID's that have less than 4 years/rows in my dataframe.

This is the structure of my dataframe:

 tibble [909,587 x 26] (S3: tbl_df/tbl/data.frame)
     $ ID                         : num [1:909587] 12 12 12 12 16 16 16 16...
     $ Gender                     : num [1:909587] 2 2 2 2 1 1 1 1 1 1 ...
      ..- attr(*, "format.spss")= chr "F10.0"
     $ Year                       : chr [1:909587] "2016" "2013" "2014" "2015" ...
      ..- attr(*, "format.spss")= chr "F9.3"
     $ Size                       : num [1:909587] 1983 1999 1951 1976 902 ...
     $ Costs                      : num [1:909587] 2957.47 0 0.34 1041.67 0 ...
     $ Urbanisation               : num [1:909587] 2 3 3 2 3 3 2 2 2 3 ...
     $ Age                        : num [1:909587] 92 89 90 91 82 83 22 23 24 65 ...

How can I achieve that?

Thank you!

Please read about how to provide a good example. Providing just the structure of your data is not that helpful. Consider using dput. That said, this might work: df %>% group_by(ID) %>% filter(n_distinct(Year) >= 4) — JasonAizkalns
Thank you Jason, I hoped the structure was sufficient. Good news tho; you code worked! Now I have only ID's with 4 years. Thank you! — Student0172
Hi Jason, thank you for doing that! I just found out the code 'missed' a couple of years/rows/ID's. I still have 174 more rows than I should have based on lenght(unique(df$ID) * 4 rows.. Any idea how? — Student0172

Bruno Bruno · Accepted Answer · 2020-05-15T12:27:18

Pivot your df

df %>% pivot_wider(names_from = Year,values_from = Age)

Filter the na's rows out of columns 2013,2014,2015,2016

Pivot back

df %>% pivot_longer(2013:2016)

How to select (four) specific rows (multiple times) based on a column value in R?

2 Answers