
I'm fairly new to R. I have a panel data set and I want to delete some observations based on certain values. Take the following panel as an example (derived from the plm package):

library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
> head(Panel)
  country year           y y_bin        x1         x2          x3   opinion op
1       A 1990  1342787840     1 0.2779036 -1.1079559  0.28255358 Str agree  1
2       A 1991 -1899660544     0 0.3206847 -0.9487200  0.49253848     Disag  0
3       A 1992   -11234363     0 0.3634657 -0.7894840  0.70252335     Disag  0
4       A 1993  2645775360     1 0.2461440 -0.8855330 -0.09439092     Disag  0
5       A 1994  3008334848     1 0.4246230 -0.7297683  0.94613063     Disag  0
6       A 1995  3229574144     1 0.4772141 -0.7232460  1.02968037 Str agree  1

I want to delete a country's observations for all years after the year in which op = 1. For instance, if op = 1 in 1990, I want to exclude that country in 1991, 1992, 1993, etc. (all the following years in the data). If op = 1 in 1996, I want to exclude that country in 1997, 1998 and 1999.

PS: This data frame may not be a good example, but in my own data op equals 1 only once.

Does anyone know how I can do that?

Thanks in advance.

EDIT: I forgot to say that I also want to keep the observations that have op = 0 in every year. I'm running a logit model, so I'm comparing op = 1 with op = 0.
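A minimal base R sketch of the rule described above, assuming rows can be sorted by country and year and that op is coded 0/1. The toy data frame panel and its values are hypothetical, not taken from Panel101. For each country it counts the op = 1 events that happened before the current row and keeps the row only while that count is still zero, so rows up to and including the first op = 1 survive, and a country that never reaches op = 1 keeps all of its years.

```r
# Hypothetical toy panel: X adopts in 1992, Y adopts in 1990, Z never adopts
panel <- data.frame(
  country = rep(c("X", "Y", "Z"), each = 4),
  year    = rep(1990:1993, times = 3),
  op      = c(0, 0, 1, 0,   1, 0, 0, 0,   0, 0, 0, 0)
)

# The rule depends on time order, so sort by country and year first
panel <- panel[order(panel$country, panel$year), ]

# For each country, count the op == 1 events strictly before the current row
# (cumulative sum of op shifted down by one position)
prior_adoptions <- ave(panel$op, panel$country,
                       FUN = function(o) cumsum(c(0, head(o, -1))))

# Keep rows while no earlier adoption exists: this keeps the adoption year itself
# and every year of a country that never adopts
result <- panel[prior_adoptions == 0, ]
result
```

The same lagged-cumsum idea could likely be written in dplyr as group_by(country) followed by filter(cumsum(lag(op, default = 0)) == 0), though that variant is untested here.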


3 Answers


I am assuming you want to remove, for each country separately, all rows after the first 1 in op.

Using dplyr with filter:

library(dplyr)

Panel <- foreign::read.dta("http://dss.princeton.edu/training/Panel101.dta")

Panel %>%
  group_by(country) %>%
  filter(row_number() <= match(1, op)) %>%
  ungroup

#   country  year           y y_bin      x1     x2     x3 opinion      op
#   <fct>   <int>       <dbl> <dbl>   <dbl>  <dbl>  <dbl> <fct>     <dbl>
# 1 A        1990  1342787840     1  0.278  -1.11  0.283  Str agree     1
# 2 B        1990 -5934699520     0 -0.0818  1.43  0.0234 Agree         1
# 3 C        1990 -1292379264     0  1.31   -1.29  0.204  Agree         1
# 4 D        1990  1883025152     1 -0.314   1.74  0.647  Disag         0
# 5 D        1991  6037768704     1  0.360   2.13  1.10   Disag         0
# 6 D        1992    10244189     1  0.0519  1.68  0.970  Str agree     1
# 7 E        1990  1342787840     1  0.453   1.73  0.597  Str disag     0
# 8 E        1991  2296009472     1  0.419   1.71  0.793  Str agree     1
# 9 F        1990  1342787840     1 -0.568  -0.347 1.26   Str agree     1
#10 G        1990  1342787840     1  0.945  -1.52  1.45   Str disag     0
#11 G        1991 -1518985728     0  1.10   -1.46  1.44   Agree         1

Or the same thing with slice:

Panel %>%
  group_by(country) %>%
  slice(seq_len(match(1, op))) %>%
  ungroup
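Both versions rely on match(1, op), which returns the position of the first 1 within each country. The short base R sketch below, using hypothetical op vectors for a single country, shows what happens when a country never has op = 1: match() returns NA, filter() drops rows whose condition is NA, and seq_len(NA) is an error, so slice() would stop rather than keep the country.

```r
# match() on hypothetical op vectors for one country
match(1, c(0, 0, 1, 0))   # 3  -> keep rows 1 to 3
match(1, c(1, 0, 0, 0))   # 1  -> keep only the first row
match(1, c(0, 0, 0, 0))   # NA -> this country never has op == 1

# filter(): the comparison with NA is NA for every row,
# and filter() drops NA rows, so the whole country disappears
seq_along(c(0, 0, 0, 0)) <= NA   # NA NA NA NA

# slice(): seq_len(NA) raises an error instead of selecting rows
res <- tryCatch(seq_len(NA), error = function(e) e)
inherits(res, "error")            # TRUE
```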

We can use slice:

library(dplyr)
Panel %>%
     group_by(country) %>%
     slice(seq_len(match(1, op))) %>%
     ungroup

Data:

Panel <- foreign::read.dta("http://dss.princeton.edu/training/Panel101.dta")

Your answers were great, but I forgot to specify something in the question. Your answers keep the observations that had op = 1, but I also want to keep the countries that have op = 0 in every year. I'm running a logit model: those with op = 0 will be the non-adopters and those with op = 1 the adopters.
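One way to keep the never-adopters is to give match() a fallback through its nomatch argument, so that a country without any op = 1 keeps all of its rows instead of getting NA. Below is a minimal base R sketch of that change on a hypothetical toy panel (the data and the helper name keep_until_first_one are illustrative only), assuming rows are sorted by year within each country.

```r
# Hypothetical toy panel: Z never has op == 1 (a non-adopter)
panel <- data.frame(
  country = rep(c("X", "Y", "Z"), each = 4),
  year    = rep(1990:1993, times = 3),
  op      = c(0, 0, 1, 0,   1, 0, 0, 0,   0, 0, 0, 0)
)
panel <- panel[order(panel$country, panel$year), ]

# Hypothetical helper: keep positions up to the first 1; when there is no 1,
# nomatch = length(o) means "keep every year" instead of returning NA
keep_until_first_one <- function(o) {
  seq_along(o) <= match(1, o, nomatch = length(o))
}

# ave() applies the helper per country; its result is coerced to numeric,
# so convert back to logical before subsetting
keep <- as.logical(ave(panel$op, panel$country, FUN = keep_until_first_one))
result <- panel[keep, ]
result
```

In the dplyr answers, the corresponding change should be filter(row_number() <= match(1, op, nomatch = n())), which keeps every row of a country whose op is always 0.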