0
votes

I am trimming the dataset in Stata, and I want to drop all the firms with 4 or more consecutive missing observations in total assets variable. How could I do that?

The data look like this:

enter image description here

I would like to drop all observations of b, even if there is total assets value for b in year 2000.

2
Data examples shown as images are helpful but not nearly as helpful as data that can be copied and pasted. Stata advice that carries over to SO is given at statalist.org/forums/help#stataNick Cox

2 Answers

1
votes

Added in final edit:

The answer from Nick Cox is correct; this one should be disregarded. The first set of code is not relevant to the restated version of the problem that now appears in the initial post, and the second set suffers from the error pointed out by Nick Cox in his comment below.

===============================

Well, let's assume (since you haven't really described your data) that you have a variable ta that reports total assets, and is sometimes missing, and a variable firmID that inditifies each firm, and is never missing. Then

bysort firmID: egen num_miss = total(missing(ta))
drop if num_miss >=4

might do what you want. The function missing(ta) will be 1 if ta is missing and 0 otherwise, and num_miss will contain the count of how many observations of the current firmID have a missing ta.

Added in response to Nick Cox's comment above:

If we additionally assume that you have a variable year that defines the order of "consecutive" observations, and you want to drop all firms that have a run of 4 or more consecutive observations with missing values, then the following might do what you want. Or it might not - I didn't test it on the sample data you didn't provide.

bysort firmID (year): egen num_run = total(_n>=4 & missing(ta[_n-3]) & missing(ta[_n-2]) & missing(ta[_n-1]) & missing(ta[_n]))
drop if num_run>0
1
votes

For completeness, here is a solution if the criterion really is consecutive spells with 4 or more missings. We need to restart a counter if we meet non-missing values.

In this example id 1 has four missings, but not consecutively, whereas id 2 has four consecutive missings. We want to drop 2 but not 1.

clear 
input id time y
1  1    .
1  2    .
1  3    1 
1  4    .
1  5    .
2  1    2 
2  2    .  
2  3    .
2  4    .
2  5    . 
end 

bysort id (time) : gen seq = missing(y) & (!missing(y[_n-1]) | _n == 1) 
by id : replace seq = seq[_n-1] + 1 if missing(y) & seq[_n-1] >= 1 & _n > 1 

list, sepby(id) 

     +---------------------+
     | id   time   y   seq |
     |---------------------|
  1. |  1      1   .     1 |
  2. |  1      2   .     2 |
  3. |  1      3   1     0 |
  4. |  1      4   .     1 |
  5. |  1      5   .     2 |
     |---------------------|
  6. |  2      1   2     0 |
  7. |  2      2   .     1 |
  8. |  2      3   .     2 |
  9. |  2      4   .     3 |
 10. |  2      5   .     4 |
     +---------------------+


bysort id (seq) : drop if seq[_N] >= 4 
(5 observations deleted)

See also tsspell (SSC) and this paper on identifying spells in Stata.