Added in final edit:
The answer from Nick Cox is correct; this one should be disregarded. The first set of code is not relevant to the restated version of the problem that now appears in the initial post, and the second set suffers from the error pointed out by Nick Cox in his comment below.
===============================
Well, let's assume (since you haven't really described your data) that you have a variable ta that reports total assets and is sometimes missing, and a variable firmID that identifies each firm and is never missing. Then
bysort firmID: egen num_miss = total(missing(ta))
drop if num_miss >=4
might do what you want. The function missing(ta) will be 1 if ta is missing and 0 otherwise, so num_miss will contain, on every observation, the number of observations of the current firmID that have a missing ta.
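To see what num_miss looks like, here is a small sketch on made-up data. The values are invented purely for illustration; only the variable names ta and firmID come from the assumptions above.

```stata
* Toy data, invented for illustration: firm 1 has 2 missing ta, firm 2 has 4.
clear
input firmID ta
1 10
1 .
1 .
2 .
2 .
2 .
2 .
2 3
end

* missing(ta) is 1 or 0, so the group total is the count of missing values.
bysort firmID: egen num_miss = total(missing(ta))
list firmID ta num_miss, sepby(firmID)

* Firms with 4 or more missing values are dropped in their entirety.
drop if num_miss >= 4
```

On this toy data only the three observations of firm 1 should remain, because the count is stored on every observation of the firm and the drop therefore removes whole firms.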
Added in response to Nick Cox's comment above:
If we additionally assume that you have a variable year that defines the order of "consecutive" observations, and you want to drop all firms that have a run of 4 or more consecutive observations with missing values, then the following might do what you want. Or it might not - I didn't test it on the sample data you didn't provide.
bysort firmID (year): egen num_run = total(_n>=4 & missing(ta[_n-3]) & missing(ta[_n-2]) & missing(ta[_n-1]) & missing(ta[_n]))
drop if num_run>0
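For what it's worth, Stata's help for egen cautions against explicit subscripts (_n, _N) inside egen expressions. One way to avoid them is to build the length of the current run of missing values with generate and replace under by:, and only then take the per-firm maximum with egen. The sketch below assumes the same ta, firmID and year variables; the toy data and the names run and max_run are made up for illustration, and it is not a tested solution for your data.

```stata
* Toy data, invented for illustration.
* Firm 1 has a run of 4 missing ta; firm 2 has two separate runs of 2.
clear
input firmID year ta
1 2001 10
1 2002 .
1 2003 .
1 2004 .
1 2005 .
1 2006 12
2 2001 5
2 2002 .
2 2003 .
2 2004 7
2 2005 .
2 2006 .
end

* run = length of the current run of missing ta, within firm, in year order.
bysort firmID (year): generate run = missing(ta)
* replace works down the observations in order, so run[_n-1] is already updated.
by firmID: replace run = run[_n-1] + 1 if missing(ta) & _n > 1

* Longest run per firm; no explicit subscripts are needed inside egen here.
by firmID: egen max_run = max(run)
drop if max_run >= 4
```

On this toy data firm 1 should be dropped and all six observations of firm 2 kept, since firm 2 has four missing values in total but never more than two in a row.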