First and Last Occurrences in Stata

Question

I have a panel dataset from 2006 to 2012. I generated a new variable entry that takes a value of 1 for the firm that entered to a country. For instance if a firm has missing value (.) for its sales at time (t) it takes a value of 0 and at (t+1) if it enters to a country in other words has a value for its sales it takes a value of 1. The successful command that I used for this is as follows:

egen firm_id=group(firm country)

by firm_id (year), sort: gen byte entry = ///
sum(inrange(sales, 0,.)) == 1  & sum(inrange(sales[_n - 1],0,.)) == 0

Since my data start from 2006 I excluded the observations for this year with the command:

bysort firm (year) : replace entry = 0 if year == 2006

However what I want is instead of having 0 values, to have missing values for the subsequent years after its entry (e.g. at t+2 or t+3).

The same I applied for the exit but I changed the sort order of year:

gen nyear = -year

by firm_id (nyear), sort: gen byte exit = ///
sum(inrange(sales, 0,.)) == 1  & sum(inrange(sales[_n - 1],0,.)) == 0

since the last observation year in my data is 2012 I excluded those observations:

bysort firm (year) : replace exit = 0 if year == 2012

Again here what I want is instead of having 0 values, to have missing values for the subsequent years after its exit (e.g. at t+2 or t+3).

Nick Cox Nick Cox · Accepted Answer · 2017-05-18T12:21:54

As I understand it the variable sales is missing when are none and positive otherwise.

You want indicators for a year being the first and last years of sales for a firm in a country.

I think this gets you most of the way. First, we need example data!

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firm_id year sales)
1 2006   .
1 2007   .
1 2008  42
1 2009  42
1 2010  42
1 2011   .
1 2012   .
2 2006   .
2 2007 666
2 2008 666
2 2009   .
2 2010   .
2 2011   .
2 2012   .
end

The first and last dates are the minimum and maximum dates, conditional on there being sales.

egen first = min(cond(sales < ., year, .)), by(firm_id)
egen last = max(cond(sales < ., year, .)), by(firm_id)

For discussion of technique, see section 9 of this paper. Then (1, .) indicators follow directly

generate isfirst = cond(year == first, 1, .)
generate islast = cond(year == last, 1, .)

list, sepby(firm_id)

     +----------------------------------------------------------+
     | firm_id   year   sales   first   last   isfirst   islast |
     |----------------------------------------------------------|
  1. |       1   2006       .    2008   2010         .        . |
  2. |       1   2007       .    2008   2010         .        . |
  3. |       1   2008      42    2008   2010         1        . |
  4. |       1   2009      42    2008   2010         .        . |
  5. |       1   2010      42    2008   2010         .        1 |
  6. |       1   2011       .    2008   2010         .        . |
  7. |       1   2012       .    2008   2010         .        . |
     |----------------------------------------------------------|
  8. |       2   2006       .    2007   2008         .        . |
  9. |       2   2007     666    2007   2008         1        . |
 10. |       2   2008     666    2007   2008         .        1 |
 11. |       2   2009       .    2007   2008         .        . |
 12. |       2   2010       .    2007   2008         .        . |
 13. |       2   2011       .    2007   2008         .        . |
 14. |       2   2012       .    2007   2008         .        . |
     +----------------------------------------------------------+

I have done anything different for 2006 or 2012. You could just build special rules into the cond() syntax.

First and Last Occurrences in Stata

1 Answers