
I have two datasets that I have appended together in Stata.

There is one variable, say Age, in both datasets. I sorted the combined data so that the ages are in ascending order. I want to delete the observations whose ages don't pair up with a matching age.

Dataset 1:

Obs Age
1    7
2    8
3   10
4    5

Dataset 2:

Obs Age
1   10
2    5
3    9
4    7

Combined and sorted in ascending order:

Obs Age
1    5
2    5
3    7
4    7
5    8
6    9
7   10
8   10

Because the sorted ages for observations 5 and 6 don't match, I want to delete both. Essentially, I want a way to loop through pairs of adjacent observations and compare their values, so that I'm left only with pairs that share the same age.

1 Answer

Looping over observations in Stata is inefficient and, in the vast majority of cases, unnecessary.

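The following works for me. With the data sorted by age, it tags every observation whose age differs from both the previous and the next observation. An out-of-range subscript such as age[_n-1] in the first observation evaluates to missing, so the first and last observations need no special handling: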

clear

input age
5
5
7
7
8
9
10
10
end

generate tag = age != age[_n+1] & age != age[_n-1]
list

     +-----------+
     | age   tag |
     |-----------|
  1. |   5     0 |
  2. |   5     0 |
  3. |   7     0 |
  4. |   7     0 |
  5. |   8     1 |
     |-----------|
  6. |   9     1 |
  7. |  10     0 |
  8. |  10     0 |
     +-----------+

Keeping only the untagged observations gives the desired result:

keep if tag == 0
list

     +-----------+
     | age   tag |
     |-----------|
  1. |   5     0 |
  2. |   5     0 |
  3. |   7     0 |
  4. |   7     0 |
  5. |  10     0 |
     |-----------|
  6. |  10     0 |
     +-----------+