1
votes

I have two data sets. First one is:

countyGroup income  other_data_
    1       20990   …
    2       25622   …
    3       24289   …
    4       27391   …
    5       23326   …
    6       19261   …
    7       15197   …
    8       11132   …

The second one is:

countyGroup state   county  other_data
    1       IL      123     …
    1       IL      123     …
    2       MI      365     …
    1       IL      123     …
    3       AK      65      …
    4       IL      546     …
    5       MI      689     …
    6       AK      32      …

Variable countyGroup uniquely identifies both state and county. The second data set contains countyGroup, state and county. The first data set contains only countyGroup. I need to generate two variables (state and county) in the first data set.

I tried to match the values using merge:

use dataset1
merge 1:m countyGroup using dataset2, keepusing(state state cntyfp) update replace

The result that I get is the duplicate rows...I tried 1:1 it did not work...I can always drop those duplicate rows, but I wanted to merge them correctly from the beginning.

1

1 Answers

3
votes

We need to know about the duplicate rows (Stata's term is "observations"). If they are identical on all variables, there is no extra information, and it's cleanest and simplest to use duplicates drop to get them out of the way before the merge.

duplicates is a general tool for managing duplicates.

If there is different information on the other variables, tell us more about what that is and your goals here.

(LATER) With your datasets you posted

   merge 1:m countyGroup using dataset2

seems to work fine.