I'm learning how to do matching. I created a small data set to practice with:

propScore

       group dep public
    1      0   1      8
    2      0   2      7
    3      0   3      6
    4      0   4      7
    5      1   1      8
    6      1   2      7
    7      1   3      6
    8      1   4      7
    9      1   5      2
    10     1   6      3

And I use:

m.out = matchit(group ~ dep + public, data = propScore, method = "nearest", ratio = 1) 

but I obtain this match:

 1  
5  NA 
6  "1"
7  "4"
8  NA 
9  "3"
10 "2"

but I think the correct thing would be:

     1  
5  "1"
6  "2"
7  "3"
8  "4"
9  NA
10 NA

What am I doing wrong? Thanks

Are you looking for replace=TRUE? - jay.sf

1 Answer

By default, matchit estimates a propensity score for each unit using a logistic regression of the treatment on the covariates. These propensity scores are stored in the distance component of m.out. We can take a look at the data with the propensity scores included:

> cbind(propScore, ps = m.out$distance)
   group dep public        ps
1      0   1      8 0.3903012
2      0   2      7 0.5294948
3      0   3      6 0.6642472
4      0   4      7 0.4792577
5      1   1      8 0.3903012
6      1   2      7 0.5294948
7      1   3      6 0.6642472
8      1   4      7 0.4792577
9      1   5      2 0.9585154
10     1   6      3 0.9148828
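
To see where these numbers come from, here is a minimal sketch that fits the same kind of logistic regression by hand with glm(). It assumes the default distance is a plain logit model with both covariates as main effects; the names ps_model and ps are mine for illustration and are not part of MatchIt.

```r
# Rebuild the example data from the question.
propScore <- data.frame(
  group  = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  dep    = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6),
  public = c(8, 7, 6, 7, 8, 7, 6, 7, 2, 3)
)

# Logistic regression of the treatment indicator on the covariates.
ps_model <- glm(group ~ dep + public, data = propScore, family = binomial)

# The fitted probabilities are the propensity scores.
ps <- fitted(ps_model)

# Show them next to the data, as in the table above.
cbind(propScore, ps = ps)
```

Units with the same dep and public values necessarily get the same score, which is why each of units 5 to 8 has a "twin" among the controls.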

You may notice that units 6 and 2 have identical propensity scores because they have identical covariate values, and yet they were not matched to each other. This seems strange, but it has to do with the order in which matches are found when matching without replacement.

By default, matchit performs matching in descending order of the propensity scores for the treated units. Unit 9 has the largest propensity score (.959), so it gets matched first (to unit 3). Unit 10 is next, and it gets matched to unit 2 because unit 3 has already been matched to unit 9 and you are matching without replacement (meaning each control unit can be used only once). Even though units 10 and 2 are very far apart from each other, unit 2 is indeed the closest unit to unit 10 after having used unit 3 already. Unit 7 comes next; its twin, unit 3, is gone, so it takes unit 4, the closer of the two remaining controls. By the time we get to unit 6, only unit 1 is available, so unit 6 is matched with unit 1. No controls are left for units 8 and 5, which is why they show NA.
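
The procedure just described can be sketched in a few lines of base R. This is only an illustration of greedy 1:1 nearest-neighbor matching without replacement, not MatchIt's actual code; greedy_match is a hypothetical helper, and the propensity scores are copied from the table above.

```r
# Propensity scores copied from the table above (units 1-4 are controls).
ps <- c(0.3903012, 0.5294948, 0.6642472, 0.4792577, 0.3903012,
        0.5294948, 0.6642472, 0.4792577, 0.9585154, 0.9148828)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Hypothetical helper: greedy 1:1 nearest-neighbor matching without replacement.
greedy_match <- function(ps, group, decreasing = TRUE) {
  treated  <- which(group == 1)
  controls <- which(group == 0)
  matches  <- setNames(rep(NA_integer_, length(treated)), treated)
  # Visit treated units in order of their propensity score.
  for (i in treated[order(ps[treated], decreasing = decreasing)]) {
    if (length(controls) == 0) break  # no controls left to hand out
    # Closest control that is still available.
    nearest <- controls[which.min(abs(ps[controls] - ps[i]))]
    matches[as.character(i)] <- nearest
    controls <- setdiff(controls, nearest)  # each control is used once
  }
  matches
}

# Descending order, as in the walk-through above.
greedy_match(ps, group)
```

With decreasing = TRUE the treated units are visited in the order 9, 10, 7, 6, 8, 5, and the result agrees with the match matrix in the question.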

The point of matching this way is to give those treated units with the highest propensity score the best chance to find a relatively close match since those are likely to be the hardest to find matches for. This strategy doesn't always work, however, and sometimes you get weird matches like the one you found, where two identical units are not matched with each other.

You can change the order of matching by setting m.order = "smallest", which matches in ascending order of the propensity score. You should find that with this option, unit 5 is matched with unit 1, and unit 6 is matched with unit 2. You can also set m.order = "random", which matches in a random order. If you use this option, make sure you set a seed using set.seed() so your results are replicable.
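
As a rough sketch, the two calls could look like this. It assumes the MatchIt package is installed and that the data are in a data frame called propScore as in the question; m.small and m.rand are just names I picked, and the seed value is arbitrary.

```r
library(MatchIt)  # assumes the MatchIt package is installed

# Rebuild the example data from the question.
propScore <- data.frame(
  group  = c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
  dep    = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 6),
  public = c(8, 7, 6, 7, 8, 7, 6, 7, 2, 3)
)

# Match treated units in ascending order of the propensity score.
m.small <- matchit(group ~ dep + public, data = propScore,
                   method = "nearest", ratio = 1, m.order = "smallest")
m.small$match.matrix

# Match in a random order; fix the seed first so the result is replicable.
set.seed(123)
m.rand <- matchit(group ~ dep + public, data = propScore,
                  method = "nearest", ratio = 1, m.order = "random")
m.rand$match.matrix
```

Expect a warning that there are fewer control units than treated units; that is just matchit telling you some treated units cannot get a match.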

As was mentioned in the comments, you can also perform matching with replacement by setting replace = TRUE. Because control units can now be reused for multiple matches, units 10, 9, and 7 will all be matched to unit 3, and unit 6 will be matched to its twin, unit 2.
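
To illustrate why unit 3 gets reused, here is a small base R sketch of what matching with replacement amounts to: every treated unit simply takes its closest control, regardless of who else has taken it. This is not MatchIt's code, and nearest_control is a name made up for the example; the propensity scores are copied from the table above.

```r
# Propensity scores copied from the table above (units 1-4 are controls).
ps <- c(0.3903012, 0.5294948, 0.6642472, 0.4792577, 0.3903012,
        0.5294948, 0.6642472, 0.4792577, 0.9585154, 0.9148828)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

treated  <- which(group == 1)
controls <- which(group == 0)

# With replacement, controls are never removed from the pool,
# so each treated unit independently picks its closest control.
nearest_control <- sapply(treated, function(i) {
  controls[which.min(abs(ps[controls] - ps[i]))]
})
nearest_control <- setNames(nearest_control, treated)
nearest_control
```

Units 5 to 8 pick their twins, and units 9 and 10 both fall back on unit 3 because it has the highest propensity score among the controls.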

You can also set a caliper; this defines the maximum distance for an allowable match. In your data, unit 10 and its closest control, unit 3, differ by .25 on the propensity score, which is a huge distance, making these units not very similar to each other (and in your original matchit() call unit 10 actually ended up with unit 2, which is .39 away). You can restrict the allowable matches to be within some distance of each other, measured in standard deviations of the propensity score. If you set a narrow caliper, e.g., caliper = .15, only units that are close to each other will be matched, and any treated unit that doesn't have a match within the caliper is left unmatched. Using a caliper of .15, units 9 and 10 don't receive matches, and the other treated units are matched with their twins in the control group.
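
The arithmetic behind that last claim can be checked by hand. The sketch below assumes the caliper width is .15 times the plain sample standard deviation of the propensity score across all units; the exact standard deviation matchit uses may differ by version, so treat the width as approximate. The variable names are made up for the example.

```r
# Propensity scores copied from the table above (units 1-4 are controls).
ps <- c(0.3903012, 0.5294948, 0.6642472, 0.4792577, 0.3903012,
        0.5294948, 0.6642472, 0.4792577, 0.9585154, 0.9148828)
group <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

treated  <- which(group == 1)
controls <- which(group == 0)

# Assumption: caliper width = caliper * SD of the propensity score over all units.
width <- 0.15 * sd(ps)

# Distance from each treated unit to its closest control.
nearest_dist <- sapply(treated, function(i) min(abs(ps[controls] - ps[i])))

# TRUE where a treated unit has at least one control inside the caliper.
within_caliper <- setNames(nearest_dist <= width, treated)
within_caliper
```

Under this assumption the width comes out at roughly .03 on the propensity score scale, so the twins (distance 0) qualify while units 9 and 10, whose closest control is at least .25 away, do not.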