
I used the MatchIt function to derive a 1:4 ratio treated:untreated dataset, attempting to achieve similar average age and gender frequency.

I have a small treated group (n = 44) and a much larger control group (n = 980). To reduce the size of the control group and rule out age and gender as confounders, I used the MatchIt function to try to create a control group of 176 whose average age and gender balance are similar to those of the treated group.

m.out <- matchit(Treated ~ AGE + SEX, data = d, 
                 method = "optimal",
                 ratio = 4)

The summary of the output is:

Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff eQQ Med
distance        0.0602        0.0603     0.0250   -0.0001       0
AGE            57.5227       58.4034     7.9385   -0.8807       1
SEXF            0.4318        0.1477     0.3558    0.2841       0
SEXM            0.5682        0.8523     0.3558   -0.2841       0

The AGE variable worked well (it is not significantly different between groups), but gender seemed off (85% male in control vs. 57% in treated), so I performed a chi-square test of SEX against Treated on the matched data. It showed a highly significant difference in gender:

chisq <- with(m.data, chisq.test(SEX, Treated))
data:  SEX and Treated
X-squared = 15.758, df = 1, p-value = 7.199e-05

How do I account for the difference here? Is the problem with how I used the MatchIt function (incorrect method?), or has it worked and I've applied the chi-square test to the wrong problem?

Where did you find that function? Please share any and all packages you are using. - Sotos
I installed MatchIt from within RStudio. This is a help article for the package: imai.fas.harvard.edu/research/files/matchit.pdf - Keno

1 Answer


There are many reasons why propensity score matching didn't "work" in this case. In general, it isn't guaranteed to balance covariates in small samples; the theoretical properties of the propensity score apply in large samples and with the correct propensity score (and yours is almost certainly not correct).

Some more specific reasons: with 4:1 matching, many control units that are far from the treated units end up matched to them. You could see whether matching fewer control units fixes this by lowering the ratio. It could also be that optimal matching is not a good method to use here. Optimal matching finds optimal pairs based on the propensity score, but you want balance on the covariates, not on the propensity score. You could try genetic matching (i.e., using method = "genetic"), though this will probably fail as well (it's like using a hammer on a thumb-tack).
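As a rough sketch of the "try a smaller ratio" suggestion, the loop below re-runs the match for ratios 1 through 4 and reports the gap in the proportion of males between treated and matched controls. The data frame is simulated (the real data aren't available), it assumes a recent MatchIt version, and it uses method = "nearest" instead of "optimal" only to avoid depending on the optmatch package; swap the question's own call back in as needed.

```r
library(MatchIt)

set.seed(42)
# Simulated stand-in for the question's data: 44 treated, 980 controls.
# The age and sex distributions are assumptions made up for illustration.
d <- data.frame(
  Treated = rep(c(1, 0), times = c(44, 980)),
  AGE = round(c(rnorm(44, 57, 8), rnorm(980, 50, 12))),
  SEX = factor(c(sample(c("F", "M"), 44, replace = TRUE, prob = c(0.43, 0.57)),
                 sample(c("F", "M"), 980, replace = TRUE, prob = c(0.30, 0.70))))
)

# For each ratio, match and compute (proportion male in treated) minus
# (proportion male in matched controls); closer to 0 means better balance.
sex_gap <- sapply(1:4, function(r) {
  m  <- matchit(Treated ~ AGE + SEX, data = d, method = "nearest", ratio = r)
  md <- match.data(m)
  mean(md$SEX[md$Treated == 1] == "M") - mean(md$SEX[md$Treated == 0] == "M")
})
names(sex_gap) <- paste0("ratio_", 1:4)
sex_gap
```

If the gap shrinks as the ratio drops, the extra controls at 4:1 are the ones pulling the groups apart.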

One recommendation is to use the designmatch package to perform cardinality matching, which allows you to impose balance constraints and perform the matching without having to estimate a propensity score. With only two covariates, though, exact matching on gender and nearest-neighbor matching on age should do a fairly good job. Set method = "nearest", exact = "SEX", and distance = d$AGE in matchit() and see if that works better. You don't need a propensity score for this problem.
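A minimal sketch of that exact-plus-nearest-neighbor approach, using the question's column names (Treated, AGE, SEX). The data frame is simulated because the real one isn't available, and the argument forms assume a MatchIt version in which exact accepts a variable name and distance accepts a numeric vector with one value per unit; check ?matchit for your installed version.

```r
library(MatchIt)

set.seed(1)
# Simulated stand-in for the question's data: 44 treated, 980 controls.
# The age and sex distributions are assumptions made up for illustration.
d <- data.frame(
  Treated = rep(c(1, 0), times = c(44, 980)),
  AGE = round(c(rnorm(44, 57, 8), rnorm(980, 50, 12))),
  SEX = factor(c(sample(c("F", "M"), 44, replace = TRUE, prob = c(0.43, 0.57)),
                 sample(c("F", "M"), 980, replace = TRUE, prob = c(0.30, 0.70))))
)

# Match each treated unit to 4 controls of the same sex,
# choosing the controls closest in age (no propensity score involved).
m.out2 <- matchit(Treated ~ AGE + SEX, data = d,
                  method = "nearest",
                  exact = "SEX",      # exact matching on sex
                  distance = d$AGE,   # age itself serves as the distance measure
                  ratio = 4)

summary(m.out2)
m.data2 <- match.data(m.out2)

# Proportion female by group in the matched sample
with(m.data2, tapply(SEX == "F", Treated, mean))
```

When there are enough same-sex controls for every treated unit, the sex proportions in the two matched groups are identical by construction, so only the age balance needs inspecting.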

Finally, don't use hypothesis tests to assess balance. The balance output is enough. Don't stop trying to find good matches until your balance can't improve any more. See Ho, Imai, King, & Stuart (2007) for more information on this. They are the authors of MatchIt too.
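One common way to read balance without a hypothesis test is the standardized mean difference (SMD), which does not depend on sample size the way a p-value does. The helper below is a hypothetical base-R sketch (the name smd and the toy data are made up for illustration), standardizing by the treated-group standard deviation; it assumes equal matching weights, as in fixed-ratio matching without replacement.

```r
# Standardized mean difference:
# (mean in treated - mean in control) / SD in the treated group
smd <- function(x, treat) {
  x <- as.numeric(x)
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# Tiny made-up matched sample, for illustration only
toy <- data.frame(
  Treated = c(1, 1, 1, 1, 0, 0, 0, 0),
  AGE     = c(50, 60, 55, 65, 52, 58, 57, 63),
  male    = c(1, 0, 1, 0, 1, 1, 1, 0)
)

# One SMD per covariate; values near 0 indicate good balance
res <- sapply(toy[c("AGE", "male")], smd, treat = toy$Treated)
res
```

On the question's matched data this would be called as smd(m.data$AGE, m.data$Treated) and smd(m.data$SEX == "M", m.data$Treated). summary(m.out, standardize = TRUE) should report comparable standardized differences directly.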


Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013