
I am trying to perform case-control exact matching by age. My dataset consists of 139 eyes from 75 patients, divided into two groups by a binary variable (G6PDcarente = 0/1).

I am trying to perform the matching with the code:

match.it <- matchit(G6PDcarente~age, data = newdata, method="exact",ratio=1,replace=FALSE)
match.it

The problem is that the results are:

Exact Subclasses: 14

            Sample sizes:
             Control        Treated
All            43           85
Matched        31           42
Unmatched      12           43

Why do the matched groups differ so much in size? Shouldn't the matched control and treated samples be the same size (e.g. 31 and 31)? How can I obtain an exact match on age with the same sample size in both groups?

I have also tried the code:

match.it <- matchit(G6PDcarente~age, data = newdata, method="nearest",exact="age",ratio=1, replace=FALSE)

But I get the following error message:

Error in Ops.data.frame(exact[itert, k], exact[clabels, k]) : 
  ‘!=’ only defined for equally-sized data frames
In addition: Warning message:
In matchit2nearest(c(`1` = 0, `2` = 0, `3` = 0, `4` = 0, `5` = 0,  :
  Fewer control than treated units and matching without replacement.  Not all treated units will receive a match.  Treated units will be matched in the order specified by m.order: largest

Can someone help me?

Thanks

Here is the code that reproduces a sample of my data:

newdata <- structure(list(NumeroProgressivo = c(43, 44, 137, 138, 129, 130, 
65, 111, 148, 149, 35, 36, 83, 84, 37, 38, 127, 128, 160, 161, 
75, 76, 53, 54, 119, 120, 109, 110, 57, 58, 39, 51, 52, 29, 30, 
71, 72, 154, 155, 77, 78, 1, 2, 61, 62, 158, 101, 102, 27, 28, 
73, 103, 104, 121, 122, 152, 153, 107, 108, 45, 46, 81, 82, 139, 
140, 59, 60, 95, 96, 33, 34, 91, 92, 26, 49, 50, 79, 6, 63, 64, 
15, 16, 31, 32, 143, 144, 69, 70, 89, 90, 41, 42, 17, 18, 67, 
68, 115, 116, 150, 151, 97, 98, 93, 94, 135, 136, 55, 56, 131, 
132, 162, 163, 21, 22, 23, 24, 156, 157, 133, 166, 174, 175, 
164, 165, 172, 173, 176, 177), IDpaziente = c(22, 22, 67, 67, 
63, 63, 33, 56, 73, 73, 18, 18, 42, 42, 19, 19, 62, 62, 79, 79, 
38, 38, 27, 27, 60, 60, 55, 55, 29, 29, 20, 26, 26, 15, 15, 36, 
36, 76, 76, 39, 39, 1, 1, 31, 31, 78, 51, 51, 14, 14, 37, 52, 
52, 61, 61, 75, 75, 54, 54, 23, 23, 41, 41, 68, 68, 30, 30, 48, 
48, 17, 17, 46, 46, 13, 25, 25, 40, 3, 32, 32, 8, 8, 16, 16, 
70, 70, 35, 35, 45, 45, 21, 21, 9, 9, 34, 34, 58, 58, 74, 74, 
49, 49, 47, 47, 66, 66, 28, 28, 64, 64, 80, 80, 11, 11, 12, 12, 
77, 77, 65, 82, 86, 86, 81, 81, 85, 85, 87, 87), Occhio = c("OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS", "OD", "OS", 
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", 
"OD", "OD", "OS", "OD", "OS", "OD", "OD", "OS", "OD", "OS", "OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", 
"OD", "OS", "OD", "OS", "OD", "OS", "OS", "OD", "OS", "OD", "OS", 
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", 
"OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", 
"OS", "OD", "OS", "OD", "OS", "OD", "OS", "OD", "OD", "OD", "OS", 
"OD", "OS", "OD", "OS", "OD", "OS"), G6PDcarente = c(0, 0, 0, 
0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    age = c(70, 70, 38, 38, 54, 54, 41, 74, 31, 31, 27, 27, 36, 
    36, 36, 36, 49, 49, 34, 34, 49, 49, 34, 34, 33, 33, 34, 34, 
    38, 38, 62, 30, 30, 38, 38, 53, 53, 27, 27, 57, 57, 84, 84, 
    25, 25, 26, 57, 57, 47, 47, 29, 31, 31, 26, 26, 23, 23, 34, 
    34, 48, 48, 34, 34, 34, 34, 40, 40, 45, 45, 33, 33, 61, 61, 
    73, 32, 32, 67, 80, 39, 39, 67, 67, 37, 37, 28, 28, 26, 26, 
    32, 32, 24, 24, 61, 61, 36, 36, 66, 66, 26, 26, 35, 35, 39, 
    39, 32, 32, 39, 39, 39, 39, 42, 42, 35, 35, 64, 64, 34, 34, 
    37, 61, 80, 80, 74, 74, 62, 62, 71, 71)), row.names = c(NA, 
-128L), class = c("tbl_df", "tbl", "data.frame"))

2 Answers

1
votes

The number of observations assigned to the Control / Treated groups is exactly what it should be, since the assignment is based on the values of the G6PDcarente variable.

From the help file ?matchit:

(For the first argument in the function, formula) This argument takes the usual syntax of R formula, treat ~ x1 + x2, where treat is a binary treatment indicator and x1 and x2 are the pre-treatment covariates.

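In your case, the formula is G6PDcarente~age, and the number of observations where G6PDcarente == 1 differs from the number where G6PDcarente == 0. With method="exact", every observation whose age occurs in both groups is kept in a subclass; no 1:1 pairs are formed, so the matched counts do not have to be equal.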

We can verify that directly with a manual inspection, since the dataset is not very large:

library(dplyr)
library(tidyr)

new.data.check <- newdata %>% 
  count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
  spread(G6PDcarente, n) %>%  # create separate columns for G6PDcarente == 0 / == 1
  na.omit()                   # remove NA rows, where a specific age only has G6PDcarente == 0
                              # OR G6PDcarente == 1, but not both (i.e. unmatched samples)

> new.data.check    
# A tibble: 14 x 3
     age   `0`   `1`
   <dbl> <int> <int>
 1    26     3     4
 2    27     2     2
 3    31     2     2
 4    32     2     4
 5    34     6     8
 6    37     1     2
 7    38     2     4
 8    39     2     6
 9    49     2     2
10    61     1     4
11    62     2     1
12    67     2     1
13    74     2     1
14    80     2     1

For age values with both G6PDcarente == 0 and == 1, there are 31 observations for which G6PDcarente == 0 and 42 observations for which G6PDcarente == 1:

> colSums(new.data.check)
age   0   1 
657  31  42 

Not knowing your exact use case, I guess if you really want the same number for treatment vs. control, you can always drop a few observations...
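
If dropping observations is acceptable, one way to do it is to subsample the larger group within each age value. The following is only a minimal sketch in base R: `balance_exact` is a hypothetical helper (not part of MatchIt), the toy data are made up, and which observations get dropped is decided at random.

```r
# Toy data (hypothetical values), using the column names from the question
toy <- data.frame(
  G6PDcarente = c(0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0),
  age         = c(26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 30, 41)
)

set.seed(1)  # make the random subsampling reproducible

# Hypothetical helper: within each age, keep the same number of
# controls (0) and treated (1) by randomly dropping the surplus.
balance_exact <- function(d, treat = "G6PDcarente", by = "age") {
  pieces <- lapply(split(d, d[[by]]), function(s) {
    ctrl <- s[s[[treat]] == 0, , drop = FALSE]
    trt  <- s[s[[treat]] == 1, , drop = FALSE]
    k <- min(nrow(ctrl), nrow(trt))   # number of pairs available at this age
    if (k == 0) return(NULL)          # age present in one group only: drop it
    rbind(ctrl[sample.int(nrow(ctrl), k), , drop = FALSE],
          trt[sample.int(nrow(trt), k), , drop = FALSE])
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

balanced <- balance_exact(toy)

# Cross-tabulate age by group to check that the counts agree per age
table(balanced$age, balanced$G6PDcarente)
```

On the real data you would call `balance_exact(newdata)` instead. Note that this treats every row as independent, which is a simplification if both eyes of a patient are in the data.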

0
votes

Thanks to @Z.Lin's reply, I have figured out how to resolve my issue.

Here is the code I used, following the instructions in this tutorial:

    OCTA.Filtered = as.data.frame(na.omit(OCTA.Filtered)) 
    m.out.test = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, ratio = 1)
    test_data = match.data(m.out.test) 
    ps.sd = sd(test_data$distance)
    # matching is performed below using propensity scores given the covariates mentioned below
    # caliper = 0.25 times sd of propensity scores (optimal)
    m.out = matchit(G6PDcarente~age,method="nearest", data=OCTA.Filtered, caliper = 0.25*ps.sd)
    # check the sample sizes (below)
    m.out 
    # Final matched data saved as final_data
    final_data = match.data(m.out) 
    # (here distance = propensity score)

new.data.check <- final_data %>% 
  count(age, G6PDcarente) %>% # count all unique combinations of age & G6PDcarente
  spread(G6PDcarente, n) %>%  # create separate columns for G6PDcarente == 0 / == 1
  na.omit()
> new.data.check
# A tibble: 14 x 3
     age   `0`   `1`
   <dbl> <int> <int>
 1    26     3     3
 2    27     2     2
 3    31     2     2
 4    32     2     2
 5    34     6     6
 6    37     1     1
 7    38     2     2
 8    39     2     2
 9    49     2     2
10    61     1     1
11    62     1     1
12    67     1     1
13    74     1     1
14    80     1     1
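
As for the error from method="nearest" with exact="age" in the question: a plausible cause (my assumption, not confirmed anywhere in this thread) is that the data is a tibble, so single-bracket indexing inside the matching code returns a data frame instead of a vector. Converting to a plain data.frame first, as done with as.data.frame() above, would avoid that. A minimal sketch with made-up toy values; the matchit() call only runs if MatchIt is installed:

```r
# Small stand-in for the question's data, built with the same classes
# (hypothetical values; the real `newdata` has 128 rows)
toy <- structure(
  list(G6PDcarente = c(0, 0, 1, 1, 1, 0),
       age         = c(26, 27, 26, 27, 27, 40)),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

# Drop the tibble classes so that toy_df[i, "age"] returns a plain vector
toy_df <- as.data.frame(toy)
class(toy_df)

# Only attempt the matching when MatchIt is available
if (requireNamespace("MatchIt", quietly = TRUE)) {
  # 1:1 nearest-neighbour matching, restricted to pairs with identical age
  m.exact <- MatchIt::matchit(G6PDcarente ~ age, data = toy_df,
                              method = "nearest", exact = "age", ratio = 1)
  print(m.exact)
}
```

Unlike the caliper approach, exact="age" only allows pairs with identical age, so treated units without a same-age control are left unmatched.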