0
votes

I am using a multiply imputed dataset with Amelia and would then like Zelig to calculate predicted values from a regression model. Zelig's documentation states that "When quantities of interest are plotted, such as expected and predicted values and first differenences, these are correctly pooled across those from each of the m imputed datasets". This is true, but I would also like to obtain estimated values pooled across each of the imputed datasets as the output of the "sim" command.

Here is sample code replicating the instructions on the Zelig webiste and generating the same output:

library("Amelia")
data(africa)
a.out <- amelia(x = africa, m=5, cs = "country", ts = "year", logs = "gdp_pc")
z.out <- zelig(gdp_pc ~ trade + civlib, model = "ls", data = a.out)
summary(z.out)

I then use "setx" to estimate the predicted values of the DV (gdp_pc) when "trade" is set at values of 50 and 100.

x.out <- setx (z.out, trade = c(50,100))
x.out
range:
  (Intercept) trade civlib
1           1    50  0.289
2           1   100  0.289

Next step: Use 'sim' method

If I then use "sim" and "plot", R generates a plot with the estimates I requested:

s.out <- sim (z.out, x = x.out)
plot(s.out)

However, I would like to have a printout of the predicted values and their standard errors and values at different confidence intervals pooled across all the imputed datasets according to the Rubin rule. This is not what the "summary" command seems to be doing:

summary(s.out)
[1] 50


 sim range :
 -----
ev
     mean     sd      50%     2.5%   97.5%
1 844.843 30.567 845.1218 791.8107 908.658
pv
         mean       sd      50%     2.5%    97.5%
[1,] 857.6479 372.9689 852.9239 157.7842 1553.552

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 836.2505 36.72892 833.3876 770.7931 908.7371
pv
         mean      sd      50%     2.5%    97.5%
[1,] 821.3542 359.461 790.5742 204.7687 1483.275

 sim range :
 -----
ev
     mean       sd      50%     2.5%    97.5%
1 837.307 34.99979 839.4895 765.0043 896.1513
pv
         mean       sd      50%     2.5%    97.5%
[1,] 831.6275 347.4005 844.0667 120.8968 1526.509

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 838.1396 33.49521 837.6317 776.3413 901.4235
pv
         mean       sd      50%     2.5%    97.5%
[1,] 866.5946 364.2909 830.9851 263.8757 1594.664

 sim range :
 -----
ev
     mean       sd      50%     2.5%    97.5%
1 842.784 35.18827 843.5563 779.9052 914.5869
pv
         mean       sd      50%     2.5%    97.5%
[1,] 834.7425 350.5647 834.0003 228.0261 1527.293


[1] 100


 sim range :
 -----
ev
      mean       sd      50%    2.5%    97.5%
1 1743.969 54.06692 1742.795 1627.39 1840.744
pv
        mean       sd      50%     2.5%    97.5%
[1,] 1700.53 350.1268 1718.504 1047.998 2322.216

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 1748.554 58.46152 1755.443 1634.345 1854.652
pv
         mean       sd      50%     2.5%    97.5%
[1,] 1734.831 340.8356 1734.907 1071.973 2347.156

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 1741.014 63.86164 1741.492 1615.497 1863.306
pv
         mean       sd      50%   2.5%    97.5%
[1,] 1759.305 329.6513 1746.153 1172.5 2435.067

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 1738.422 64.75221 1738.474 1615.078 1854.675
pv
         mean       sd      50%     2.5%    97.5%
[1,] 1728.152 386.8327 1761.047 849.7188 2395.825

 sim range :
 -----
ev
      mean       sd      50%     2.5%    97.5%
1 1746.575 53.02558 1744.919 1638.602 1848.114
pv
         mean       sd      50%    2.5%    97.5%
[1,] 1710.864 342.1865 1702.769 1050.85 2288.021

Here, I get all the values for each of the imputed datasets, instead of the values pooled across all multiply imputed datasets. Is there a way to get Zelig to apply the Rubin rule to the multiply imputed datasets when providing summary statistics of the predicted estimates, as well as when drawing charts based on them?

Note: the application I need would require negative binomial regression, not linear regression, to be the model used in Zelig. I have used this example to replicate the example provided by the Zelig developers.

Many thanks for your help, and have a lovely day!

1

1 Answers

1
votes

You don't need to use Rubin's rules in this case, since the uncertainty is calculated from the variance in the simulations. I'm a bit surprised that Zelig doesn't average these for you, but you can do it yourself without too much difficulty:

qi.out <- zelig_qi_to_df(s.out)

lapply(split(qi.out, qi.out["trade"]),
       function(x) c(trade = unique(x$trade),
                     mean = mean(x$expected_value),
                     sd = sd(x$expected_value),
                     median = median(x$expected_value),
                     quantile(x$expected_value, probs = c(0.5, 0.025, 0.975))))

lapply(split(qi.out, qi.out["trade"]),
       function(x) c(trade = unique(x$trade),
                     mean = mean(x$predicted_value),
                     sd = sd(x$predicted_value),
                     median = median(x$predicted_value),
                     quantile(x$predicted_value, probs = c(0.5, 0.025, 0.975))))