imputation List of data frames: computing quantile with survey package or MIcombine in R?

Question

I am using R and following Anthony Damico(@AnthonyDamico)'s code (http://asdfree.com/survey-of-consumer-finances-scf.html) to compute quantile from Survey of Consumer Finance. The code is the following:

## Load necessary libraries (Note: should be installed in the OS)
library(mitools)    # allows analysis of multiply-imputed survey data
library(survey)     # load survey package (analyzes complex design surveys)
library(downloader) # downloads and then runs the source() function on scripts from github
library(foreign)    # load foreign package (converts data files into R)
library(Hmisc)      # load Hmisc package (loads a simple wtd.quantile function)
library(convey)
library(lodown)
library(devtools)
library(srvyr)

## Read SCF data: wave 2016, path.expand("~") is default working directory  
scf_imp <- readRDS("scf 2016.rds" )
scf_rw <- readRDS("scf 2016 rw.rds" )

# define which variables from the five imputed iterations to keep
vars.to.keep <- c( 'y1' , 'yy1' , 'wgt' , 'one' , 'houses', 'oresre', 'mrthel', 'liq', 'income', 'age' )

# restrict each `scf_imp#` data frame to only those variables
scf_imp[[1]] <- scf_imp[[1]][ , vars.to.keep ]
scf_imp[[2]] <- scf_imp[[2]][ , vars.to.keep ]
scf_imp[[3]] <- scf_imp[[3]][ , vars.to.keep ]
scf_imp[[4]] <- scf_imp[[4]][ , vars.to.keep ]
scf_imp[[5]] <- scf_imp[[5]][ , vars.to.keep ]

# clear up RAM
gc()

# turn off scientific notation in most output
options( scipen = 20 )

scf_design <- 
svrepdesign( 
weights = ~wgt , 
repweights = scf_rw[ , -1 ] , 
data = imputationList( scf_imp ) , 
scale = 1 ,
rscales = rep( 1 / 998 , 999 ) ,
mse = TRUE ,
type = "other" ,
combined.weights = TRUE
)

scf_design <-
update(
scf_design,
wealth = liq + houses + oresre - mrthel,
hwealth = houses + oresre,
htoinc  = (houses + oresre)/income,
ltoinc = mrthel / income,
ltv = mrthel + hwealth,
wtoinc = wealth + income
)

### gini coefficient (whole sample) ###
scf_design$designs <- lapply( scf_design$designs , convey_prep )
giniindex <- scf_MIcombine( with( scf_design , svygini( ~ wealth) ) )

After this code, I tried several methods including 1):

q1.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.01, 
method='constant',interval.type='quantile',se=TRUE)))
q5.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.05, method='constant',interval.type='quantile',se=TRUE)))
q20.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.20, method='constant',interval.type='quantile',se=TRUE)))
q40.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.40, method='constant',interval.type='quantile',se=TRUE)))
q60.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.60, method='constant',interval.type='quantile',se=TRUE)))
q80.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.80, method='constant',interval.type='quantile',se=TRUE)))
q90.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.90, method='constant',interval.type='quantile',se=TRUE)))
q95.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.95, method='constant',interval.type='quantile',se=TRUE)))
q99.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.99, method='constant',interval.type='quantile',se=TRUE)))
maxi.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 1, method='constant',interval.type='quantile',se=TRUE)))

and the following method 2):

quantile.wealth <- scf_MIcombine(with(scf_design,svyquantile(~networth, c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1), method='constant',interval.type='quantile',se=TRUE)))

Method 1) give me number, but it is not exact. First of all, it gives the following warning sign:

Warning messages:
1: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, 1, method = "constant",  :
Not all replicate weight designs give valid standard errors for quantiles.

which says, standard error is not exact... so I guess the mean value is not also trustworthy...(Not sure about this). I tried to use method 2), which is just a little tweak from method 1), using survey packages argument, c[ , ] to compute quantiles in a single command. Then I get the following error.

Error in (1 + 1/m) * evar/vbar : non-conformable arrays
In addition: Warning messages:
1: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8,  :
Not all replicate weight designs give valid standard errors for quantiles.

(ADDITION) Method 3) I used survey package directly refering (https://github.com/cran/survey/blob/master/man/svyquantile.Rd) and tried to compute quantile like the following:

q <- svyquantile(~networth,scf_design, c(.25,.5,.75),ci=TRUE)

But I got the following error.

Error in UseMethod("svyquantile", design) : 
no applicable method for 'svyquantile' applied to an object of class 
 "svyimputationList"

Long story short, I need to compute quantile and Gini coefficient in the right way, so that I can compare the number with the model. Is there any way to correct the code, or use other methods?

Please keep in mind that this is imputation list of data frames, consists 5 data frames. And... notably the data frames does not contain same number of samples(rows).

the code on the asdfree page matches the federal reserve's recommended calculations, so if you stick with the mean/median/quantile calculations shown there, your numbers will be correct. you added method='constant',interval.type='quantile' but not sure why — Anthony Damico
@AnthonyDamico Hi-, that part, I just followed your code specified in here github.com/ajdamico/asdfree/blob/archive/… — user9620331

Javier Javier · Accepted Answer · 2018-05-31T21:28:49

I can see 3 issues. The first one is simple. The variable networth is not included in vars.to.keep so it is not available for use later on and will raise errors.

Second, the warnings are really a non-issue. The location of the warning in the source code is here. This warning is raised whenever you call svyquantile with interval.type='quantile' and the survey design was constructed with type = 'other' or any type that uses jackknife replicate weights. You should not change the type of survey design. The warning is a reminder that standard errors produced with interval.type='quantile' are invalid on jacknife survey designs or unusual bootstrap designs. The SCF dataset uses bootstrap, but it's unclear whether the standard errors are valid. Alternatively, interval.type='probability' doesn't raise errors but maybe it's not what you are looking for.

q1.wealth <- scf_MIcombine(with(scf_design,
    svyquantile(~wealth,0.01,
                method='constant',
                interval.type='probability',
               se=TRUE, na.rm=T)));

Third, the last error is nasty. I'm referring to this

> quantile.wealth <- scf_MIcombine(with(
      scf_design,
      svyquantile(~networth,
                  c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1),
                  method='constant',
                  interval.type='quantile',
                  se=T,na.rm=T)));
# Error in (1 + 1/m) * evar/vbar : non-conformable arrays

It seems that scf_MIcombine and MIcombine fail to combine the results of a svyquantile call with multiple quantiles. There is a workaround, the function svyby is not affected by this bug.

scf_MIcombine( with(scf_design,
    svyby(~networth,~as.factor(1),
          svyquantile,
          c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1),
          se=T, na.rm=T,
          method='constant',
          interval.type='quantile')))

In the above example, the function svyby is used to compute a statistic on subsets of survey data. In this case I used a fake subset ~as.factor(1) that includes all the sample.

imputation List of data frames: computing quantile with survey package or MIcombine in R?

1 Answers