I am using R and following Anthony Damico(@AnthonyDamico)'s code (http://asdfree.com/survey-of-consumer-finances-scf.html) to compute quantile from Survey of Consumer Finance. The code is the following:
## Load necessary libraries (Note: should be installed in the OS)
library(mitools) # allows analysis of multiply-imputed survey data
library(survey) # load survey package (analyzes complex design surveys)
library(downloader) # downloads and then runs the source() function on scripts from github
library(foreign) # load foreign package (converts data files into R)
library(Hmisc) # load Hmisc package (loads a simple wtd.quantile function)
library(convey)
library(lodown)
library(devtools)
library(srvyr)
## Read SCF data: wave 2016, path.expand("~") is default working directory
scf_imp <- readRDS("scf 2016.rds" )
scf_rw <- readRDS("scf 2016 rw.rds" )
# define which variables from the five imputed iterations to keep
vars.to.keep <- c( 'y1' , 'yy1' , 'wgt' , 'one' , 'houses', 'oresre', 'mrthel', 'liq', 'income', 'age' )
# restrict each `scf_imp#` data frame to only those variables
scf_imp[[1]] <- scf_imp[[1]][ , vars.to.keep ]
scf_imp[[2]] <- scf_imp[[2]][ , vars.to.keep ]
scf_imp[[3]] <- scf_imp[[3]][ , vars.to.keep ]
scf_imp[[4]] <- scf_imp[[4]][ , vars.to.keep ]
scf_imp[[5]] <- scf_imp[[5]][ , vars.to.keep ]
# clear up RAM
gc()
# turn off scientific notation in most output
options( scipen = 20 )
scf_design <-
svrepdesign(
weights = ~wgt ,
repweights = scf_rw[ , -1 ] ,
data = imputationList( scf_imp ) ,
scale = 1 ,
rscales = rep( 1 / 998 , 999 ) ,
mse = TRUE ,
type = "other" ,
combined.weights = TRUE
)
scf_design <-
update(
scf_design,
wealth = liq + houses + oresre - mrthel,
hwealth = houses + oresre,
htoinc = (houses + oresre)/income,
ltoinc = mrthel / income,
ltv = mrthel + hwealth,
wtoinc = wealth + income
)
### gini coefficient (whole sample) ###
scf_design$designs <- lapply( scf_design$designs , convey_prep )
giniindex <- scf_MIcombine( with( scf_design , svygini( ~ wealth) ) )
After this code, I tried several methods including 1):
q1.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.01,
method='constant',interval.type='quantile',se=TRUE)))
q5.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.05, method='constant',interval.type='quantile',se=TRUE)))
q20.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.20, method='constant',interval.type='quantile',se=TRUE)))
q40.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.40, method='constant',interval.type='quantile',se=TRUE)))
q60.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.60, method='constant',interval.type='quantile',se=TRUE)))
q80.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.80, method='constant',interval.type='quantile',se=TRUE)))
q90.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.90, method='constant',interval.type='quantile',se=TRUE)))
q95.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.95, method='constant',interval.type='quantile',se=TRUE)))
q99.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 0.99, method='constant',interval.type='quantile',se=TRUE)))
maxi.wealth <- scf_MIcombine(with(scf_design,svyquantile(~wealth, 1, method='constant',interval.type='quantile',se=TRUE)))
and the following method 2):
quantile.wealth <- scf_MIcombine(with(scf_design,svyquantile(~networth, c(0.01,0.2,0.4,0.6,0.8,0.9,0.99,1), method='constant',interval.type='quantile',se=TRUE)))
Method 1) give me number, but it is not exact. First of all, it gives the following warning sign:
Warning messages:
1: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, 1, method = "constant", :
Not all replicate weight designs give valid standard errors for quantiles.
which says, standard error is not exact... so I guess the mean value is not also trustworthy...(Not sure about this). I tried to use method 2), which is just a little tweak from method 1), using survey packages argument, c[ , ] to compute quantiles in a single command. Then I get the following error.
Error in (1 + 1/m) * evar/vbar : non-conformable arrays
In addition: Warning messages:
1: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
2: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
3: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
4: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
5: In svyquantile.svyrep.design(~networth, c(0.01, 0.2, 0.4, 0.6, 0.8, :
Not all replicate weight designs give valid standard errors for quantiles.
(ADDITION) Method 3) I used survey package directly refering (https://github.com/cran/survey/blob/master/man/svyquantile.Rd) and tried to compute quantile like the following:
q <- svyquantile(~networth,scf_design, c(.25,.5,.75),ci=TRUE)
But I got the following error.
Error in UseMethod("svyquantile", design) :
no applicable method for 'svyquantile' applied to an object of class
"svyimputationList"
Long story short, I need to compute quantile and Gini coefficient in the right way, so that I can compare the number with the model. Is there any way to correct the code, or use other methods?
Please keep in mind that this is imputation list of data frames, consists 5 data frames. And... notably the data frames does not contain same number of samples(rows).
method='constant',interval.type='quantile'
but not sure why – Anthony Damico