1
votes

I am trying to calculate the confidence interval in R. Due to some special reasons, I have to do it with the functions in "bootstrap" package.(which means I can't use the functions in "boot" package.)

Here is my code.

And what I am doing is trying to calculate the Pearson correlation coefficient, and then apply the Bootstrap method (with B = 100) to obtain the estimate of the correlation coefficient. But I don't know how to construct the 95% confidence intervals.

library(bootstrap) 
data('law')

set.seed(1)
theta <- function(ind) {
  cor(law[ind, 1], law[ind, 2], method = "pearson")
  }
law.boot <- bootstrap(1:15, 100, theta) 
# sd(law$thetastar)
percent.95 <- function(x) {
  quantile(x,  .95)
  }
law.percent.95 <- bootstrap(1:15, 100, theta, func=percent.95)

Sorry if I didn't make myself clear or tag the wrong tags. Sorry twice for not producing a dataset (now it's provided) and thank professor Roland for point it out. Thanks very much!

2
"which is a matrix that has 2 lists." That is a very unusual data structure. Therefore, you need to provide a reproducible example (see this FAQ) or at least the output of str(CD).Roland

2 Answers

1
votes

There are different ways of calculating the CI for the bootstrap estimator (cf. to this Wikipedia article for instance.

The easiest is to tale the 2.5% and 97.5% quantiles from the bootstrapped coefficients (Percentile Bootstrap in the Wikipedia article):

quantile(law.boot$thetastar, c(0.025, 0.975))
#      2.5%     97.5% 
# 0.4528745 0.9454483 

Basic Bootstrap would be calculated as

2 * mean(law.boot$thetastar) - quantile(law.boot$thetastar, c(0.975, 0.025))
#     97.5%      2.5% 
# 0.5567887 1.0493625
1
votes

You could do this by hand.

library(bootstrap) 
data('law')
names(law) <- tolower(names(law))

set.seed(1)
theta <- function(ind) cor(law[ind, 1], law[ind, 2], method = "pearson")
law.boot <- bootstrap(1:15, 1000, theta) 

ci1 <- qnorm(p=c(.025, .975), 
            mean=mean(law.boot$thetastar), 
            sd=sd(law.boot$thetastar))

Gives:

> ci1
[1] 0.5055894 1.0268053

Compared to bootstrap from the scratch:

set.seed(1)
FX <- function(x) with(x, cor(lsat, gpa))
boot <- replicate(1000, FX(law[sample(nrow(law), round(nrow(law)), 
                                     replace=TRUE), ]))

ci2 <- qnorm(p=c(.025, .975), 
            mean=mean(boot), 
            sd=sd(boot))

Gives:

> ci2
[1] 0.5065656 1.0298412

Thus ci1 and ci2 seem to be similar.

However, note: I've adapted the bootstrap to 1,000 repetitions. With just 100 repetitions the difference is naturally somewhat larger.

Note 2: My answer considers CIs as asked. However, probably it's more appropriate to use the percentiles. See the thothal's answer how to obtain them.