3
votes

I am running a simple genetic drift simulation in R.

# Population size
N<-5000
# Number with focal allele
X1<-(N/2)
# Number of generations
ngens<-(2000)
# Number of replicates
nreps<-10

# Drift function
drift <- function(N, X1, ngens, nreps) {
# Makes a matrix of NA's of nreps columns, and ngen rows
       p <- matrix(NA, nrow=ngens, ncol=nreps)
  # Set base population
       p[1,] <- X1/N
  # Repetitive sampling function, each generation sample 10 times from the generation before (gen-1)  
       for(gen in 2:ngens)
         p[gen,] <- rbinom(n=nreps, size=N, prob=p[gen-1,]) / N
       p
}
# Run function "drift" & output as data frame
p <- data.frame(drift(N, X1, ngens, nreps))
# Plot
matplot(p, type="l", ylim=c(0,1), lty=1, xlab="Generations", ylab="Proportion Focal",col="grey")
# Mean value
p$mean<-apply(p[,c(1:10)],1,mean)
matplot(p$mean, type="l", ylim=c(0,1), lty=1, xlab="Generations", ylab="Proportion Focal",col="black",add=T)

I would like to:

  1. Calculate confidence intervals (by bootstrapping) around the mean for each generation
  2. Add a pair of columns to my dataframe with the upper and lower confidence interval estimates which can then be plotted on the matplot just like I have done with the mean value

Can anyone suggest a way to do this? I am aware I need the Boot package and know roughly how to use this but guidance would be good.

The problem (for me) is getting a loop which generates a CI for each generation of the simulation and pasting that to the "p" dataframe

EDIT:

I have tried this as partly suggested by @bakyaw and the "for" loop adapted from an old script I used once.

myBootFunction<-function(x){
  b <- boot(x, function(u,i) mean(u[i]), R = 999)
  boot.ci(b, type = c("norm"))
}

meanList<-apply(p[c(2:ngens),c(1:nreps)],1,function(x)myBootFunction(x))
for(i in 1:49) {
  low<-meanList[[i]][[4]][[2]]
  high<-meanList[[i]][[4]][[3]]
  CIMatrix<-cbind(high,low)}

Note the addition of c(2:ngens), without out it this error comes up.

[1] "All values of t are equal to 0.5 \n Cannot calculate confidence intervals"

However, this still only creates the CIMatrix as a 1x2 double matrix, rather than one with every generation in it.

1
Does this help?bakyaw
With the bootstrapping yes, but not actually implementing it as I described - i.e. a loop which does one for each row (1 row is created in my df per generation of the simulation) and then pastes the upper and lower estimates as columnsrg255
ok here is a rough idea. write your bootstrap code into a function. Then call that function within a sapply statement. such as last_output<-sapply(1:nrow(p),function(x)yourBootFunction(x)) So that you can get the row n and boot it. This will return a list with the confidence intervals of each row, at the length of row count of p. does it help?bakyaw
when i reproduced your dataframe p, the first row was 0.5 for all the columns. That is the warning for the first line. The rest should be calculated (at least for me it worked). As all the values are the same, there is no confidence interval for the first row.bakyaw

1 Answers

0
votes

This can stay as the answer if you have solved your problem. Boot function:

    library(boot)

    myBootFunction<-function(x){
        b <- boot(x, function(u,i) mean(u[i]), R = 999)
        boot.ci(b, type = c("norm", "basic", "perc"))
    }

then instead of this line in your code: p$mean<-apply(p[,c(1:10)],1,mean)

you can use :

meanList<-apply(p[,c(1:10)],1,function(x)myBootFunction(x))

After having the list with the confidence intervals you may convert it to a data frame and then work on it.