
I am new to R and am trying to create a new data frame of bootstrapped resamples of groups of different sizes. My data frame has 6 variables and a group designation, and there are 128 groups of different sizes (Ns). Here is an example of my data:

    head(PhenoM2)
        ID       Name PhenoNames Group   HML    RML    FML    TML   FHD   BIB
   1 378607 PaleoAleut PaleoAleut     1 323.5 248.75 434.50 355.75 46.84    NA
   2 378664 PaleoAleut PaleoAleut     1    NA 238.50 441.50 353.00 45.83 277.0
   3 378377 PaleoAleut PaleoAleut     1 309.5 227.75 419.00 332.25 46.39 284.0
   4 378463 PaleoAleut PaleoAleut     1 283.5 228.75 397.75 331.00 44.37 255.5
   5 378602 PaleoAleut PaleoAleut     1 279.5 230.00 393.00 329.50 45.93 265.0
   6 378610 PaleoAleut PaleoAleut     1 307.5 234.25 419.50 338.50 43.98 271.5

Drawing on this question (bootstrap resampling for hierarchical/multilevel data) and some advice from others (thanks!), I wrote the code:

    resample.M <- NULL
    for(i in 1000){
    groups <- unique(PhenoM2$"Group")

    for(ii in 1:128)
    data.i.ii <- PhenoM2[PhenoM2$"Group"==groups[ii],]
    resample.M[i] <- data.i.ii[sample(1:nrow(data.i.ii),replace=T),]
    }

Unfortunately, this gives me the warning:

    In resample.M[i] <- data.i.ii[sample(1:nrow(data.i.ii), replace = T),:
      number of items to replace is not a multiple of replacement length

I think I understand why: each of the 128 groups has a different N, and none of them is a multiple of 1000. I used resample.M[i] to try to accumulate all 1000 resamples of the 128 groups into a single database, and I'm pretty sure that is where the problem is.

Nearly all of the for-loop examples I've read create a vector (numeric(1000)) and then fill in the values, but since I want to keep all of the data (which include factors, integers, and numerics), this doesn't work. I tried making a matrix to hold the results instead (there are 2187 unique individuals in the data frame):

    resample.M <- matrix(ncol=2187000,nrow=10)

But it's giving me the same warning.

So, since I'm sure I'm missing something basic here, I have three questions:

How can I get this code to resample all of the groups (with replacement and based on their individual Ns)?

How can I get this code to repeat this resampling 1000x?

How can I get the resamples of every group into the same database?

Thank you so much for your insight and expertise!

To address the warnings you get: by typing resample.M[i] you are accessing the i-th element. Row access is done with resample.M[i, ] and column access with resample.M[, i]. - Brouwer
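To see why single brackets cause trouble when the container is a list, here is a small sketch with a made-up data frame df (hypothetical, not from the question). Assigning a two-column data frame into out[1], a length-1 slot, produces the same kind of warning as in the question, while out[[1]] stores the whole data frame as one element:

```r
# Hypothetical toy data frame with 2 columns
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
out <- vector("list", 2)          # a list with 2 empty slots

# Single brackets: out[1] is a length-1 sub-list, but df is a list of 2 columns,
# so R warns that the lengths do not match
msg <- tryCatch({ out[1] <- df; "no warning" },
                warning = function(w) conditionMessage(w))
print(msg)

# Double brackets: out[[1]] is the element itself, so the whole data frame is stored
out[[1]] <- df
str(out[[1]])
```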

1 Answer


I think you may have wanted to use double square brackets to store the results in a list, i.e. resample.M[[i]] <- .... Apart from that, it is more idiomatic to write PhenoM2$Group than PhenoM2$"Group", and groups <- unique(PhenoM2$Group) can go outside your for loop, since you only need to compute it once. Also replace 1:128 with 1:length(groups) or seq_along(groups), so that you don't need to hard-code the length of the vector.
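Putting those suggestions together, the loop could look something like the sketch below. It runs on a small made-up data frame (PhenoToy, hypothetical, standing in for PhenoM2) so it can be tried as-is. The outer loop stores each bootstrap replicate as one element of a list, and the inner loop resamples each group with replacement at its own size:

```r
set.seed(1)

# Hypothetical toy data standing in for PhenoM2: 3 groups of unequal size
PhenoToy <- data.frame(
  ID    = 1:9,
  Group = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  HML   = c(323.5, 309.5, 283.5, 279.5, 301.0, 295.5, 310.0, 288.0, 299.5)
)

n.boot <- 1000                          # number of bootstrap replicates
groups <- unique(PhenoToy$Group)        # computed once, outside the loops
resample.M <- vector("list", n.boot)    # one list element per replicate

for (i in seq_len(n.boot)) {            # iterates over 1:1000, not just the value 1000
  per.group <- vector("list", length(groups))
  for (ii in seq_along(groups)) {       # braces so both lines run for every group
    data.i.ii <- PhenoToy[PhenoToy$Group == groups[ii], ]
    per.group[[ii]] <- data.i.ii[sample(1:nrow(data.i.ii), replace = TRUE), ]
  }
  # stack the resampled groups into one data frame for replicate i
  resample.M[[i]] <- do.call(rbind, per.group)
}
```

Each element of resample.M is then one full bootstrap data set with the same group sizes as the original.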

Because you will often need to operate on data frames grouped by some variable, I suggest you familiarise yourself with a package designed to do that, rather than using for loops, which can be very slow. The best one for a beginner in R may be plyr, which has an easy syntax (although there are many possibilities, including the slightly more "advanced" packages like dplyr and data.table).

So for a subset d <- subset(PhenoM2, Group == 1), you already have the function you need to perform on it: function(d) d[sample(1:nrow(d), replace = TRUE),].
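As a quick check of what that function does, here is a sketch on a hypothetical toy data frame in place of PhenoM2; resample_group is a hypothetical name for the anonymous function above. Resampling one group returns the same number of rows, drawn only from that group, with some rows typically repeated and others left out:

```r
set.seed(42)

# Hypothetical toy data standing in for PhenoM2
PhenoToy <- data.frame(
  ID    = 101:108,
  Group = c(1, 1, 1, 1, 1, 2, 2, 2),
  FHD   = c(46.84, 45.83, 46.39, 44.37, 45.93, 43.98, 44.10, 45.20)
)

# Hypothetical helper: resample the rows of one group with replacement, keeping its size
resample_group <- function(d) d[sample(1:nrow(d), replace = TRUE), ]

d <- subset(PhenoToy, Group == 1)   # the 5 rows of group 1
boot.d <- resample_group(d)
boot.d$ID                           # IDs drawn from 101:105, duplicates allowed
```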

Now, to go over all such subsets, perform this operation on each one, and arrange the results in a new data frame named samples, you can use ddply from plyr:

    library(plyr)
    samples <- ddply(PhenoM2, .(Group),
        function(d) d[sample(1:nrow(d), replace = TRUE), ])
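If you would rather not install plyr, a base-R sketch of the same split-apply-combine step uses split(), lapply() and do.call(rbind, ...). This is a swapped-in alternative, not what ddply does internally, and it again uses a hypothetical toy data frame in place of PhenoM2:

```r
set.seed(7)

# Hypothetical toy data standing in for PhenoM2: groups of sizes 3, 1 and 4
PhenoToy <- data.frame(
  ID    = 1:8,
  Group = c(1, 1, 1, 2, 3, 3, 3, 3),
  TML   = c(355.75, 353.00, 332.25, 331.00, 329.50, 338.50, 340.00, 335.25)
)

# split: one data frame per Group; apply: resample each; combine: rbind
by.group  <- split(PhenoToy, PhenoToy$Group)
boot.list <- lapply(by.group, function(d) d[sample(1:nrow(d), replace = TRUE), ])
samples   <- do.call(rbind, boot.list)
rownames(samples) <- NULL   # drop the "group.row" names that rbind creates
```

A group with a single row (group 2 here) simply gets that row back once, so groups of any size are handled.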

So what remains is to repeat this 1000 times, or however many times you want. You can use a for loop for this, storing the results in a list. Note that you need to use double square brackets, [[, to set elements of the list.

    n <- 1000                      # number of iterations
    samples <- vector("list", n)   # list of length n to store results
    for (i in seq_along(samples)) {
        samples[[i]] <- ddply(PhenoM2, .(Group),
            function(d) d[sample(1:nrow(d), replace = TRUE), ])
    }

An alternative is the function replicate, which evaluates the same expression a given number of times.
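A sketch of the replicate() version, using a hypothetical toy data frame and a hypothetical helper boot_once() built from base R so that the snippet does not need plyr (the ddply call above could be used inside it instead). Setting simplify = FALSE matters here: by default replicate() tries to simplify the results into a matrix rather than returning a list of data frames.

```r
set.seed(123)

# Hypothetical toy data standing in for PhenoM2
PhenoToy <- data.frame(
  ID    = 1:7,
  Group = c(1, 1, 1, 2, 2, 3, 3),
  RML   = c(248.75, 238.50, 227.75, 228.75, 230.00, 234.25, 236.00)
)

# Hypothetical helper: one bootstrap replicate, resampling rows within each Group
boot_once <- function(df) {
  by.group <- split(df, df$Group)
  do.call(rbind, lapply(by.group, function(d) d[sample(1:nrow(d), replace = TRUE), ]))
}

n <- 1000
# simplify = FALSE keeps the result as a list of n data frames
samples <- replicate(n, boot_once(PhenoToy), simplify = FALSE)
```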

Once you have done this, all resamples will be stored in a list. I am not sure what you mean by "How can I get the resamples of every group into the same database". If you want to combine them into a single data frame, you can use all.samples <- do.call(rbind, samples). More generally, you can reshape your list of samples using do.call and lapply together with a function.
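One thing to keep in mind with do.call(rbind, samples) is that the combined data frame no longer records which bootstrap replicate each row came from. A sketch of the do.call plus lapply pattern that first tags each resample with a hypothetical iteration column (toy data and only 5 resamples, to keep it small):

```r
set.seed(99)

# Hypothetical toy data standing in for PhenoM2
PhenoToy <- data.frame(
  ID    = 1:5,
  Group = c(1, 1, 2, 2, 2),
  BIB   = c(277.0, 284.0, 255.5, 265.0, 271.5)
)

# A short list of bootstrap resamples (5 instead of 1000)
samples <- lapply(1:5, function(i)
  do.call(rbind, lapply(split(PhenoToy, PhenoToy$Group),
                        function(d) d[sample(1:nrow(d), replace = TRUE), ])))

# Tag each resample with its iteration number, then stack into one data frame
tagged <- lapply(seq_along(samples), function(i) {
  d <- samples[[i]]
  d$iteration <- i          # hypothetical column name
  d
})
all.samples <- do.call(rbind, tagged)
rownames(all.samples) <- NULL
```

With the iteration column in place, statistics per replicate can later be computed by grouping on iteration (and Group, if needed).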