3
votes

I'm trying to simulate some data for sample size estimation and my loop is returning unexpected results.

I'm trying to sample a from a vector of generated values with varying numbers of sample sizes and then concatenate means and standard deviations for a number of simulations.

library(MCMCglmm)
library(tidyverse)

Est <- function(n, mean, sd, lower, upper, samp_min, samp_max, samp_int, nsim){

        Data <- round(rtnorm(n, mean, sd, lower, upper), digits = 0) # Create a vector to sample from

        Samp_size <- seq(samp_min, samp_max, samp_int) # Create vector of sample sizes

        # Set up enpty results data frames
        Results_samp <- data.frame()
        Results <- data.frame()

        for(i in 1:nsim){ ## Loop through number of simulations

        for (j in seq_along(Samp_size)) { # Loop through sample sizes
                Score <- sample(Data, j, replace = TRUE)
                Nsubj <- Samp_size[j]
                Mean <- mean(Score, na.rm = TRUE)
                SD <- sd(Score, na.rm = TRUE)
                Results_samp <- rbind(Results_samp, 
                                      data.frame(
                                              Nsubj,
                                              Mean,
                                              SD))
        }
        Results <- rbind(Results, Results_samp)   
        }

        Results
}

Test <- Est(n = 1000, mean = 55, sd = 37, lower = 0, upper = 100, 
            samp_min = 5, samp_max = 20, samp_int = 5, nsim = 5)

This creates a data frame with 60 rows, where I'm expecting 20 (5 simulations of 4 sample sizes) and I always get NA returned for the sample size of 5.

Can anyone see where I'm going wrong?

Thanks!

1
You need to "reset" Results_samp in your inner for loop. I've given an example below.Maurits Evers

1 Answers

2
votes

Generally, dynamically growing a data.frame with rbind is a very inefficient way of doing things in R. There are almost always better/faster ways of doing what you're trying to do.

That aside, in terms of answering your question, let's take a look at a simplified version of your nested for loop

x1 <- data.frame()
x2 <- data.frame()
for (i in 1:5) {
    for (j in 1:4) x1 <- rbind(x1, data.frame(x1 = i, x2 = i^2))
    x2 <- rbind(x2, x1)
}

See how x2 has 60 rows?

The reason for that is that you never reset x1. If we fix that

x1 <- data.frame()
x2 <- data.frame()
for (i in 1:5) {
    for (j in 1:4) x1 <- rbind(x1, data.frame(x1 = i, x2 = i^2))
    x2 <- rbind(x2, x1)
    x1 <- data.frame()
}

we have nrow(x2) = 20, as expected.