3
votes

I want to generate three correlated outcomes for 20 studies. Each study has 3 groups (control, treat1, and treat2). For the control group, my generating values are: mean=0, sd=1; for both treatment groups, my generating values are: mean=0.40, sd=1. Two things that I want to accomplish (which I’m having trouble doing):

1) Condition 1: I want to generate correlated outcome so that there are different correlations between each of the pairs of outcomes. The correlation should be sampled from the vector of correlations, rho=c(0.6, 0.7, 0.8); and

2) Condition 2: I want to generate correlated outcomes so that a subset of the studies (half) will be sample from a vector of correlations, rho1=c(0.6, 0.7, 0.8), and the other subset(remaining half) will be sampled from a vector of correlations, rho2=c(0.3, 0.4, 0.5)

I’m using the “mvtnorm” package to generate the outcomes for each of the groups. Here’s my code (please pardon my very basic knowledge of simulation and R):

 library(“mvtnorm”)
 set.seed(0307)
 mean_c = c(0, 0, 0)
 mean_t1 = c(0.4, 0.4, 0.4)
 mean_t2 = c(0.4. 0.4, 0.4)
 k <- 20    # no. of studies
 n <- 50    # sample size

 rho <-     # the value is sampled from a vector of correlations 
 for (i in 1:k) {
   Yc <-rmvnorm(n=n, mean=mean_c, sigma=rho)
   Yt1<-rmvnorm(n=n, mean=mean_t1, sigma=rho)
   Yt2 <-rmvnorm(n=n, mean=mean_t2, sigma=rho)
 }

I appreciate any inputs from our programming experts here. Thanks!

2

2 Answers

1
votes

I am not sure I have understood your question.

But just in case it might help you, here I provide an example of rmvnorm function using your "data". I modified some numbers in order to make clear all dependencies

library(mvtnorm)
set.seed(1234)

k = 10000
means = c(0, 0.4, 0.4)
sigmas = c(2, 1, 1)
rhoXY = 0.6
rhoXZ = 0.7
rhoYZ = 0.8
varMatrix <- matrix(c(
    sigmas[1]*sigmas[1], rhoXY*sigmas[1]*sigmas[2], rhoXZ*sigmas[1]*sigmas[3],
    rhoXY*sigmas[1]*sigmas[2], sigmas[2]*sigmas[2], rhoYZ*sigmas[2]*sigmas[3],
    rhoXZ*sigmas[1]*sigmas[3], rhoYZ*sigmas[2]*sigmas[3], sigmas[3]*sigmas[3]
    ), 
    ncol=3, byrow=TRUE)

# Generate data
Yc <- rmvnorm(n = k, 
             mean = means, 
             sigma = varMatrix, method="chol")


# Check data satisfies what it should
colMeans(Yc)
var(Yc)
cor(Yc[,1], Yc[,2])
cor(Yc[,1], Yc[,3])
cor(Yc[,2], Yc[,3])

Check output

> colMeans(Yc)
[1] 0.007118385 0.406214538 0.401605464
> var(Yc)
         [,1]      [,2]      [,3]
[1,] 4.024896 1.2026685 1.4204561
[2,] 1.202668 0.9998153 0.8046641
[3,] 1.420456 0.8046641 1.0052659
> cor(Yc[,1], Yc[,2])
[1] 0.599527
> cor(Yc[,1], Yc[,3])
[1] 0.7061712
> cor(Yc[,2], Yc[,3])
[1] 0.802628
0
votes

Thanks for the email, it was nice to be asked! I don't totally understand the rmvnorm function (or your request!) but it looks like Roc answered your question. Nevertheless it is simple to perform the function 20 times, using different rho values in the two halves. My code is perhaps not the most elegant - it might be possible to generate all this data with a single call to rmvnorm, rather than 20 as in my code, but this seems to work just fine. You can access the results for your 20 studies as I have done with the square brackets.

library(mvtnorm)
set.seed(1234)


k = 10000
means = c(0, 0.4, 0.4)
sigmas = c(1, 1, 1)
rho.type1 <- c(0.3, 0.4, 0.5)
rho.type2 <- c(0.6, 0.7, 0.8)
study.number <-20
Yc <- matrix(0, ncol = 3, nrow = k* study.number)

for(i in 1: study.number)
{
  ifelse(i < 11, rho <- rho.type1, rho <- rho.type2)
  varMatrix <- matrix(c(
      sigmas[1]*sigmas[1], rho[1]*sigmas[1]*sigmas[2], rho[2]*sigmas[1]*sigmas[3],
      rho[1]*sigmas[1]*sigmas[2], sigmas[2]*sigmas[2], rho[3]*sigmas[2]*sigmas[3],
      rho[2]*sigmas[1]*sigmas[3], rho[3]*sigmas[2]*sigmas[3], sigmas[3]*sigmas[3]
      ), 
      ncol=3, byrow=TRUE)

# Generate data, and save the 20 datasets in a list called Yc
  Yc[(1 + (i-1)*k):(i*k), ] <- rmvnorm(n = k, 
                mean = means, 
                sigma = varMatrix, method="chol")
}
Yc <- data.frame(Yc, study = rep(1:20, each = k))

# Check output 
cor(Yc[Yc$study==1,1], Yc[Yc$study==1,2]) # To check the first entry in the list
for(i in 1:20) print(cor(Yc[Yc$study==i,1], Yc[Yc$study==i,2])) # To check the lot