Sample from a vector of correlations and apply it to generate correlated data

Question

I want to generate three correlated outcomes for 20 studies. Each study has 3 groups (control, treat1, and treat2). For the control group, my generating values are mean=0, sd=1; for both treatment groups, they are mean=0.40, sd=1. There are two things I want to accomplish but am having trouble with:

1) Condition 1: I want to generate correlated outcomes so that each pair of outcomes has a different correlation. The correlations should be sampled from the vector rho=c(0.6, 0.7, 0.8); and

2) Condition 2: I want to generate correlated outcomes so that half of the studies have correlations sampled from the vector rho1=c(0.6, 0.7, 0.8), and the remaining half have correlations sampled from the vector rho2=c(0.3, 0.4, 0.5).

I’m using the “mvtnorm” package to generate the outcomes for each of the groups. Here’s my code (please pardon my very basic knowledge of simulation and R):

 library("mvtnorm")
 set.seed(0307)
 mean_c = c(0, 0, 0)
 mean_t1 = c(0.4, 0.4, 0.4)
 mean_t2 = c(0.4, 0.4, 0.4)
 k <- 20    # no. of studies
 n <- 50    # sample size

 rho <-     # the value is sampled from a vector of correlations 
 for (i in 1:k) {
   Yc <-rmvnorm(n=n, mean=mean_c, sigma=rho)
   Yt1<-rmvnorm(n=n, mean=mean_t1, sigma=rho)
   Yt2 <-rmvnorm(n=n, mean=mean_t2, sigma=rho)
 }

I would appreciate any input from the programming experts here. Thanks!

Roc Roc · Accepted Answer · 2015-02-28T23:06:21

I am not sure I have understood your question.

But in case it helps, here is an example of the rmvnorm function using your "data". I changed some of the numbers (for example, the first standard deviation is 2 instead of 1) so that every dependency is visible in the output.

library(mvtnorm)
set.seed(1234)

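k = 10000                # number of draws (large, so estimates land close to the targets)
means = c(0, 0.4, 0.4)   # means of the three variables
sigmas = c(2, 1, 1)      # standard deviations of the three variables
rhoXY = 0.6              # correlation between variables 1 and 2
rhoXZ = 0.7              # correlation between variables 1 and 3
rhoYZ = 0.8              # correlation between variables 2 and 3
# Covariance matrix: entry (i, j) is rho_ij * sigma_i * sigma_j
varMatrix <- matrix(c(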
    sigmas[1]*sigmas[1], rhoXY*sigmas[1]*sigmas[2], rhoXZ*sigmas[1]*sigmas[3],
    rhoXY*sigmas[1]*sigmas[2], sigmas[2]*sigmas[2], rhoYZ*sigmas[2]*sigmas[3],
    rhoXZ*sigmas[1]*sigmas[3], rhoYZ*sigmas[2]*sigmas[3], sigmas[3]*sigmas[3]
    ), 
    ncol=3, byrow=TRUE)

# Generate data
Yc <- rmvnorm(n = k, 
             mean = means, 
             sigma = varMatrix, method="chol")


# Check data satisfies what it should
colMeans(Yc)
var(Yc)
cor(Yc[,1], Yc[,2])
cor(Yc[,1], Yc[,3])
cor(Yc[,2], Yc[,3])

Output of the checks:

> colMeans(Yc)
[1] 0.007118385 0.406214538 0.401605464
> var(Yc)
         [,1]      [,2]      [,3]
[1,] 4.024896 1.2026685 1.4204561
[2,] 1.202668 0.9998153 0.8046641
[3,] 1.420456 0.8046641 1.0052659
> cor(Yc[,1], Yc[,2])
[1] 0.599527
> cor(Yc[,1], Yc[,3])
[1] 0.7061712
> cor(Yc[,2], Yc[,3])
[1] 0.802628
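
The example above fixes the three correlations by hand. If the goal is to draw them per study, one possible sketch follows. It rests on assumptions that are mine, not the asker's: "sampled from the vector" is read as a random permutation of the three values across the three outcome pairs (so every pair gets a different correlation), the same matrix is used for all three groups within a study, and the studies assigned to rho1 in Condition 2 are picked at random. The names make_sigma, simulate_study, group_means, cond1, cond2, and high are hypothetical helpers introduced only for illustration.

```r
library(mvtnorm)
set.seed(307)

k <- 20   # number of studies
n <- 50   # sample size per group
group_means <- list(control = rep(0, 3),
                    treat1  = rep(0.4, 3),
                    treat2  = rep(0.4, 3))

# Hypothetical helper: place the values of rho_pool, in random order,
# on the three off-diagonal pairs of a 3 x 3 correlation matrix.
# Because every sd is 1, this is also the covariance matrix.
make_sigma <- function(rho_pool) {
  R <- diag(3)
  R[lower.tri(R)] <- sample(rho_pool)     # random permutation of the pool
  R[upper.tri(R)] <- t(R)[upper.tri(R)]   # mirror to keep R symmetric
  R
}

# Hypothetical helper: one study = three groups sharing one sampled matrix.
simulate_study <- function(rho_pool) {
  sigma <- make_sigma(rho_pool)
  lapply(group_means, function(m) rmvnorm(n = n, mean = m, sigma = sigma))
}

# Condition 1: every study samples from the same pool.
cond1 <- lapply(seq_len(k), function(i) simulate_study(c(0.6, 0.7, 0.8)))

# Condition 2: a random half of the studies uses rho1, the other half rho2.
rho1 <- c(0.6, 0.7, 0.8)
rho2 <- c(0.3, 0.4, 0.5)
high <- sample(rep(c(TRUE, FALSE), each = k / 2))
cond2 <- lapply(seq_len(k),
                function(i) simulate_study(if (high[i]) rho1 else rho2))
```

Each element of cond1 and cond2 is a list with components control, treat1, and treat2, each an n x 3 matrix, so cor(cond1[[1]]$control) shows the sample correlations for the first study. With n = 50 these will scatter noticeably around the generating values. If the standard deviations were not all 1, the correlation matrix would need rescaling to a covariance matrix, as in the varMatrix construction above.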
