4
votes

I want to simulate 100 data with 5 columns. I want to get a correlation of 0.5 between the columns. To complete it, I have done the following action

F1 <- matrix( c(1, .5, .5, .5,.5,
                   .5, 1, .5, .5,.5,
                   .5, .5, 1, .5,.5,
                   .5, .5, .5, 1,.5,
                   .5, .5, .5, .5,1
), 5,5)

To simulate the intended data frame, I have done this, but it does not work properly.

 df2 <- as.data.frame (rbinom(100, 1,.5),ncol(5), F1)
1
This code will need to be amended slightly, but it should give you what you need: stats.stackexchange.com/a/285008/87002Phil
Thanks,@ Phil, would you help a bit moreuser10072460
Here's another material that could give help : stats.stackexchange.com/questions/156456/…. It seems difficult though to get the same correlation between every pair of columns, because once you have Cov(A,B)=Cov(A,C)=70%, I can't see how Cov(B,C) can be equal to 70%. That might be a question for Cross-Validated rather than SORomain
@ Romain, I completely know how to run it using Discrete Data, It is very simple but I do not know How to get 0 and 1user10072460

1 Answers

5
votes

I'm surprised this isn't a duplicate (this question refers specifically to non-binary responses, i.e. binomial with N>1). The bindata package does what you want.

library(bindata)
## set up correlation matrix (compound-symmetric with rho=0.5)
m <- matrix(0.5,5,5)
diag(m) <- 1

Simulate with a mean of 0.5 (as in your example):

set.seed(101)
## this simulates 10 rather than 100 realizations
## (I didn't read your question carefully enough)
## but it's easy to change
r <- rmvbin(n=10, margprob=rep(0.5,5), bincorr=m)
round(cor(r),2)

Results

 1.00 0.22  0.80  0.05 0.22
 0.22 1.00  0.00  0.65 1.00
 0.80 0.00  1.00 -0.09 0.00
 0.05 0.65 -0.09  1.00 0.65
 0.22 1.00  0.00  0.65 1.00
  • this looks wrong - the correlations aren't exactly 0.5 - but on average they will be (when I sampled 10,000 vectors rather than 10, the values ranged from about 0.48 to 0.51). Equivalently, if you simulated many samples of 10 and computed the correlation matrix for each, you should find that the expected (average) correlation matrix is correct.
  • simulating values with correlation exactly equal to the specified value is much harder (and not necessarily what you want to do anyway, depending on the application)
  • note that there will be limitations about what mean vectors and correlation matrices are feasible. For example, the off-diagonal elements of an n-by-n compound-symmetric (equal-correlation) matrix can't be less than -1/(n-1). Similarly, there may be limits on what correlations are possible for a given set of means (this may be discussed in the technical reference, I haven't checked).

The reference for this method is

Leisch, Friedrich and Weingessel, Andreas and Hornik, Kurt (1998) On the generation of correlated artificial binary data. Working Papers SFB "Adaptive Information Systems and Modelling in Economics and Management Science", 13. SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, Vienna. https://epub.wu.ac.at/286/