5
votes

I need to generate random values for two beta-distributed variables that are correlated using SAS. The two variables of interest are characterized as follows:


X1 has mean = 0.896 and variance = 0.001.

X2 has mean = 0.206 and variance = 0.004.

For X1 and X2, p = 0.5, where p is the correlation coefficient.


Using SAS, I understand how to generate a random number specifying a beta distribution using the function X = RAND('BETA', a, b), where a and b are the two shape parameters for a variable X that can be calculated from the mean and variance. However, I want to generate values for both X1 and X2 simultaneously while specifying that they are correlated at p = 0.5.

1
I don't have an answer as this is outside my understanding, but I would recommend looking up Rick Wicklin's book, Simulating Data With SAS. It may well cover how to do this. He also has some articles on the Do Loop which contain some of the same information, so it may be worth looking there as well.Joe
@Joe thanks for pointing me to Rick Wicklin's book^. I came to a solution based on modified methods from Chapter 9 - Advanced Simulation of Multivariate Data [9.5: Generating Data From Copulas]. I will post the answer tomorrow.Gavin M. Jones
There are, of course, infinitely many joint distributions that have the properties that you specify. Copulas are one way to choose a particular simple dependence structure, and you have already found the book and chapter that I recommend. Be aware that not all correlations are possible. I don't know whether rho=0.5 is permitted or not. For an example, see blogs.sas.com/content/iml/2012/09/12/…Rick
You may also want to check Steen Magnussen's 2002 paper: sciencedirect.com/science/article/pii/S0167947303001695MichaelChirico

1 Answers

4
votes

This solution is based on modified methods used from Chapter 9 of Simulating Data with SAS by Rick Wicklin.

In this particular example, I first have to define variable means, variances, and shape-parameters (alpha, beta) that are associated with the beta distribution:

data beta_corr_vars;
    input x1 var1 x2 var2;  *mean1, variance1, mean2, variance2;
    *calculate shape parameters alpha and beta from means and variances;
    alpha1 = ((1 - x1) / var1 - 1/ x1) * x1**2;   
    alpha2 = ((1 - x2) / var2 - 1/ x2) * x2**2; 
    beta1 = alpha1 * (1 / x1 - 1);
    beta2 = alpha2 * (1 / x2 - 1);
    *here are the means and variances referred to in the original question;
    datalines; 
0.896 0.001 0.206 0.004
;
run;
proc print data = beta_corr_vars;
run;

Once these variables are defined:

proc iml;
  use beta_corr_vars; read all; 
  call randseed(12345);
      N = 10000;                  *number of random variable sets to generate;
      *simulate bivariate normal data with a specified correlation (here, rho = 0.5);
      Z = RandNormal(N, {0, 0}, {1 0.5, 0.5 1});   *RandNormal(N, Mean, Cov);
      *transform the normal variates into uniform variates;
      U = cdf("Normal", Z);      

      *From here, we can obtain beta variates for each column of U by; 
      *applying the inverse beta CDF;
      x1_beta = quantile("Beta", U[,1], alpha1, beta1);        
      x2_beta = quantile("Beta", U[,2], alpha2, beta2); 
      X = x1_beta || x2_beta; 

  *check adequacy of rho values--they approach the desired values with more sims (N);
  rhoZ = corr(Z)[1,2];                
  rhoX = corr(X)[1,2];

print X;
print rhoZ rhoX;

Thank you to all users who contributed to this answer.