Does anybody know how I could simulate data with a specified correlation between a count variable and a continuous variable? Right now the best idea I have is to transform the count variable so that it is approximately normal, and then simulate the data with this R code:

set.seed(2018)
rho <- 0.3  # target correlation
x <- rnorm(n = 1000, mean = 0, sd = 1)
y <- rnorm(n = 1000, mean = rho * x, sd = sqrt(1 - rho^2))  # Var(y) = 1, so cor(x, y) is close to rho
cor(x, y)

However, I think it would be preferable to make y an actual count variable, because counts tend to be right-skewed. I also want to be able to choose the correlation between x and y, e.g., simulate data with a correlation of 0.5 between x and y.

Edit: I'm still looking for help!

1 Answer

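You can use runif to simulate the continuous variable, then pass the result as the lambda (mean) parameter of rpois, so that each count is drawn from a Poisson distribution whose mean is the corresponding continuous value: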

set.seed(1)

continuous <- runif(100, 0, 10)
counts <- rpois(100, continuous)
plot(continuous, counts)

cor(counts, continuous)
#> [1] 0.7852701

Created on 2020-12-11 by the reprex package (v0.3.0)
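
To hit a specific correlation with this approach, one option is to work out the theoretical value. When counts are Poisson with mean equal to the continuous variable, cor(continuous, counts) = sqrt(Var(continuous) / (Var(continuous) + E[continuous])). For continuous ~ Uniform(0, b) this simplifies to sqrt(b / (b + 6)), which is about 0.79 for b = 10 and is consistent with the output above. Solving for b gives b = 6 * rho^2 / (1 - rho^2). The sketch below is only an illustration under those assumptions: upper_for_cor is a hypothetical helper name, not a function from any package, and the sample correlation will only match rho up to sampling noise.

```r
# Hypothetical helper: upper bound b of Uniform(0, b) such that
# cor(x, y) = rho when x ~ Uniform(0, b) and y | x ~ Poisson(x).
# Derived from cor = sqrt(b / (b + 6)); valid only for 0 < rho < 1.
upper_for_cor <- function(rho) {
  stopifnot(rho > 0, rho < 1)
  6 * rho^2 / (1 - rho^2)
}

set.seed(1)

n   <- 1e5                      # large n so the sample correlation is stable
rho <- 0.5                      # target correlation
b   <- upper_for_cor(rho)       # equals 2 for rho = 0.5

continuous <- runif(n, 0, b)
counts     <- rpois(n, continuous)

cor(counts, continuous)         # close to rho, up to sampling noise
```

Note that the continuous variable is uniform here, not normal, and that a lower target correlation forces a smaller b, which in turn means smaller counts with many zeros. If the continuous variable has to be normal or the counts need a particular mean, a different construction would be needed.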