This is how I would do it (blind iterative algorithm, assuming no knowledge, for when you are purely interested in "how to simulate this"):
simulate.sd <- function(nsim=10, n=200, seed=101, tol=0.01) {
set.seed(seed)
sd.value <- 1
rsquare <- 1:nsim
results <- 1:nsim
for (i in 1:nsim) {
# tracking iteration: if we miss the value, abort at sd.value > 7.
iter <- 0
while (rsquare[i] > (0.20 + tol) | rsquare[i] < (0.2 - tol)) {
sd.value <- sd.value + 0.01
rsquare[i] <- simulate.sd.iter(sd.value, n)
iter <- iter + 1
if (iter > 3000) { break }
}
results[i] <- sd.value # store the current sd.value that is OK!
sd.value <- 1
}
cbind(results, rsquare)
}
simulate.sd.iter <- function(sd.value, n=200) { # helper function
# Takes the sd.value, creates data, and returns the r-squared
X1 <- rnorm(n, 0, 1)
X2 <- rnorm(n, 0, 1)
Y <- rnorm(n, (5 + 3*X1 - 2*X2), sd.value)
simdata <- data.frame(X1, X2, Y)
return(summary(lm(Y ~ X1 + X2, data=simdata))$r.squared)
}
simulate.sd()
A few things to note:
- I let the X1 and X2 vary, since this affects this sought
sd.value
.
- The tolerance is how exact you want this estimate to be. Are you fine with an r-squared of ~0.19 or ~0.21? Have the tolerance be 0.01.
- Note that a too precise tolerance might not allow you to find a result.
- The value of 1 is quite a bad starting value, making this iterative algorithm quite slow.
The resulting vector for 10 results is:
[1] 5.64 5.35 5.46 5.42 5.79 5.39 5.64 5.62 4.70 5.55
,
which takes roughly 13 seconds on my machine.
My next step would be to start from 4.5, add 0.001 to the iteration instead of 0.01, and perhaps lower the tolerance. Good luck!
Alright, some summary statistics for nsim=100, taking 150 seconds, with steps increase of 0.001, and tolerance still at 0.01:
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.513 4.913 5.036 5.018 5.157 5.393
Why are you interested in this though?