0
votes

If I generate n random numbers in the interval [0,1], the mean will be around 0.5 and they will be uniformly distributed. What could an algorithm/formula look like if I want n random numbers still in the interval [0,1] but with, e.g., a mean of 0.6? They should still be distributed as uniformly as possible, with numbers larger than 0.5 occurring a bit more frequently.

So far I have only found solutions that assume a different distribution. With a normal distribution, for example, it would be easy to get numbers around the desired mean, but numbers much larger or much smaller than the mean would be much less frequent, and I'd like to avoid that.

The programming language does not really matter, but I am currently trying to do this in R.

2
The description of the distribution is a bit fuzzy. You can't have a mean of 0.6 and uniform in the range [0, 1]. Do you want the PDF to be flat from 0 to 0.5, then step up and be flat again from 0.5 to 1? Do you want it to gradually increase from 0 to 1? Is it OK if it curves a bit? Maybe try sketching the PDF you want and including an image of it in your question. - Richie Cotton
One way to formalize this would be to think about criteria in terms of the CDF. (1) Must be a non-decreasing function on (0,1). (2) CDF(0)=0, CDF(1)=1 (range criterion). (3) integral (x*PDF dx) = m, where the PDF is the derivative of the CDF (mean criterion). (4) Minimize the integrated second derivative (this one is trickier, but it's one way to operationalize "as uniform as possible"). However, it rules out a piecewise-linear function. - Ben Bolker
One could also characterize "uniformity" as the variance of the PDF (I don't know what if any relationship that would have to the integrated second derivative of the CDF). This might make a good question for CrossValidated ... - Ben Bolker
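
To make the first option from the comments concrete (a PDF that is flat on [0, 0.5] and flat at a different height on [0.5, 1]), here is a minimal sketch. The function name `rstep` and the step location of 0.5 are assumptions for illustration, not something given in the thread. With probability p on the upper half, the mean is 0.25(1 - p) + 0.75p, so a mean of 0.6 needs p = 0.7.

```r
# Sketch: two-level ("step") density on [0, 1] with a chosen mean.
# Assumption: the step sits at 0.5; `rstep` is a hypothetical helper name.
rstep <- function(n, target_mean) {
  # Mean = 0.25 * (1 - p_upper) + 0.75 * p_upper, solved for p_upper
  p_upper <- (target_mean - 0.25) / 0.5
  # Only means between 0.25 and 0.75 are reachable with this shape
  stopifnot(p_upper >= 0, p_upper <= 1)
  upper <- runif(n) < p_upper  # TRUE -> take the draw from [0.5, 1]
  ifelse(upper, runif(n, 0.5, 1), runif(n, 0, 0.5))
}

set.seed(1)
x <- rstep(1e5, 0.6)
mean(x)   # close to 0.6, up to sampling noise
range(x)  # stays within [0, 1]
```

Within each half the values are exactly uniform; the price is a visible jump in the density at 0.5.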

2 Answers

3
votes

This is more of a statistics question: you don't want a uniform distribution, but rather a different distribution that is close to the uniform. From your description alone, several distributions could match what you ask for. For example, you could use a density function with a smooth slope between 0 and 1, or one with a "bump" around 0.6.
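
As an illustration of the "smooth slope" idea, here is a sketch that samples from a linear density f(x) = a + b*x on [0, 1] by inverting its CDF. The function name `rlinear` is hypothetical. Requiring the density to integrate to 1 gives a = 1 - b/2, and the mean works out to 1/2 + b/12, so b = 12 * (mean - 0.5). The density stays non-negative only for means between 1/3 and 2/3.

```r
# Sketch: linear density f(x) = a + b * x on [0, 1], sampled by inverse CDF.
# `rlinear` is a hypothetical helper name used for illustration only.
rlinear <- function(n, target_mean) {
  b <- 12 * (target_mean - 0.5)  # slope, from mean = 1/2 + b/12
  a <- 1 - b / 2                 # intercept, so the density integrates to 1
  stopifnot(abs(b) <= 2)         # non-negative density needs mean in [1/3, 2/3]
  u <- runif(n)
  if (b == 0) return(u)          # slope 0 is the plain uniform case
  # Solve a * x + b * x^2 / 2 = u for x (root that lies in [0, 1])
  (-a + sqrt(pmax(a^2 + 2 * b * u, 0))) / b
}

set.seed(1)
x <- rlinear(1e5, 0.6)
mean(x)  # close to 0.6, up to sampling noise
```

For a mean of 0.6 this gives f(x) = 0.4 + 1.2x, so every value in [0, 1] remains possible and larger values are gradually more likely.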

You should check out the beta distribution, which has properties similar to what you want. It has two shape parameters that can make the distribution more bumpy if you want. And you can reparametrize it* to take the desired mean as input.


x <- 0:200/100 - .5
plot(x, dunif(x), type="l", main = "Uniform")

plot(x, dbeta(x,1.1,1), type = "l", main = "Beta 1.1; 1")

plot(x, dbeta(x,1.3,1.1), type = "l", main = "Beta 1.3; 1.1")

Created on 2020-12-09 by the reprex package (v0.3.0)

* Reparametrization: as per the linked Wikipedia article, we have these relationships:

    α = μν, β = (1 − μ)ν

Here μ is the mean and ν is a "sample size" parameter. So, for a given mean, e.g. μ = 0.6, you just need to choose a value of ν; that gives you the shape1 and shape2 parameters to pass to rbeta().
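
The reparametrization can be wrapped in a small helper. This is a minimal sketch: the name `rbeta_mean` and the default ν = 2 are assumptions for illustration (ν = 2 with μ = 0.5 gives Beta(1, 1), i.e. the uniform distribution).

```r
# Sketch: draw from a beta distribution specified by mean mu and
# "sample size" nu, using alpha = mu * nu and beta = (1 - mu) * nu.
# `rbeta_mean` is a hypothetical helper name, not part of base R.
rbeta_mean <- function(n, mu, nu = 2) {
  stopifnot(mu > 0, mu < 1, nu > 0)
  rbeta(n, shape1 = mu * nu, shape2 = (1 - mu) * nu)
}

set.seed(1)
x <- rbeta_mean(1e5, mu = 0.6, nu = 2)  # Beta(1.2, 0.8)
mean(x)  # close to 0.6, up to sampling noise
```

Larger values of ν concentrate the sample around μ, so a small ν keeps the result closer to flat. Note that when a shape parameter drops below 1, as with Beta(1.2, 0.8) here, the density grows without bound at that end of the interval.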

1
votes

You could do this by drawing a uniform sample first, then finding the exponent that gives the sample the desired mean when every value is raised to that power. You can find this exponent using optimize and wrap it all in a handy function:

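runif_skew <- function(n, mean) {
  # Draw a plain uniform sample on (0, 1)
  y <- runif(n)
  # Find the exponent a that minimises the squared gap between mean(y^a) and
  # the target; in mean(...) R resolves `mean` to the function, not the argument
  o <- optimize(function(x) sapply(x, function(a) (mean(y^a) - mean)^2),
                c(-10, 10))
  # Apply the fitted exponent to the sample
  return(y^o$minimum)
}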

So testing, we get:

set.seed(1234)

samp <- runif_skew(100, mean = 0.6)
samp
#>   [1] 0.30960945 0.77430422 0.76552230 0.77502862 0.92241558 0.78630999
#>   [7] 0.08116296 0.45539329 0.80322229 0.69863075 0.82094332 0.72083812
#>  [13] 0.50599764 0.95795377 0.51517442 0.90868101 0.50935602 0.49043546
#>  [19] 0.40456114 0.45505040 0.53784057 0.52495782 0.37103194 0.17624617
#>  [25] 0.44066837 0.89294028 0.70697395 0.95303388 0.90519278 0.18954096
#>  [31] 0.65484658 0.48881344 0.52680572 0.69352738 0.39794079 0.86223518
#>  [37] 0.42123912 0.48243924 0.99575933 0.89101011 0.72677936 0.79033774
#>  [43] 0.53343892 0.77398195 0.54978076 0.68960374 0.81035543 0.67690563
#>  [49] 0.46727665 0.86577235 0.24520062 0.53146373 0.83594108 0.69148941
#>  [55] 0.36335672 0.69103666 0.68362817 0.85703730 0.39023824 0.91515557
#>  [61] 0.92467723 0.18062297 0.53836223 0.09909591 0.46218794 0.82914424
#>  [67] 0.52998881 0.69444152 0.20229817 0.73470110 0.32085460 0.94070430
#>  [73] 0.10245695 0.87648793 0.27287373 0.70224102 0.59704639 0.23844073
#>  [79] 0.54152334 0.80478925 0.95961235 0.66699780 0.34984342 0.72033505
#>  [85] 0.41547888 0.94396313 0.60141737 0.53255866 0.37226639 0.94260574
#>  [91] 0.38017938 0.94500732 0.33838968 0.33502176 0.29703296 0.69667413
#>  [97] 0.52262060 0.14178325 0.53142748 0.85143495

hist(samp)

Note that the sample stays within (0, 1), and the sample mean matches the target to the printed precision:

mean(samp)
#> [1] 0.6

Created on 2020-12-09 by the reprex package (v0.3.0)
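
As a point of comparison, the exponent can also be chosen in closed form rather than fitted. For U uniform on (0, 1), E[U^a] = 1 / (a + 1) when a > -1, so a = 1 / mean - 1 hits the target mean in expectation. The sketch below uses a hypothetical function name, `runif_power`. It is a different technique from the optimize-based function above: the population mean equals the target, but any particular sample mean will vary around it.

```r
# Sketch: power-transformed uniform with a closed-form exponent.
# `runif_power` is a hypothetical helper name used for illustration only.
runif_power <- function(n, target_mean) {
  stopifnot(target_mean > 0, target_mean < 1)
  a <- 1 / target_mean - 1  # from E[U^a] = 1 / (a + 1), U ~ Uniform(0, 1)
  runif(n)^a
}

set.seed(1)
x <- runif_power(1e5, 0.6)  # exponent a = 2/3
mean(x)  # close to 0.6, up to sampling noise
```

For a > 0, U^a follows a Beta(1/a, 1) distribution, so a target mean of 0.6 corresponds to Beta(1.5, 1). This connects the power transform to the beta approach in the other answer.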