I have a List<Double> that holds probabilities (weights) for sampling an item. For example, the List holds 5 values as follows.
0.1, 0.4, 0.2, 0.1, 0.2
The i-th Double value is the probability of sampling the i-th item of another List<Object>.
How can I construct an algorithm to perform the sampling according to these probabilities?
I tried something like this, where I first made the list of probabilities into a cumulative form.
0.1, 0.5, 0.7, 0.8, 1.0
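For illustration, the cumulative form could be built along these lines. This is only a sketch: the class and method names (CumulativeWeights, cumulative) are made up for the example, and it assumes the weights are non-negative.

```java
import java.util.ArrayList;
import java.util.List;

final class CumulativeWeights {
    // Returns running totals: element i is the sum of weights 0..i.
    static List<Double> cumulative(List<Double> weights) {
        List<Double> result = new ArrayList<>(weights.size());
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
            result.add(sum);
        }
        return result;
    }
}
```

Because of floating-point rounding, the last running total may not be exactly 1.0, so any lookup against this list needs a fallback for values that fall past the final entry.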
Then my approach is as follows. I generate a random double, and iterate over the list to find the first item that is larger than the random double, and then return its index.
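Random r = new Random();
double p = r.nextDouble(); // uniform in [0.0, 1.0)
int total = list.size();
for (int i = 0; i < total; i++) {
    double d = list.get(i); // cumulative probability up to and including item i
    if (d > p) {
        return i;
    }
}
// Fallback in case rounding leaves the last cumulative value at or below p.
return total - 1;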
This approach is slow because it scans the list sequentially, and in reality my list has 800,000 items, each with an associated weight (probability), that I need to sample from.
I'm not sure how binary search can help. Let's say I generated p = 0.01. Then, a binary search can use recursion as follows with the list.
compare 0.01 to 0.7, repeat with L = 0.1, 0.5
compare 0.01 to 0.1, stop
compare 0.01 to 0.5, stop
0.01 is smaller than 0.7, 0.5, and 0.1, but I obviously only want 0.1. So the stopping criterion is still not clear to me when using binary search.
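One possible stopping rule, as far as I can tell, is to keep narrowing a range [lo, hi] until it holds a single index, never discarding an entry that is larger than p. A minimal sketch of that idea follows. The names CumulativeSearch and firstGreater are hypothetical, and it assumes a non-empty cumulative list with fast random access (such as an ArrayList).

```java
import java.util.List;

final class CumulativeSearch {
    // Returns the smallest index i with cumulative.get(i) > p.
    // If no entry is larger than p, returns the last index,
    // mirroring the "return total - 1" fallback of the linear scan.
    static int firstGreater(List<Double> cumulative, double p) {
        int lo = 0;
        int hi = cumulative.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative.get(mid) > p) {
                hi = mid;     // mid may be the answer, so keep it in range
            } else {
                lo = mid + 1; // mid is not larger than p; look to the right
            }
        }
        return lo;
    }
}
```

With p = 0.01 and the list above, the range shrinks from [0, 4] to [0, 2], then [0, 1], then [0, 0], so the search stops at index 0 after about log2(n) comparisons rather than n.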
If there's a library that helps with this kind of thing, I'd also be interested.
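Short of a third-party library, the standard java.util.TreeMap might already cover this, since its higherEntry method returns the entry with the smallest key strictly greater than a given value. The following is a rough sketch under that assumption. TreeMapSampler is a hypothetical name, and items with a weight of zero or less are skipped so that they do not produce duplicate keys.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

final class TreeMapSampler<T> {
    // Maps each cumulative weight to the item that ends at that weight.
    private final NavigableMap<Double, T> byCumulativeWeight = new TreeMap<>();
    private final Random random;
    private double total = 0.0;

    TreeMapSampler(List<T> items, List<Double> weights, Random random) {
        if (items.size() != weights.size()) {
            throw new IllegalArgumentException("items and weights differ in size");
        }
        this.random = random;
        for (int i = 0; i < items.size(); i++) {
            double w = weights.get(i);
            if (w <= 0.0) {
                continue; // can never be drawn; skipping avoids duplicate keys
            }
            total += w;
            byCumulativeWeight.put(total, items.get(i));
        }
        if (byCumulativeWeight.isEmpty()) {
            throw new IllegalArgumentException("no item has a positive weight");
        }
    }

    T sample() {
        // Scale by the actual total so weights need not sum to exactly 1.0.
        double p = random.nextDouble() * total;
        Map.Entry<Double, T> entry = byCumulativeWeight.higherEntry(p);
        // Defensive fallback to the last entry if rounding pushes p to the total.
        return (entry != null ? entry : byCumulativeWeight.lastEntry()).getValue();
    }
}
```

Each lookup is a logarithmic-time tree search, so this is the same idea as a binary search over the cumulative list, at the cost of extra memory per entry.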