How to generate n random numbers from negative binomial distribution?

Question

I am trying to write a function that generates n random numbers from a negative binomial distribution. As a first step, I wrote a function that generates n random numbers from a geometric distribution. It is as follows:

rGE <- function(n, p) {
  I <- rep(NA, n)
  for (j in 1:n) {
    x <- rBer(1, p)
    i <- 1 # number of trials
    while (x == 0) {
      x <- rBer(1, p)
      i <- i + 1
    }
    I[j] <- i
  }
  return(I)
}

I tested this function with rGE(10,0.5), which generates 10 random numbers from a geometric distribution with probability of success 0.5. One random result was:

[1] 2 4 2 1 1 3 4 2 3 3

The rGE function uses a helper function named rBer, which is:

rBer<-function(n,p){
  sample(0:1,n,replace = TRUE,prob=c(1-p,p))
}

Now I want to extend rGE into a function that generates n random numbers from a negative binomial distribution. I wrote the following function:

rNB<-function(n,r,p){
  I<-seq(n)
  for (j in 1:n){
    x<-0
    x<-rBer(1,p)
    i<-1 # number of trials
    while(x==0 & I[j]!=r){
      x<-rBer(1,p)
      i<-i+1
    }
    I[j]<- i
  }
  return(I)
}

I tested it several times with rNB(3,2,0.1), which should generate 3 random numbers from a negative binomial distribution with parameters r=2 and p=0.1:

> rNB(3,2,0.1)
[1] 2 1 7
> rNB(3,2,0.1)
[1] 3 1 4
> rNB(3,2,0.1)
[1] 3 1 2
> rNB(3,2,0.1)
[1] 3 1 3
> rNB(3,2,0.1)
[1] 46  1 13

As you can see, my function (rNB) does not work correctly, because the second random number is always 1. Could anyone help me correct rNB so that it generates n random numbers from a negative binomial distribution with parameters r and p, where r is the number of successes and p is the probability of success?

[[Hint: Explanations regarding the geometric distribution and the negative binomial distribution.

Geometric distribution: In probability theory and statistics, the geometric distribution is either of two discrete probability distributions:

The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set { 1, 2, 3, ... }.
The probability distribution of the number Y = X − 1 of failures before the first success, supported on the set { 0, 1, 2, 3, ... }.

Negative binomial distribution: A negative binomial experiment is a statistical experiment that has the following properties:

The experiment consists of x repeated trials.
Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
The probability of success, denoted by P, is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
The experiment continues until r successes are observed, where r is specified in advance.]]

Why not use rnbinom? And where is rBer defined? Is it just rBer <- function(n, p) rbinom(n, 1, p)? — Allan Cameron
Thank you for your comment @AllanCameron. I do not want to use r functions like rnbinom. I want to make my own functions. I added rBer function to my above explanations. — Rojer
I think I understand vahid. I see you are using the function sample though. Is this the only random - generating function you want to use? — Allan Cameron

Allan Cameron · Accepted Answer · 2020-09-30T21:11:10

Your function will be much faster if you use R's native vectorization. The way you can do this is to generate all your Bernoulli trials at once.

Note that for a negative binomial distribution, the expected number of failures before the r-th success is r * (1 - p) / p, so the expected number of Bernoulli trials needed to get r successes is r / p.

If we want to draw n negative binomial samples, then the expected total number of Bernoulli trials will therefore be n * r / p. So we want to draw at least that many Bernoulli samples. For simplicity, we can start by drawing twice that number: 2 * n * r / p. In the unlikely case that this is not enough, we can double the number and draw again, repeating until we have enough; once the sum of the resultant vector of Bernoulli trials is at least r * n, we know we have enough Bernoulli trials to simulate our n negative binomial draws.

We can now run a cumsum on the vector of Bernoulli trials to keep track of the number of successes. Subtracting the vector itself from its cumsum gives the number of successes seen before each trial; if you then perform integer division on this by r (using %/% r), you will have all the Bernoulli trials labelled according to which negative binomial draw they belong to, with each group ending on its r-th success. You then table this vector.

The first n numbers of the table (obtained by subsetting the table by [1:n] or equivalently by [seq(n)]) are your negative binomial draws. We just remove the table's names by using as.numeric. We also subtract the number of successes (i.e. r) from each of our counts, since we are only counting the failures, not the successes.

rNB <- function(n, r, p) {
  mult <- 2
  all_samples <- 0
  # Expected number of trials for n draws is n * r / p; start with twice
  # that and keep doubling until there are at least n * r successes
  while(sum(all_samples) < n * r)
  {
    all_samples <- rBer(ceiling(mult * n * r / p), p)
    mult <- mult * 2
  }
  # Label each trial by the number of successes seen before it, so that
  # every group ends on its r-th success
  groups <- (cumsum(all_samples) - all_samples) %/% r
  as.numeric(table(groups))[seq(n)] - r
}
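
As a small illustration of the labelling idea, here is a sketch on a hand-made sequence of trials. The sequence is invented for the example, and the variable names are only illustrative. It labels each trial by the number of successes seen before it, which is one way to make every group end on its r-th success.

```r
# Hand-written Bernoulli outcomes (1 = success, 0 = failure); illustration only
trials <- c(0, 1, 1, 0, 0, 1, 0, 1)
r <- 2

# Number of successes observed strictly before each trial
successes_before <- cumsum(trials) - trials

# Integer division by r labels each trial with the draw it belongs to
draw_id <- successes_before %/% r

# Trials per draw, minus the r successes, leaves the failures per draw
failures <- as.numeric(table(draw_id)) - r

print(draw_id)
print(failures)
```

Here the first draw uses trials 1 to 3 (one failure before the second success) and the second draw uses trials 4 to 8 (three failures), so failures is 1 and 3.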

So we can do:

rNB(3, 2, 0.1)
#> [1] 14 19 41

rNB(3, 2, 0.1)
#> [1] 23  6 56

rNB(3, 2, 0.1)
#> [1] 11 31 59

rNB(3, 2, 0.1)
#> [1]  7 21 14

mean(rNB(10000, 2, 0.1))
#> [1] 18.0002

We can test this against R's own rnbinom:

mean(rnbinom(10000, 2, 0.1))
#> [1] 18.0919

hist(rnbinom(10000, 2, 0.5), breaks = 0:20)

hist(rNB(10000, 2, 0.5), breaks = 0:20)
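
Beyond comparing histograms by eye, you could also compare sample moments with the theoretical mean r * (1 - p) / p and variance r * (1 - p) / p^2. The helper below is a hypothetical sketch (check_sampler is not a built-in function); it accepts any sampler with the signature function(n, r, p) that returns failure counts, so it should work with rnbinom as shown, or with your own rNB.

```r
# Hypothetical helper: compare a sampler's empirical moments with theory.
# 'sampler' is any function(n, r, p) returning failure counts, e.g. rnbinom.
check_sampler <- function(sampler, n, r, p) {
  x <- sampler(n, r, p)
  c(sample_mean = mean(x),
    theory_mean = r * (1 - p) / p,
    sample_var  = var(x),
    theory_var  = r * (1 - p) / p^2)
}

set.seed(1)  # arbitrary seed, only for reproducibility
res <- check_sampler(rnbinom, 10000, 2, 0.1)
print(round(res, 2))
```

With r = 2 and p = 0.1 the theoretical mean is 18 and the theoretical variance is 180, so the sample values should land close to those.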

Note that the logic of your own version isn't quite right. In particular, the line while(x == 0 & I[j] != r) doesn't make any sense. I is a vector of 1:n, so in your example, whenever j is 2, I[j] is equal to r and the loop stops. This is why your second number is always 1. I don't know what you were trying to do here.
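
A quick check makes this concrete, using n = 3 and r = 2 from your example. This only reproduces the initial state of I and the loop condition, not the whole function.

```r
n <- 3
r <- 2
I <- seq(n)    # 1 2 3, exactly how the question's rNB initialises I

# Second half of the while condition, evaluated for every j
cond <- I != r
print(cond)    # TRUE FALSE TRUE

# For j = 2 the whole condition is FALSE even when the trial failed (x == 0),
# so the loop body never runs and i keeps its initial value of 1
x <- 0
print(x == 0 & I[2] != r)    # FALSE
```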

If you want to do it one Bernoulli trial at a time, as you are doing in your own version, try this modified function. The variable names should hopefully make it easy to follow the logic:

rNB <- function(n, r, p) {
  # Create an empty vector of length n for our results
  draws <- numeric(n)
  
  # Now for each of the n trials we will get a negative binomial sample:
  for (i in 1:n) {
    # Create success and failure counters for this draw
    failures  <- successes <- 0
    
    # Now run Bernoulli trials, counting successes and failures as we go
    # until we hit r successes
    while(successes < r)
    {
      if(rBer(1, p) == 1) 
        successes <- successes + 1
      else
        failures  <- failures + 1
    }

    # Once we have reached r successes, the current number of failures is our
    # negative binomial draw
    draws[i] <- failures
  }
  
  return(draws)
}

This samples from the same distribution as the faster, albeit more opaque, vectorized version.
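
Since you started from a geometric sampler, another option is to reuse it: the number of trials needed to get r successes is the sum of r independent geometric trial counts, so subtracting r leaves the number of failures. The sketch below assumes rBer behaves as defined in the question and rewrites rGE compactly; rNB_geom is a hypothetical name chosen for this example.

```r
# Bernoulli sampler, as defined in the question
rBer <- function(n, p) {
  sample(0:1, n, replace = TRUE, prob = c(1 - p, p))
}

# Geometric sampler: number of trials up to and including the first success
rGE <- function(n, p) {
  trials <- numeric(n)
  for (j in seq_len(n)) {
    i <- 1
    while (rBer(1, p) == 0) i <- i + 1
    trials[j] <- i
  }
  trials
}

# Hypothetical name: each draw sums r geometric trial counts and removes
# the r successes, leaving the number of failures
rNB_geom <- function(n, r, p) {
  vapply(seq_len(n), function(k) sum(rGE(r, p)) - r, numeric(1))
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- rNB_geom(2000, 2, 0.5)
print(mean(x))  # theoretical mean is r * (1 - p) / p = 2
```

This is slower than the vectorized version because it calls rBer once per trial, but it reuses the code you already have.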
