0
votes

as the title states I am trying to generate random numbers from a custom continuous probability density function, which is:

0.001257 *x^4 * e^(-0.285714 *x)

to do so, I use (on python 3) scipy.stats.rv_continuous and then rvs() to generate them

from decimal import Decimal
from scipy import stats
import numpy as np

class my_distribution(stats.rv_continuous):
    def _pdf(self, x):
        return (Decimal(0.001257) *Decimal(x)**(4)*Decimal(np.exp(-0.285714 *x)))

distribution = my_distribution()
distribution.rvs()

note that I used Decimal to get rid of an OverflowError: (34, 'Result too large').

Still, I get an error RuntimeError: Failed to converge after 100 iterations.

What's going on there? What's the proper way to achieve what I need to do?

1
Is the density function in your question the only density function you're trying to sample?Peter O.
@PeterO. yes, it is.sato
Normally, you'd use a cdf to generate random numbers by mapping a uniform distribution (0, 1) to the inverse of your cdf. However, your cdf isn't properly defined since it's always infinity, making the issue hard to solve. You can probably get some mileage if you set a finite starting point for your distribution, and integrate the pdf from that.Mad Physicist

1 Answers

1
votes

I've found out the reason for your issue.

rvs by default uses numerical integration, which is a slow process and can fail in some cases. Your PDF is presumably one of those cases, where the left side grows without bound.

For this reason, you should specify the distribution's support as follows (the following example shows that the support is in the interval [-4, 4]):

distribution = my_distribution(a = -4, b = 4)

With this interval, the PDF will be bounded from above, allowing the integration (and thus the random variate generation) to work as normal. Note that by default, rv_continuous assumes the distribution is supported on the entire real line.

However, this will only work for the particular PDF you give here, not necessarily for arbitrary PDFs.


Usually, when you only give a PDF to your rv_continuous subclass, the subclass's rvs, mean, etc. Will then be very slow, because the method needs to integrate the PDF every time it needs to generate a random variate or calculate a statistic. For example, random variate generation requires using numerical integration to integrate the PDF, and this process can fail to converge depending on the PDF.

In future cases when you're dealing with arbitrary distributions, and particularly when speed is at a premium, you will thus need to add to an _rvs method that uses its own sampler. One example is a much simpler rejection sampler given in the answer to a related question.

See also my section "Sampling from an Arbitrary Distribution".