[SciPy-User] defining new distributions with rv_continuous

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[SciPy-User] defining new distributions with rv_continuous

Francesco De Gasperin
Hi,

I have some problems in defining a new distribution using rv_continuous.
As an example I tried to create a gaussian defined only for positive
values of the random variable.

This is the code:

###########################################
from scipy.stats import distributions
class gauss_pos(distributions.rv_continuous):
    def _pdf(self, x):
         if x <= 0: return 0
         sigma = 1.
         c = 0.
         return 1/(np.sqrt(2*np.pi)*sigma) * np.exp( -(x-c)**2 /
(2*sigma**2) )

t_obj = gauss_pos(name="gauss_pos")
print (t_obj.rvs(size=10))
#############################################

but when I run it I always end up with:
OverflowError: (34, 'Numerical result out of range')

Then, I removed the "if x <= 0: return 0" line and I tried to define the
"a" and "b" parameters when creating the t_obj. Defining only the "a=0"
produces the same error, while using something as:

t_obj = gauss_pos(a=0,b=2)

instead produces a result of this type:
[ 2.          2.          1.5460072   1.09621782  1.62184086  0.42107306
   2.          1.09281742  0.82957403  2.        ]

which I still do not understand... it seems that "b" is returned every
time the constraint that the random var must be between a and b is
violated...

any help is appreciated!
cheers,
Francesco
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: defining new distributions with rv_continuous

Evgeni Burovski
In this specific case, have a look at truncnorm:

http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.truncnorm.html

>>> import scipy.stats as stats
>>> import numpy as np
>>> stats.truncnorm.pdf(0.1, 0, np.inf)
0.79390509495402362
>>> stats.truncnorm.pdf(-0.1, 0, np.inf)
0.0



On Mon, Apr 14, 2014 at 11:00 AM, Francesco De Gasperin <[hidden email]> wrote:

> Hi,
>
> I have some problems in defining a new distribution using rv_continuous.
> As an example I tried to create a gaussian defined only for positive
> values of the random variable.
>
> This is the code:
>
> ###########################################
> from scipy.stats import distributions
> class gauss_pos(distributions.rv_continuous):
>     def _pdf(self, x):
>          if x <= 0: return 0
>          sigma = 1.
>          c = 0.
>          return 1/(np.sqrt(2*np.pi)*sigma) * np.exp( -(x-c)**2 /
> (2*sigma**2) )
>
> t_obj = gauss_pos(name="gauss_pos")
> print (t_obj.rvs(size=10))
> #############################################
>
> but when I run it I always end up with:
> OverflowError: (34, 'Numerical result out of range')
>
> Then, I removed the "if x <= 0: return 0" line and I tried to define the
> "a" and "b" parameters when creating the t_obj. Defining only the "a=0"
> produces the same error, while using something as:
>
> t_obj = gauss_pos(a=0,b=2)
>
> instead produces a result of this type:
> [ 2.          2.          1.5460072   1.09621782  1.62184086  0.42107306
>    2.          1.09281742  0.82957403  2.        ]
>
> which I still do not understand... it seems that "b" is returned every
> time the constraint that the random var must be between a and b is
> violated...
>
> any help is appreciated!
> cheers,
> Francesco
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: defining new distributions with rv_continuous

josef.pktd
On Mon, Apr 14, 2014 at 6:13 AM, Evgeni Burovski
<[hidden email]> wrote:

> In this specific case, have a look at truncnorm:
>
> http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.truncnorm.html
>
>>>> import scipy.stats as stats
>>>> import numpy as np
>>>> stats.truncnorm.pdf(0.1, 0, np.inf)
> 0.79390509495402362
>>>> stats.truncnorm.pdf(-0.1, 0, np.inf)
> 0.0
>
>
>
> On Mon, Apr 14, 2014 at 11:00 AM, Francesco De Gasperin <[hidden email]> wrote:
>> Hi,
>>
>> I have some problems in defining a new distribution using rv_continuous.
>> As an example I tried to create a gaussian defined only for positive
>> values of the random variable.
>>
>> This is the code:
>>
>> ###########################################
>> from scipy.stats import distributions
>> class gauss_pos(distributions.rv_continuous):
>>     def _pdf(self, x):
>>          if x <= 0: return 0
>>          sigma = 1.
>>          c = 0.
>>          return 1/(np.sqrt(2*np.pi)*sigma) * np.exp( -(x-c)**2 /
>> (2*sigma**2) )
>>
>> t_obj = gauss_pos(name="gauss_pos")
>> print (t_obj.rvs(size=10))
>> #############################################
>>
>> but when I run it I always end up with:
>> OverflowError: (34, 'Numerical result out of range')
>>
>> Then, I removed the "if x <= 0: return 0" line and I tried to define the
>> "a" and "b" parameters when creating the t_obj. Defining only the "a=0"
>> produces the same error, while using something as:
>>
>> t_obj = gauss_pos(a=0,b=2)
>>
>> instead produces a result of this type:
>> [ 2.          2.          1.5460072   1.09621782  1.62184086  0.42107306
>>    2.          1.09281742  0.82957403  2.        ]
>>
>> which I still do not understand... it seems that "b" is returned every
>> time the constraint that the random var must be between a and b is
>> violated...

The specific problem is that your pdf is not normalized correctly to
integrate to 1.
If you truncate the normal distribution, then you need to normalize
for the truncated probability.

The ppf will not be correct, and it is used in to create the rvs in a
generic way.

As Evgeni pointed out truncnorm, the best way is to look at the source
code for some distributions and use them as pattern.

There is also halfnorm.

Josef


>>
>> any help is appreciated!
>> cheers,
>> Francesco
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: defining new distributions with rv_continuous

Evgeni Burovski
A barebone example of subclassing:

>>> class TNC(stats.rv_continuous):
...    def _pdf(self, x):
...      return stats.norm.pdf(x) * 2.      # cf Josef's email
...
>>> tnc = TNC(name='foo', a=0.)
>>>
>>> tnc.rvs(size=3)
array([ 0.19215378,  2.43396508,  0.58672056])
>>>
>>> tnc.expect(lambda x: 1.)
0.9999999999999998

* Notice there's no need for explicit branching in _pdf, since this is
taken care of in generic pdf once you provide the `a` arg to ctor.
* While this works for small examples, you better be aware that the
generic rvs incurs a *lot* of overhead. You might want to define
explicit _ppf or _rvs.

Evgeni



On Mon, Apr 14, 2014 at 11:18 AM,  <[hidden email]> wrote:

> On Mon, Apr 14, 2014 at 6:13 AM, Evgeni Burovski
> <[hidden email]> wrote:
>> In this specific case, have a look at truncnorm:
>>
>> http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.truncnorm.html
>>
>>>>> import scipy.stats as stats
>>>>> import numpy as np
>>>>> stats.truncnorm.pdf(0.1, 0, np.inf)
>> 0.79390509495402362
>>>>> stats.truncnorm.pdf(-0.1, 0, np.inf)
>> 0.0
>>
>>
>>
>> On Mon, Apr 14, 2014 at 11:00 AM, Francesco De Gasperin <[hidden email]> wrote:
>>> Hi,
>>>
>>> I have some problems in defining a new distribution using rv_continuous.
>>> As an example I tried to create a gaussian defined only for positive
>>> values of the random variable.
>>>
>>> This is the code:
>>>
>>> ###########################################
>>> from scipy.stats import distributions
>>> class gauss_pos(distributions.rv_continuous):
>>>     def _pdf(self, x):
>>>          if x <= 0: return 0
>>>          sigma = 1.
>>>          c = 0.
>>>          return 1/(np.sqrt(2*np.pi)*sigma) * np.exp( -(x-c)**2 /
>>> (2*sigma**2) )
>>>
>>> t_obj = gauss_pos(name="gauss_pos")
>>> print (t_obj.rvs(size=10))
>>> #############################################
>>>
>>> but when I run it I always end up with:
>>> OverflowError: (34, 'Numerical result out of range')
>>>
>>> Then, I removed the "if x <= 0: return 0" line and I tried to define the
>>> "a" and "b" parameters when creating the t_obj. Defining only the "a=0"
>>> produces the same error, while using something as:
>>>
>>> t_obj = gauss_pos(a=0,b=2)
>>>
>>> instead produces a result of this type:
>>> [ 2.          2.          1.5460072   1.09621782  1.62184086  0.42107306
>>>    2.          1.09281742  0.82957403  2.        ]
>>>
>>> which I still do not understand... it seems that "b" is returned every
>>> time the constraint that the random var must be between a and b is
>>> violated...
>
> The specific problem is that your pdf is not normalized correctly to
> integrate to 1.
> If you truncate the normal distribution, then you need to normalize
> for the truncated probability.
>
> The ppf will not be correct, and it is used in to create the rvs in a
> generic way.
>
> As Evgeni pointed out truncnorm, the best way is to look at the source
> code for some distributions and use them as pattern.
>
> There is also halfnorm.
>
> Josef
>
>
>>>
>>> any help is appreciated!
>>> cheers,
>>> Francesco
>>> _______________________________________________
>>> SciPy-User mailing list
>>> [hidden email]
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user