[SciPy-User] Loss of precision with powers and factorials?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[SciPy-User] Loss of precision with powers and factorials?

Chris Cameron
Hi,

I tried to do some work with the binomial probability formula, but I’m not getting the answers I expect. My code:

############
import numpy as np
import scipy as sp

n = 36000
i = np.array(range(0, 250))

np.sum(
    (sp.misc.factorial(n) / \
    (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \
    np.power(0.00625, i) * np.power((1-0.00625), (n-i)))
############

When I execute this I get ‘nan’.

In Mathematica this same formula and values get me a seemingly reasonable answer:
Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i), {i, 0, 250}] = 0.95406

I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside iPython 4.1.2


From trying to execute this piece by piece, it seems like the “np.power()” function is just returning “0” when the result is super small. Could the problem be not enough precision is being held?


Thanks!

Chris
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Loss of precision with powers and factorials?

josef.pktd
On Thu, May 5, 2016 at 4:43 PM, Chris Cameron <[hidden email]> wrote:

> Hi,
>
> I tried to do some work with the binomial probability formula, but I’m not getting the answers I expect. My code:
>
> ############
> import numpy as np
> import scipy as sp
>
> n = 36000
> i = np.array(range(0, 250))
>
> np.sum(
>     (sp.misc.factorial(n) / \
>     (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \
>     np.power(0.00625, i) * np.power((1-0.00625), (n-i)))
> ############
>
> When I execute this I get ‘nan’.
>
> In Mathematica this same formula and values get me a seemingly reasonable answer:
> Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i), {i, 0, 250}] = 0.95406
>
> I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside iPython 4.1.2
>
>
> From trying to execute this piece by piece, it seems like the “np.power()” function is just returning “0” when the result is super small. Could the problem be not enough precision is being held?


Floating precision does not work well enough for this.

The source of scipy.stats.distribution is now for most parts a good
place to see how to preserve numerical precision.

Essentially, don't do that, work in log space and use scipy special
function like lngamma instead of factorial and similar.

Josef


>
>
> Thanks!
>
> Chris
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> https://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Loss of precision with powers and factorials?

Alok Singhal
In reply to this post by Chris Cameron
Chris Cameron <chris <at> upnix.com> writes:                                    

>                                                                              
> ############                                                                  
> import numpy as np                                                            
> import scipy as sp                                                            
>                                                                              
> n = 36000                                                                    
> i = np.array(range(0, 250))                                                  
>                                                                              
> np.sum(                                                                      
>     (sp.misc.factorial(n) / \                                                
>     (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \                      
>     np.power(0.00625, i) * np.power((1-0.00625), (n-i)))                      
> ############                                                                  
>                                                                              
> When I execute this I get ‘nan’.                                              
>                                                                              
> In Mathematica this same formula and values get me a seemingly                
> reasonable answer:                                                            
> Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i),            
>  {i, 0, 250}] = 0.95406                                                      
>                                                                              
> I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside          
> iPython 4.1.2                                                                
>                                                                              
> From trying to execute this piece by piece, it seems like the “np.power()”    
> function  is just returning “0” when the result is super small. Could the    
> problem be not enough precision is being held?                                
                                                                               
You are trying to calculate factorial(36000), which is ~1e148395, so your      
calculation overflows in the first step.  Similarly for factorial(36000 - i),  
and then their ratio is nan.                                                    
                                                                               
Instead of calculating the factorials and then dividing, you should "divide    
as you go", so that the numbers don't get too large.  gmpy automatically does  
that for you:                                                                  
                                                                               
>>> import gmpy                                                                
>>> a = gmpy.mpf('0.00625')                                                    
>>> ivals = [gmpy.mpz(x) for x in range(251)]                                  
>>> n = 36000                                                                  
>>> result = sum(gmpy.comb(n, x) * a**x * (1-a)**(n-x) for x in ivals)          
>>> float(result)                                                              
0.954060112106136                                                              

This is pretty fast and gives an answer that matches the answer you get from    
Mathematica.                                                                    
                                                                               
In fact, you don't even need to calculate the sum yourself.  You can use the    
formula for CDF of binomial distribution from                                  
http://mathworld.wolfram.com/BinomialDistribution.html to get the answer you    
want:                                                                          
                                                                               
result = 1.0 - I_p(n+1, N-n) (where p = 0.00625, n = 250, N = 36000)            
                                                                               
result = B(p; n+1, N-n) / B(n+1, N-n)                                          
                                                                               
So, using scipy:                                                                
                                                                               
>>> import scipy.special                                                        
>>> betainc = scipy.special.betainc                                            
>>> n = 250                                                                    
>>> N = 36000                                                                  
>>> p = 0.00625                                                                
>>> result = 1.0 - betainc(n+1, N-n, p) / betainc(n+1, N-n, 1.0)                
>>> print(result)                                                              
0.95406011210468011                                                            
                                                                               
(Note that in your Python implementation, you are summing from 0 to 249,        
whereas in your Mathematica implementation, you are summing from 0 to 250.      
I am including 250 in my examples above).                                      
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Loss of precision with powers and factorials?

Evgeni Burovski
On Fri, May 6, 2016 at 7:39 AM, Alok Singhal <[hidden email]> wrote:

> Chris Cameron <chris <at> upnix.com> writes:
>>
>> ############
>> import numpy as np
>> import scipy as sp
>>
>> n = 36000
>> i = np.array(range(0, 250))
>>
>> np.sum(
>>     (sp.misc.factorial(n) / \
>>     (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \
>>     np.power(0.00625, i) * np.power((1-0.00625), (n-i)))
>> ############
>>
>> When I execute this I get ‘nan’.
>>
>> In Mathematica this same formula and values get me a seemingly
>> reasonable answer:
>> Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i),
>>  {i, 0, 250}] = 0.95406
>>
>> I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside
>> iPython 4.1.2
>>
>> From trying to execute this piece by piece, it seems like the “np.power()”
>> function  is just returning “0” when the result is super small. Could the
>> problem be not enough precision is being held?
>
> You are trying to calculate factorial(36000), which is ~1e148395, so your
> calculation overflows in the first step.  Similarly for factorial(36000 - i),
> and then their ratio is nan.
>
> Instead of calculating the factorials and then dividing, you should "divide
> as you go", so that the numbers don't get too large.  gmpy automatically does
> that for you:
>
>>>> import gmpy
>>>> a = gmpy.mpf('0.00625')
>>>> ivals = [gmpy.mpz(x) for x in range(251)]
>>>> n = 36000
>>>> result = sum(gmpy.comb(n, x) * a**x * (1-a)**(n-x) for x in ivals)
>>>> float(result)
> 0.954060112106136
>
> This is pretty fast and gives an answer that matches the answer you get from
> Mathematica.
>
> In fact, you don't even need to calculate the sum yourself.  You can use the
> formula for CDF of binomial distribution from
> http://mathworld.wolfram.com/BinomialDistribution.html to get the answer you
> want:
>
> result = 1.0 - I_p(n+1, N-n) (where p = 0.00625, n = 250, N = 36000)
>
> result = B(p; n+1, N-n) / B(n+1, N-n)
>
> So, using scipy:
>
>>>> import scipy.special
>>>> betainc = scipy.special.betainc
>>>> n = 250
>>>> N = 36000
>>>> p = 0.00625
>>>> result = 1.0 - betainc(n+1, N-n, p) / betainc(n+1, N-n, 1.0)
>>>> print(result)
> 0.95406011210468011
>
> (Note that in your Python implementation, you are summing from 0 to 249,
> whereas in your Mathematica implementation, you are summing from 0 to 250.
> I am including 250 in my examples above).


I'd add that you can in principle use python loooong integers for factorials:

>>> from scipy.special import factorial as f
>>> f(36000)
array(inf)
>>> x = f(36000, exact=True)
>>> y = f(35999, exact=True)
>>> x // y
36000
>>> type(x)
<class 'int'>

>>> import scipy
>>> scipy.__version__
'0.18.0.dev0+8b07439'

Not saying it's a good idea to use them for calculations of binomials
(it isn't), but at least technically the option is there.
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Loss of precision with powers and factorials?

Eric Moore
Use the stats module instead of doing this calculation yourself.  A huge amount of work has already been put in to calculate things like this in numerically robust ways.

In [1]: from scipy import stats


In [2]: stats.binom.cdf(250,36000, 0.00625)

Out[2]: 0.95406011210591368



On Fri, May 6, 2016 at 2:55 AM, Evgeni Burovski <[hidden email]> wrote:
On Fri, May 6, 2016 at 7:39 AM, Alok Singhal <[hidden email]> wrote:
> Chris Cameron <chris <at> upnix.com> writes:
>>
>> ############
>> import numpy as np
>> import scipy as sp
>>
>> n = 36000
>> i = np.array(range(0, 250))
>>
>> np.sum(
>>     (sp.misc.factorial(n) / \
>>     (sp.misc.factorial(n-i) * sp.misc.factorial(i))) * \
>>     np.power(0.00625, i) * np.power((1-0.00625), (n-i)))
>> ############
>>
>> When I execute this I get ‘nan’.
>>
>> In Mathematica this same formula and values get me a seemingly
>> reasonable answer:
>> Sum[36000!/((36000 - i)!*i!)*0.00625^i*(1 - 0.00625)^(36000 - i),
>>  {i, 0, 250}] = 0.95406
>>
>> I’m running Python 2.7.11, Numpy 1.10.4, and SciPy 0.17 from inside
>> iPython 4.1.2
>>
>> From trying to execute this piece by piece, it seems like the “np.power()”
>> function  is just returning “0” when the result is super small. Could the
>> problem be not enough precision is being held?
>
> You are trying to calculate factorial(36000), which is ~1e148395, so your
> calculation overflows in the first step.  Similarly for factorial(36000 - i),
> and then their ratio is nan.
>
> Instead of calculating the factorials and then dividing, you should "divide
> as you go", so that the numbers don't get too large.  gmpy automatically does
> that for you:
>
>>>> import gmpy
>>>> a = gmpy.mpf('0.00625')
>>>> ivals = [gmpy.mpz(x) for x in range(251)]
>>>> n = 36000
>>>> result = sum(gmpy.comb(n, x) * a**x * (1-a)**(n-x) for x in ivals)
>>>> float(result)
> 0.954060112106136
>
> This is pretty fast and gives an answer that matches the answer you get from
> Mathematica.
>
> In fact, you don't even need to calculate the sum yourself.  You can use the
> formula for CDF of binomial distribution from
> http://mathworld.wolfram.com/BinomialDistribution.html to get the answer you
> want:
>
> result = 1.0 - I_p(n+1, N-n) (where p = 0.00625, n = 250, N = 36000)
>
> result = B(p; n+1, N-n) / B(n+1, N-n)
>
> So, using scipy:
>
>>>> import scipy.special
>>>> betainc = scipy.special.betainc
>>>> n = 250
>>>> N = 36000
>>>> p = 0.00625
>>>> result = 1.0 - betainc(n+1, N-n, p) / betainc(n+1, N-n, 1.0)
>>>> print(result)
> 0.95406011210468011
>
> (Note that in your Python implementation, you are summing from 0 to 249,
> whereas in your Mathematica implementation, you are summing from 0 to 250.
> I am including 250 in my examples above).


I'd add that you can in principle use python loooong integers for factorials:

>>> from scipy.special import factorial as f
>>> f(36000)
array(inf)
>>> x = f(36000, exact=True)
>>> y = f(35999, exact=True)
>>> x // y
36000
>>> type(x)
<class 'int'>

>>> import scipy
>>> scipy.__version__
'0.18.0.dev0+8b07439'

Not saying it's a good idea to use them for calculations of binomials
(it isn't), but at least technically the option is there.
_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user


_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user