computing Bayesian credible intervals


computing Bayesian credible intervals

Johann Cohen-Tanugi-2
Hello,
The question is a bit more general than that: how can I use SciPy to
compute the values a and b defined by:
1) the integral from a to b of a pdf p(x) (or of any function, for that matter)
equals a given value q
2) p(a) = p(b)

thanks in advance,
Johann
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user

Re: computing Bayesian credible intervals

nmb (Bugzilla)
Johann Cohen-Tanugi <cohen <at> slac.stanford.edu> writes:

> The question is a bit more general than that : How can I use SciPy to
> compute the values a and b defined by :
> 1) integral from a to b of a pdf p(x) (or any function for that matter)
> = a given value q
> 2) p(a)=p(b)

Problem 1 is under-specified.  You have one equation for two unknowns and
generally will not be able to solve uniquely for both endpoints.  It may be
common in your field to require some sort of symmetry between a and b, e.g.
a = mu - E, b = mu + E, where mu is some fixed quantity, so that you solve for E
rather than for a and b separately.  I (perhaps naively) would do this in SciPy
using scipy.optimize.brentq, such as:

from numpy import exp
from scipy.integrate import quad
from scipy.optimize import brentq

q = 0.95  # desired credible mass
mu = 0

def density(x):
    return exp(-x)

def integral_function(E):
    # quad returns (result, estimated error); keep only the result
    return quad(density, mu - E, mu + E)[0] - q

margin = brentq(integral_function, 0, 50)  # an upper bracket of 50 is plenty here
a, b = mu - margin, mu + margin

For problem 2, I'm not sure what the function p represents, but a suitable
adaptation of the above should work.

-Neil



_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: computing Bayesian credible intervals

Johann Cohen-Tanugi-2
hi Neil, thanks for your answer, and sorry I was not clear enough. Of
course I require both conditions: 1) defines *a* credible interval if p
is a posterior pdf, and 2) adds a constraint that in common situations
yields *the* standard Bayesian credible interval. I will have a look at
brentq; I do not know what it refers to.
best,
Johann



Re: computing Bayesian credible intervals

nmb (Bugzilla)
Johann Cohen-Tanugi <cohen <at> slac.stanford.edu> writes:

> hi Neil, thanks for your answer and sorry I was not clear enough. Of
> course I require the 2 conditions. 1) defines *a* credible interval if p
> is a posterior pdf; and 2) sets a constraint that for common situation
> yield *the* standard Bayesian credible interval. I will have a look at
> brentq, I do not know what it refers to.

scipy.optimize.brentq is Brent's method for finding a root of a given scalar
equation.  Since you are looking for two values, a and b, subject to two
conditions, Brent's method is not appropriate (barring some symmetry-based
reduction to one variable).  I like to use scipy.optimize.fsolve to find roots
of multivariable equations, such as:

def solve_me(x): # x is an array of the values you are solving for
    a,b = x
    integral_error = quad(density, a , b) - q
    prob_difference = density(b) - density(a)
    return np.array([integral_error, prob_difference])

fsolve(solve_me, [0.0, 1.0])  # initial guess is a = 0, b = 1




_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: computing Bayesian credible intervals

Johann Cohen-Tanugi-2
hi Neil,
thanks a lot. I needed to modify:

integral_error = quad(density, a, b) - q

into

integral_error = quad(density, a, b)[0] - q

as quad returns a tuple of the result and the estimated error.

More interesting, as a test I used:
q = 0.96

def density(s):
    if s < 0.5:
        return 4*s
    else:
        return 4 - 4*s

for which the solution is a = 0.1 and b = 0.9.
If I keep
results = optimize.fsolve(solve_me, [0, 1])
fsolve fails because it wanders into the regions a < 0 and b > 1.
With
results = optimize.fsolve(solve_me, [0.5, 0.5])
fsolve converges without trouble. Is there a simple way to get it to
work with constraints like a > 0 and b < 1? I am afraid the answer is
no, if I understood correctly the context of another recent thread entitled
"constrained optimization".

Thanks a lot,
Johann


Re: computing Bayesian credible intervals : help on constrained optimisation schemes?

Johann Cohen-Tanugi-2
Hello,
I am attaching my script. I followed Neil's suggestion, but it fails to
converge, seemingly because a and b must be positive, which I cannot
enforce with fsolve, AFAIK.
I am ready to dive into constrained schemes, especially in OpenOpt, but
I was hoping for some suggestions beforehand as to which path to follow
to solve this problem.
As a reminder, I am trying to find a and b such that:
integral from a to b of p(x) = Q
and p(a) = p(b)
where Q is given (0.95 in my script) and p is a Poisson posterior pdf
for ON/OFF source experiments. a, b, and x are source rates, and as such
are positive.
People will have recognized the computation of a Bayesian credible
interval here!
thanks a lot in advance,
Johann


from scipy import optimize
from scipy import integrate
import scipy as sp
import numpy as np
from math import exp

Non=15
Noff=15
Ton=10
Toff=50
q=0.95

def PoissonPosterior(s):
    value=0
    s=np.maximum(s,np.zeros(np.shape(s)))
    Norm=0
    for i in range(1,Non+1):
       coeff=(1.+Toff/Ton)**i*sp.factorial(Non+Noff-i-1)/sp.factorial(Non-i)
       Norm+=coeff
       #print coeff,s,Ton,i
       value=coeff*(s*Ton)**(i-1)*np.exp(-s*Ton)/sp.factorial(i-1)
   
    return value*Ton/Norm

   
def solve_me(x,func): # x is an array of the values you are solving for
    a,b = x
    fa=func(a)
    fb=func(b)
    integral_error = integrate.quad(func, a , b)[0] - q
    prob_difference = fb - fa
    print a,b,integral_error, prob_difference
    return np.array([integral_error, prob_difference])

results=optimize.fsolve(solve_me, [1., 1.5],args=(PoissonPosterior))  
print "%d%s credible interval: [%.2f,%.2f]"%(q*100,"%",results[0],results[1])

##################
def density(s):
    if s<0.5:
        return 4*s
    else :
        return 4-4*s
#for testing : results=optimize.fsolve(solve_me, [0.5, 0.5],args=(density))  


# test that the computation of the pdf is correct
from pylab import *
x=np.arange(0,3,0.001)
y=PoissonPosterior(x)
plot(x,y)
show()


Re: computing Bayesian credible intervals : help on constrained optimisation schemes?

dmitrey.kroshko
I have tried to solve your problem via the openopt.NSLP solver nssolve and
the openopt.GLP solver galileo; both give maxResidual=0.885, which is far
from the desired zero, so your system (with the required non-negative
solution) seems to have no solution.

As for fsolve and other tools intended for smooth functions, using them is
(AFAIK) pointless on non-smooth functions such as the ones your numerical
integration yields.

Regards, D.


Re: computing Bayesian credible intervals : help on constrained optimisation schemes?

Bruce Southey
Hi,
I think that you need an alternative approach here, because:

1) The Poisson is a discrete distribution, not a continuous one (so you
cannot use the solvers).
2) The Poisson is also a skewed distribution, so finding points such that
p(a)=p(b) is impossible unless the Poisson parameter is sufficiently large.
3) Depending on the parameters, it is impossible to get exact
probabilities like 5% or 95% from discrete distributions.

These issues are probably reflected in the reported problems.

Depending on your parameter, and hence on how accurate you want your
interval to be, the Normal/Gaussian can provide a suitable approximation.

If you must stay with the Poisson, you need to solve it by brute force
and allow for p(a) != p(b).

Regards
Bruce


Re: computing Bayesian credible intervals

Damian Eads-2

Computing the area of the posterior over a credible interval involves
integration. If your prior and likelihood are conjugate, it might be
easier to use a conjugate distribution parameterized with the posterior
hyperparameters and then compute CDF(b)-CDF(a). See
http://en.wikipedia.org/wiki/Conjugate_prior for a list of conjugate
priors with the hyperparameters worked out.

Now, I guess what you're really asking is how to do the inverse of that
(question #1), i.e. how do you find the end points of the interval if
you know the area? Try the inverse CDF or inverse survival function. In
SciPy, some distributions have an isf member function.
    b = post.isf(1 - q)  # the tail beyond b holds mass 1-q, so [0, b] holds mass q
    a = 0.0

Now, on to question #2: if your posterior distribution is symmetric, you
can again use the inverse CDF or the inverse survival function.
For example, if q=0.7 (70%) and the posterior is symmetric, then
     L = [post.isf(0.85), post.isf(0.15)]
     a = min(L)
     b = max(L)
Note that post.pdf(a) should equal post.pdf(b) because post is symmetric.
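
To make the CDF/inverse-CDF route concrete, here is a short sketch using a stand-in scipy.stats posterior (the gamma shape and the names post and q are illustrative assumptions, not from the thread); it builds the equal-tail interval via ppf and checks the enclosed mass:

```python
from scipy import stats

q = 0.7                    # desired credible mass
post = stats.gamma(a=3.0)  # stand-in posterior; substitute your own conjugate form

# equal-tail interval: mass (1 - q)/2 is left in each tail
a = post.ppf((1 - q) / 2)
b = post.ppf(1 - (1 - q) / 2)

mass = post.cdf(b) - post.cdf(a)  # should come out very close to q
```

For a skewed posterior such as this gamma, the equal-tail interval does not satisfy p(a) = p(b), so it is not the highest-density interval Johann is after.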

Cheers,

Damian Eads

Re: computing Bayesian credible intervals

Johann Cohen-Tanugi-2
Damian, Bruce, Dmitrey,
thank you very much for your answers. I feel I need to be more precise
about what I am doing. Look at eqs 5.13 and 5.14 of the following article:
http://www.astro.cornell.edu/staff/loredo/bayes/promise.pdf
These two equations define the posterior probability I am trying to work with.
Bruce, it is a continuous function of the count rate s, though of course
the underlying Poisson process is discrete. Damian, I will definitely
look into your suggestions to understand better what is available in
that respect, but I am not expecting this pdf to be implemented for cdf
or isf computation. Dmitrey, could you send me your calls to the OpenOpt
optimizers? That would help me get a head start on using your very nice
framework and play with possibilities other than fsolve.

As eqs 5.13 and 5.14 show, the resulting pdf is by construction
normalized to 1. In my code, I tried integrate.quad and did not get a
result close to 1. So I think that I have a numerical issue in my Python
formula, which would presumably also explain why the solver fails to converge
(the pdf seems to go down to 0 at low s values much too quickly).

thanks again,
Johann


Re: computing Bayesian credible intervals

Anne Archibald
2008/5/4 Johann Cohen-Tanugi <[hidden email]>:

>  integral_error = quad(density, a , b) - q
>
>  into
>
>  integral_error = quad(density, a , b)[0] - q
>
>  as quad returns a tuple with the result and the estimated error

This bites me every time I use quad. Still, it's valuable to check the
reported error, though it's not always clear what to do with the
results.

>  More interesting, as a test  I used :
>  q=0.96
>
>  def density(s):
>     if s<0.5:
>         return 4*s
>     else :
>         return 4-4*s
>
>  For which the solution is a=0.1, and b=0.9
>  If I keep
>  results=optimize.fsolve(solve_me, [0, 1])
>  fsolve fails because it wanders in directions a<0 and b>1.
>  With
>  results=optimize.fsolve(solve_me, [0.5, 0.5])
>  fsolve converges without trouble.... Is there a simple way to get it to
>  work with constraints like a>0 and b<1? I am afraid that the answer is
>  no, if I understood well the context of another recent thread entitled
>  "constrained optimization".

Multidimensional solving, as you are discovering, is a pain. If at all
possible, it's better to turn it into one-dimensional optimization. It
can also be worth disposing of constraints - at least "open"
constraints - using funny reparameterizations (log, for example, is
great for ensuring that positive things stay positive).

Do you really need p(a)=p(b)? I mean, is this the right supplementary
condition to construct a credible interval? Would it be acceptable to
choose instead p(x<a)=p(x>b)? This will probably be easier to work
with, at least if you can get good numerical behaviour out of your
PDF.

Have you looked into writing down an analytical CDF? It looks
moderately ghastly - all those s**n exp(k*s) - but possibly something
that could be done with a bit of sweat. This would improve the
performance of whatever solution method you use - the roundoff errors
from quad() follow some irregular, discontinuous pattern. If you must
use numerical integration, look into using a non-adaptive Gaussian
quadrature, which won't have that particular problem. (There's one in
scipy for more or less this reason.) Analytical mean, mode, or median
could probably be made use of too. Or a different expansion for small
s. A session with SAGE/MAPLE/Mathematica might yield something.

If you do need p(a)=p(b), perhaps the way to turn the problem
one-dimensional and reduce your headaches is to use a as the sole
parameter and write b as a function of a using the fact that your
distribution is single-peaked (so that p(x)-p(a)==0 has just two
solutions, one of which you know about already; if you feel like being
devious you could do root-finding on (p(x)-p(a))/(x-a) ).
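
As a sketch of this one-dimensional reduction, using the triangular test density from earlier in the thread (the brackets passed to brentq are hand-chosen assumptions):

```python
from scipy.integrate import quad
from scipy.optimize import brentq

q = 0.96    # desired credible mass
mode = 0.5  # peak of the triangular test density

def density(s):
    return 4 * s if s < 0.5 else 4 - 4 * s

def b_of_a(a):
    # the second solution of density(x) == density(a), on the far side of the mode
    return brentq(lambda x: density(x) - density(a), mode, 1.0)

def mass_condition(a):
    # integral from a to b(a) minus q; its root is the interval's lower endpoint
    return quad(density, a, b_of_a(a))[0] - q

a = brentq(mass_condition, 1e-6, mode - 1e-6)
b = b_of_a(a)
# for this density the exact answer is a = 0.1, b = 0.9
```

Only one scalar root-find drives the search; the inner b_of_a solve keeps the condition p(a) = p(b) satisfied at every trial value of a.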

Good luck,
Anne

Re: computing Bayesian credible intervals

Johann Cohen-Tanugi-2
hi Anne, thanks! I will need some time to digest your suggestions and
try to implement them. I think that the condition p(a)=p(b), where p is
the actual pdf and not an integral of it, is the sufficient condition
for unimodal distributions so that the credible interval is uniquely
defined as the interval encompassing the mode.
best,
Johann


Re: computing Bayesian credible intervals

Anne Archibald
2008/5/6 Johann Cohen-Tanugi <[hidden email]>:
> hi Anne, thanks! I will need some time to digest your suggestions and
>  try to implement them. I think that the condition p(a)=p(b) where p is
>  the actual pdf and not an integral of it, is the sufficient condition
>  for unimodal distributions so that the credible interval is uniquely
>  defined as the interval encompassing the mode.

Yes, this should work; I would have gone with an interval that
represented (say) 5th, 50th, and 95th percentiles, bracketing the
median instead of the mode, but that might be peculiar if the
distribution has large tails or you want to compare with a
maximum-likelihood method (which returns the mode, I guess). My actual
experience with Bayesian statistics is pretty limited; it's probably
best to go with whatever's most standard (if the numerics can be made
to behave).


Anne

Re: computing Bayesian credible intervals

Robert Kern-2
On Tue, May 6, 2008 at 2:18 AM, Anne Archibald
<[hidden email]> wrote:
>  Do you really need p(a)=p(b)? I mean, is this the right supplementary
>  condition to construct a credible interval? Would it be acceptable to
>  choose instead p(x<a)=p(x>b)? This will probably be easier to work
>  with, at least if you can get good numerical behaviour out of your
>  PDF.

It's one of the defining characteristics of the kind of credible
interval Johann is looking for. "Bayesian credible interval" is a
somewhat broad designation; it applies to pretty much any interval on
a Bayesian posterior distribution, as long as the interval is selected
according to some rule that statisticians agree has some kind of
meaning.

In practice, one of the most common such rules is to find the "Highest
Posterior Density" (HPD) interval, where p(a)=p(b) and P(b)-P(a)=0.95
or some such chosen credibility level. Imagine the PDF being flooded
with water up to its peak (let's assume unimodality for now). We
gradually lower the level of the water such that for both points a and
b where the water touches the PDF, p(a)=p(b). We lower the water until
the integral under the "dry" peak is equal to 0.95. Then [a,b] is the
HPD credible interval for that PDF at the 0.95 credibility level.
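
The water-level picture translates almost directly into code: lower the level t until the mass of the region where the pdf stands above t equals the credibility level. A sketch for a unimodal stand-in posterior (the gamma distribution and all the brackets here are illustrative assumptions):

```python
from scipy import stats
from scipy.optimize import brentq

post = stats.gamma(a=3.0)  # stand-in unimodal posterior
level_q = 0.95             # desired credibility level
mode = 2.0                 # mode of gamma(shape=3, scale=1) is shape - 1
peak = post.pdf(mode)

def interval_at(t):
    # points on either side of the mode where the "water level" t cuts the pdf
    a = brentq(lambda x: post.pdf(x) - t, 1e-9, mode)
    b = brentq(lambda x: post.pdf(x) - t, mode, 60.0)
    return a, b

def dry_mass(t):
    # mass of the "dry" peak above the water, minus the target level
    a, b = interval_at(t)
    return post.cdf(b) - post.cdf(a) - level_q

t = brentq(dry_mass, 1e-6 * peak, (1 - 1e-6) * peak)
a, b = interval_at(t)
# by construction p(a) == p(b) == t, and [a, b] holds 95% of the mass
```

Because dry_mass decreases monotonically as the water level rises, a single scalar root-find in t suffices.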

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

Re: computing Bayesian credible intervals

Johann Cohen-Tanugi-2
mamma mia, shame on me... :(
Of course, in my code it should have read:
 value+=coeff*(s*Ton)**(i-1)/sp.factorial(i-1)
and not
 value=coeff*(s*Ton)**(i-1)/sp.factorial(i-1)
Convergence is much better now with the simple fsolve!
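For reference, the two conditions can be handed to fsolve directly as a coupled system, which is presumably the shape of Johann's code. A minimal sketch, with an illustrative gamma density standing in for his posterior (which is not reproduced in the thread):

```python
from scipy.integrate import quad
from scipy.optimize import fsolve
from scipy.stats import gamma

# Stand-in posterior: gamma with shape 3 (Johann's actual p(x) is
# not shown in the thread, so this is purely illustrative).
p = gamma(3).pdf
q = 0.95

def equations(endpoints):
    a, b = endpoints
    mass, _ = quad(p, a, b)
    return [mass - q,      # condition 1: integral from a to b equals q
            p(a) - p(b)]   # condition 2: equal density at the endpoints

# Starting guess bracketing the mode of the stand-in posterior (x = 2).
a, b = fsolve(equations, [0.5, 6.0])
```

As the thread illustrates, a reasonable starting guess matters here; a poor one can push fsolve into a flat region of the density where the Jacobian degenerates.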

so much for public humiliation. Thanks all for bearing with me :)
I am still interested in testing a constrained solver, Dmitrey.

Johann

Robert Kern wrote:

> [snip -- Robert's HPD explanation, quoted in full above]

Re: computing Bayesian credible intervals

Bruce Southey
In reply to this post by Robert Kern-2
Hi,
First, what do you really mean by p(a) or p(b)?
I would think you want the definition Anne gives, which is also implied
by Robert's example.

For this case the posterior is most likely a gamma distribution, or
proportional to one (it's been a while), which is asymmetric. (Note that
you will need to follow the derivation to determine whether the
normalizing constant has been ignored, since it drops out of most
calculations.) So defining the probabilities as p(x<a) and p(x>b) means
that the points a and b are generally not going to be equidistant from
the center of the distribution (the mean, median, and mode will give
potentially different but correct answers), and p(x<a) != p(x>b). In
this case you must be able to find the areas of the two tails. Obviously
it is considerably easier and more flexible to use the actual
distribution than a vague formula.
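The equal-tail interval Anne suggested (p(x<a) = p(x>b)) needs no root finding at all when the posterior is a known scipy distribution; a sketch with an illustrative gamma posterior:

```python
from scipy.stats import gamma

# Illustrative asymmetric posterior: gamma with shape 3 (an assumption).
posterior = gamma(3)
q = 0.95
tail = (1 - q) / 2  # half the excluded mass in each tail

# Equal-tail interval: 2.5% of the mass below a and 2.5% above b.
a = posterior.ppf(tail)
b = posterior.ppf(1 - tail)

# Because the distribution is skewed, the densities at the two
# endpoints differ, so this is not the HPD interval.
```

Frozen scipy distributions also provide an interval method (e.g. posterior.interval(0.95)) that should return the same equal-tail endpoints.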

Bruce

On Tue, May 6, 2008 at 2:55 PM, Robert Kern <[hidden email]> wrote:

> [snip -- Robert's HPD explanation, quoted in full above]

Re: computing Bayesian credible intervals

Robert Kern-2
On Tue, May 6, 2008 at 3:47 PM, Bruce Southey <[hidden email]> wrote:
> Hi,
>  First what do you really mean by p(a) or p(b)?

The values of the PDF at points a and b. These are *not*
probabilities, but point values of the density function. Generally
speaking, with an HPD interval, the probabilities under the two tails
(what you are referring to with "p(x<a)" and "p(x>b)") are not equal.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user