Hello,
The question is a bit more general than that: how can I use SciPy to compute the values a and b defined by:
1) the integral from a to b of a pdf p(x) (or any function, for that matter) equals a given value q
2) p(a) = p(b)

thanks in advance,
Johann

_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Johann Cohen-Tanugi <cohen <at> slac.stanford.edu> writes:
> The question is a bit more general than that : How can I use SciPy to
> compute the values a and b defined by :
> 1) integral from a to b of a pdf p(x) (or any function for that matter)
> = a given value q
> 2) p(a)=p(b)

Problem 1 is under-specified. You have one equation for two unknowns and generally will not be able to solve uniquely for both of the endpoints. It may be common in your field to require some sort of symmetry between a and b, e.g. a = mu - E, b = mu + E, where mu is some fixed quantity, and then solve for E rather than for a and b separately. I (perhaps naively) would do this in SciPy using scipy.optimize.brentq, such as

from numpy import exp
from scipy.integrate import quad
from scipy.optimize import brentq

mu = 0
q = 0.95

def density(x):
    return exp(-x)

def integral_function(E, mu):
    # quad returns (result, estimated error); keep only the result
    return quad(density, mu - E, mu + E)[0] - q

margin = brentq(integral_function, 0, 1000, args=(mu,))
a, b = mu - margin, mu + margin

For problem 2, I'm not sure what the function p represents, but a suitable adaptation of the above should work.

-Neil
hi Neil, thanks for your answer and sorry I was not clear enough. Of
course I require the two conditions: 1) defines *a* credible interval if p is a posterior pdf, and 2) sets a constraint that for common situations yields *the* standard Bayesian credible interval. I will have a look at brentq; I do not know what it refers to.

best,
Johann

Neil Martinsen-Burrell wrote:
> Problem 1 is under-specified. You have one equation for two unknowns and
> generally will not be able to solve uniquely for both of the endpoints.
> [...]
Johann Cohen-Tanugi <cohen <at> slac.stanford.edu> writes:
> hi Neil, thanks for your answer and sorry I was not clear enough. Of
> course I require the 2 conditions. 1) defines *a* credible interval if p
> is a posterior pdf; and 2) sets a constraint that for common situations
> yields *the* standard Bayesian credible interval. I will have a look at
> brentq, I do not know what it refers to.

scipy.optimize.brentq is Brent's method for finding a root of a given scalar equation. Since you are looking for two values, a and b, with two conditions, Brent's method is not appropriate (barring some symmetry-based reduction to one variable). I like to use scipy.optimize.fsolve to find roots of multivariable equations, such as

def solve_me(x):  # x is an array of the values you are solving for
    a, b = x
    integral_error = quad(density, a, b) - q
    prob_difference = density(b) - density(a)
    return np.array([integral_error, prob_difference])

fsolve(solve_me, [0.0, 1.0])  # initial guess is a = 0, b = 1
hi Neil,
thanks a lot. I needed to modify

    integral_error = quad(density, a, b) - q

into

    integral_error = quad(density, a, b)[0] - q

as quad returns a tuple with the result and the estimated error.

More interesting: as a test I used

    q = 0.96

    def density(s):
        if s < 0.5:
            return 4*s
        else:
            return 4 - 4*s

for which the solution is a=0.1 and b=0.9. If I keep

    results = optimize.fsolve(solve_me, [0, 1])

fsolve fails because it wanders into the regions a<0 and b>1. With

    results = optimize.fsolve(solve_me, [0.5, 0.5])

fsolve converges without trouble. Is there a simple way to get it to work with constraints like a>0 and b<1? I am afraid that the answer is no, if I understood well the context of another recent thread entitled "constrained optimization".

Thanks a lot,
Johann

Neil Martinsen-Burrell wrote:
> scipy.optimize.brentq is Brent's method for finding a root of a given scalar
> equation. Since you are looking for two values, a and b, with two conditions,
> then Brent's method is not appropriate (barring some symmetry-based reduction to
> one variable). I like to use scipy.optimize.fsolve to find roots of
> multivariable equations, such as
> [...]
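For reference, Neil's fsolve recipe (with the [0] fix) does recover a = 0.1, b = 0.9 for this triangular test density. A self-contained sketch follows; the slightly asymmetric starting guess inside the support is my choice rather than something from the thread, and avoids the singular starting point:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

q = 0.96

def density(s):
    # triangular pdf on [0, 1], peaked at s = 0.5
    return 4 * s if s < 0.5 else 4 - 4 * s

def solve_me(x):
    a, b = x
    integral_error = quad(density, a, b)[0] - q  # note the [0]
    prob_difference = density(b) - density(a)
    return np.array([integral_error, prob_difference])

# a starting guess strictly inside the support converges cleanly
a, b = fsolve(solve_me, [0.3, 0.6])
print(a, b)  # expect roughly 0.1 and 0.9
```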
Hello,
I am attaching my script. I followed Neil's suggestion, but it fails to converge, due seemingly to issues with the fact that a and b must be positive, which I cannot enforce with fsolve, AFAIK. I am ready to dive into constrained schemes, especially in OpenOpt, but I was hoping for some suggestions beforehand as to which path to follow to solve this problem. As a reminder, I am trying to find a and b so that:

    integral from a to b of p(x) = Q
    and p(a) = p(b)

where Q is given (0.95 in my script) and p is a Poisson posterior pdf for ON/OFF source experiments. a, b, and x are source rates, and as such are positive. People will have recognized the computation of a Bayesian credible interval here!!

thanks a lot in advance,
Johann

Neil Martinsen-Burrell wrote:
> scipy.optimize.brentq is Brent's method for finding a root of a given scalar
> equation. Since you are looking for two values, a and b, with two conditions,
> then Brent's method is not appropriate (barring some symmetry-based reduction to
> one variable).
> [...]

The script:

from scipy import optimize
from scipy import integrate
import scipy as sp
import numpy as np
from math import exp

Non = 15
Noff = 15
Ton = 10
Toff = 50
q = 0.95

def PoissonPosterior(s):
    value = 0
    s = np.maximum(s, np.zeros(np.shape(s)))
    Norm = 0
    for i in range(1, Non+1):
        coeff = (1. + Toff/Ton)**i * sp.factorial(Non+Noff-i-1) / sp.factorial(Non-i)
        Norm += coeff
        #print coeff, s, Ton, i
        value = coeff * (s*Ton)**(i-1) * np.exp(-s*Ton) / sp.factorial(i-1)
    return value * Ton / Norm

def solve_me(x, func):  # x is an array of the values you are solving for
    a, b = x
    fa = func(a)
    fb = func(b)
    integral_error = integrate.quad(func, a, b)[0] - q
    prob_difference = fb - fa
    print a, b, integral_error, prob_difference
    return np.array([integral_error, prob_difference])

results = optimize.fsolve(solve_me, [1., 1.5], args=(PoissonPosterior))
print "%d%s credible interval: [%.2f,%.2f]" % (q*100, "%", results[0], results[1])

##################
def density(s):
    if s < 0.5:
        return 4*s
    else:
        return 4 - 4*s

# for testing:
results = optimize.fsolve(solve_me, [0.5, 0.5], args=(density))

# test that the computation of the pdf is correct
from pylab import *
x = np.arange(0, 3, 0.001)
y = PoissonPosterior(x)
plot(x, y)
show()
I have tried to solve your problem via the openopt.NSLP solver nssolve and the openopt.GLP solver galileo; both give maxResidual = 0.885, which is far from the desired zero, so your system (with the required non-negative solution) seems to have no solution.

As for using fsolve or other tools intended for smooth functions: that is (AFAIK) pointless for non-smooth functions, like the ones your numerical integration yields.

Regards, D.

Johann Cohen-Tanugi wrote:
> I am attaching my script. I followed Neil's suggestion but it fails to
> converge, due seemingly to issues with the fact that a and b must be
> positive, which I cannot enforce with fsolve, AFAIK.
> [...]
Hi,
I think that you need an alternative approach here because:

1) The Poisson is a discrete distribution, not a continuous one (so you cannot use the solvers).
2) The Poisson is also a skewed distribution, so finding points such that p(a)=p(b) is impossible unless the Poisson parameter is sufficiently large.
3) Depending on the parameters, it is impossible to get exact probabilities like 5% or 95% from discrete distributions.

These issues are probably reflected in the reported problems. Depending on your parameter, and hence on how accurate you want your interval to be, the Normal/Gaussian can provide a suitable approximation. If you must stay with the Poisson, you need to solve it by brute force and allow for p(a) != p(b).

Regards,
Bruce

dmitrey wrote:
> I have tried to solve your problem via openopt.NSLP solver nssolve &
> openopt.GLP solver galileo, both give maxResidual=0.885, that is far
> from desired zero, so your system (with required non-negative solution)
> seems to have no solution.
> [...]
Johann Cohen-Tanugi wrote:
> Hello,
> The question is a bit more general than that : How can I use SciPy to
> compute the values a and b defined by :
> 1) integral from a to b of a pdf p(x) (or any function for that matter)
> = a given value q
> 2) p(a)=p(b)

Computing the area of the posterior over a credible interval involves integration. If your prior and likelihood are conjugate, it might be easier to use a conjugate distribution parameterized with the posterior hyperparameters and then compute CDF(b)-CDF(a). See http://en.wikipedia.org/wiki/Conjugate_prior for a list of conjugate priors with the hyperparameters worked out.

Now, I guess what you're really asking is how to do the inverse of that (question #1), i.e. how do you find the end points of the interval if you know the area? Try the inverse CDF or inverse survival function. In SciPy, some distributions have an isf member function:

    b = post.isf(q)
    a = 0.0

Now on to question #2: let's assume your posterior distribution is symmetric; then you can try the inverse CDF or the inverse survival function. For example, if q=0.7 (70%) and the posterior is symmetric, then

    L = [post.isf(0.85), post.isf(0.15)]
    a = min(L)
    b = max(L)

Note that post.pdf(a) should be equal to post.pdf(b) because post is symmetric.

Cheers,

Damian Eads
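Damian's isf recipe can be checked directly against any symmetric distribution in scipy.stats; here is a small sketch with the standard normal standing in for the posterior (the distribution choice is illustrative, not from the thread):

```python
from scipy import stats

q = 0.7                   # desired credibility level
post = stats.norm(0, 1)   # stand-in symmetric "posterior"

# split the leftover probability 1-q equally between the two tails
tail = (1 - q) / 2
L = [post.isf(1 - tail), post.isf(tail)]
a, b = min(L), max(L)

# for a symmetric distribution the density matches at both endpoints,
# and the interval really does hold probability q
print(a, b, post.cdf(b) - post.cdf(a))
```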
Damian, Bruce, Dmitrey,
thank you very much for your answers. I feel I need to be more precise as to what I am doing. Look at eqs 5.13 and 5.14 of the following article:
http://www.astro.cornell.edu/staff/loredo/bayes/promise.pdf
These two equations define the posterior probability I am trying to work with. Bruce, it is a continuous function of the count rate s, though of course the underlying Poisson process is discrete. Damian, I will definitely look into your suggestions to understand better what is available in that respect, but I am not expecting this pdf to be implemented for cdf or isf computing. Dmitrey, could you send me your calls to the openopt optimizers? That would be helpful for me to get a head start on using your very nice framework and to play with possibilities other than fsolve.

As eqs 5.13 and 5.14 show, the resulting pdf is by construction normalized to 1. In my code, I tried integrate.quad and did not get a result close to 1. So I think that I have a numerical issue in my Python formula, which would presumably also explain why it fails to converge (the pdf seems to go down to 0 at low s values much too quickly).

thanks again,
Johann

Damian Eads wrote:
> Computing the area of the posterior over a credible interval involves
> integration. If your prior and likelihood are conjugate, it might be
> easier to use a conjugate distribution parameterized with the posterior
> hyperparameters and then compute CDF(b)-CDF(a).
> [...]
2008/5/4 Johann Cohen-Tanugi <[hidden email]>:
> integral_error = quad(density, a , b) - q
>
> into
>
> integral_error = quad(density, a , b)[0] - q
>
> as quad returns a tuple with the result and the estimated error

This bites me every time I use quad. Still, it's valuable to check the reported error, though it's not always clear what to do with the results.

> fsolve fails because it wanders in directions a<0 and b>1. With
> results=optimize.fsolve(solve_me, [0.5, 0.5])
> fsolve converges without trouble.... Is there a simple way to get it to
> work with constraints like a>0 and b<1?

Multidimensional solving, as you are discovering, is a pain. If at all possible it's better to turn it into a one-dimensional problem. It can also be worth disposing of constraints - at least "open" constraints - using funny reparameterizations (log, for example, is great for ensuring that positive things stay positive).

Do you really need p(a)=p(b)? I mean, is this the right supplementary condition to construct a credible interval? Would it be acceptable to choose instead p(x<a)=p(x>b)? This will probably be easier to work with, at least if you can get good numerical behaviour out of your PDF.

Have you looked into writing down an analytical CDF? It looks moderately ghastly - all those s**n exp(k*s) - but possibly something that could be done with a bit of sweat. This would improve the performance of whatever solution method you use - the roundoff errors from quad() follow some irregular, discontinuous pattern. If you must use numerical integration, look into using a non-adaptive Gaussian quadrature, which won't have that particular problem. (There's one in scipy for more or less this reason.) Analytical mean, mode, or median could probably be made use of too. Or a different expansion for small s. A session with SAGE/MAPLE/Mathematica might yield something.

If you do need p(a)=p(b), perhaps the way to turn the problem one-dimensional and reduce your headaches is to use a as the sole parameter and write b as a function of a, using the fact that your distribution is single-peaked (so that p(x)-p(a)==0 has just two solutions, one of which you know about already; if you feel like being devious you could do root-finding on (p(x)-p(a))/(x-a)).

Good luck,
Anne
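Anne's one-dimensional reduction can be made concrete with Johann's triangular test density from earlier in the thread: b is recovered from a by root-finding on p(x) - p(a) to the right of the peak, and a single outer brentq then imposes the integral condition. The brackets around the known peak at 0.5 are specific to this test density:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

q = 0.96

def density(s):
    # triangular pdf on [0, 1], peaked at s = 0.5
    return 4 * s if s < 0.5 else 4 - 4 * s

def matching_b(a, peak=0.5, upper=1.0):
    # unimodality guarantees exactly one point b > peak
    # with density(b) == density(a)
    return brentq(lambda x: density(x) - density(a), peak, upper)

def interval_error(a):
    # residual of the integral condition, with b slaved to a
    b = matching_b(a)
    return quad(density, a, b)[0] - q

a = brentq(interval_error, 0.0, 0.5 - 1e-9)
b = matching_b(a)
print(a, b)  # expect roughly 0.1 and 0.9
```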
hi Anne, thanks! I will need some time to digest your suggestions and
try to implement them. I think that the condition p(a)=p(b), where p is the actual pdf and not an integral of it, is the sufficient condition for unimodal distributions so that the credible interval is uniquely defined as the interval encompassing the mode.

best,
Johann

Anne Archibald wrote:
> Do you really need p(a)=p(b)? I mean, is this the right supplementary
> condition to construct a credible interval? Would it be acceptable to
> choose instead p(x<a)=p(x>b)? This will probably be easier to work
> with, at least if you can get good numerical behaviour out of your
> PDF.
> [...]
2008/5/6 Johann Cohen-Tanugi <[hidden email]>:
> hi Anne, thanks! I will need some time to digest your suggestions and
> try to implement them. I think that the condition p(a)=p(b) where p is
> the actual pdf and not an integral of it, is the sufficient condition
> for unimodal distributions so that the credible interval is uniquely
> defined as the interval encompassing the mode.

Yes, this should work; I would have gone with an interval that represented (say) the 5th, 50th, and 95th percentiles, bracketing the median instead of the mode, but that might be peculiar if the distribution has large tails or you want to compare with a maximum-likelihood method (which returns the mode, I guess). My actual experience with Bayesian statistics is pretty limited; it's probably best to go with whatever's most standard (if the numerics can be made to behave).

Anne
On Tue, May 6, 2008 at 2:18 AM, Anne Archibald
<[hidden email]> wrote:
> Do you really need p(a)=p(b)? I mean, is this the right supplementary
> condition to construct a credible interval? Would it be acceptable to
> choose instead p(x<a)=p(x>b)? This will probably be easier to work
> with, at least if you can get good numerical behaviour out of your
> PDF.

It's one of the defining characteristics of the kind of credible interval Johann is looking for. "Bayesian credible interval" is a somewhat broad designation; it applies to pretty much any interval on a Bayesian posterior distribution as long as the interval is selected according to some rule that statisticians agree has some kind of meaning.

In practice, one of the most common such rules is to find the "Highest Posterior Density" (HPD) interval, where p(a)=p(b) and P(b)-P(a)=0.95 or some such chosen credibility level. Imagine the PDF being flooded with water up to its peak (let's assume unimodality for now). We gradually lower the level of the water such that for both points a and b where the water touches the PDF, p(a)=p(b). We lower the water until the integral under the "dry" peak is equal to 0.95. Then [a,b] is the HPD credible interval for that PDF at the 0.95 credibility level.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
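Robert's water-level picture translates almost directly into nested root-finding: for each candidate water level, find where it touches the pdf on either side of the peak, then lower the level until the dry mass reaches the target. A sketch with a gamma distribution standing in for the posterior (the choice of gamma(3), the search brackets, and the hard-coded mode are illustrative assumptions, not from the thread):

```python
from scipy import stats
from scipy.optimize import brentq

post = stats.gamma(3)   # stand-in unimodal, skewed "posterior"
target = 0.95
mode = 2.0              # mode of gamma(a=3) is a - 1

def endpoints(height):
    # where the water line at `height` touches the pdf on each side of the peak
    a = brentq(lambda x: post.pdf(x) - height, 1e-12, mode)
    b = brentq(lambda x: post.pdf(x) - height, mode, 100.0)
    return a, b

def dry_mass(height):
    # probability mass of the "dry" region above the water line
    a, b = endpoints(height)
    return post.cdf(b) - post.cdf(a)

peak = post.pdf(mode)
height = brentq(lambda h: dry_mass(h) - target, 1e-8, peak - 1e-8)
a, b = endpoints(height)
print(a, b)  # HPD interval: mass inside is 0.95 and pdf(a) == pdf(b)
```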
mamma mia, shame on me... :(
Of course, in my code it should have read:

    value += coeff*(s*Ton)**(i-1)*np.exp(-s*Ton)/sp.factorial(i-1)

and not

    value = coeff*(s*Ton)**(i-1)*np.exp(-s*Ton)/sp.factorial(i-1)

Convergence is much better now with the simple fsolve! So much for public humiliation. Thanks all for bearing with me :) I am still interested in testing constrained solvers, Dmitrey.

Johann

Robert Kern wrote:
> In practice, one of the most common such rules is to find the "Highest
> Posterior Density" (HPD) interval, where p(a)=p(b) and P(b)-P(a)=0.95
> or some such chosen credibility level.
> [...]
Hi,
First, what do you really mean by p(a) or p(b)? I would think that you want the definition that Anne gives, as also implied by Robert's example. In this case the posterior is most likely gamma, or proportional to one (it's been a while), which is asymmetric. (Note that you will need to follow the derivation to determine whether the normalizing constant has been ignored, since it drops out of most calculations.) So defining the probabilities as p(x<a) and p(x>b) means that the points a and b are not generally going to be equidistant from the center of the distribution (mean, median, and mode will give potentially different but correct answers) and p(x<a) != p(x>b). In this case you must be able to find the areas of the two tails. Obviously it is considerably easier and more flexible to use the actual distribution than a vague formula.

Bruce

On Tue, May 6, 2008 at 2:55 PM, Robert Kern <[hidden email]> wrote:
> It's one of the defining characteristics of the kind of credible
> interval Johann is looking for.
> [...]
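Bruce's equal-tail alternative is straightforward with scipy.stats when the posterior matches a known distribution; here a skewed gamma stands in for it (the specific distribution is an assumption for illustration), showing that the tails hold equal mass while the densities at the endpoints differ:

```python
from scipy import stats

q = 0.95
post = stats.gamma(3)   # stand-in skewed "posterior"

# equal-tail interval: put (1-q)/2 of the probability in each tail
tail = (1 - q) / 2
a = post.ppf(tail)
b = post.ppf(1 - tail)

# the tails hold equal mass, but the densities at a and b differ (skewed pdf)
print(a, b, post.pdf(a), post.pdf(b))
```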
On Tue, May 6, 2008 at 3:47 PM, Bruce Southey <[hidden email]> wrote:
> Hi,
> First what do you really mean by p(a) or p(b)?

The values of the PDF at the points a and b. These are *not* probabilities, but point values of the density function. Generally speaking, with an HPD interval, the probabilities under the two tails (what you are referring to with "p(x<a)" and "p(x>b)") are not equal.

--
Robert Kern