Hello,

I was playing with some MCMC methods which require logpdf for different
distributions, and thought "hey! I can use the scipy.stats.distributions!".
Then, when I tested them, they seemed slow. Upon comparison, I noticed a
huge speed difference between these functions and my own not-so-cleverly
written python-only functions. Is there a way to get better performance?
Am I doing something silly here, and creating unneeded objects somewhere?
The simplest code which shows the issue is below.

thanks,

bb

--
-----------------
[hidden email]
http://web.bryant.edu/~bblais

import numpy as np
from scipy.stats import distributions as D

def lognormalpdf(x, mn, sig):
    # log of 1/sqrt(2*pi*sig^2) * exp(-(x-mn)^2/(2*sig^2))
    return -0.5*np.log(2*np.pi*sig**2) - (x-mn)**2/(2.0*sig**2)

x = np.random.rand(5)
print(lognormalpdf(x, 0, 1))
print(D.norm.logpdf(x, 0, 1))

x = np.random.rand(15000)

%timeit y=lognormalpdf(x,0,1)
# 10000 loops, best of 3: 66.9 µs per loop

%timeit y=D.norm.logpdf(x,0,1)
# 1000 loops, best of 3: 727 µs per loop

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
On Fri, Mar 27, 2015 at 7:28 PM, Brian Blais <[hidden email]> wrote:
> I noticed a huge speed difference between these functions and my own
> not-so-cleverly written python-only functions. Is there a way to get
> better performance? Am I doing something silly here, and creating
> unneeded objects somewhere?
[snip]

You can compare

    norm.logpdf(x,0,1)

and

    norm._logpdf(x,0,1)

to get an estimate on the overhead of the argument checking.

The leading-underscore method is the distribution-specific or generic
implementation that does not include the generic loc-scale handling and
argument checking.

If the "private" method is much slower than your implementation, then it
might be inefficiently implemented, or have costly handling of edge cases.
The overhead of the argument checking and generic loc-scale handling is
pretty much unavoidable in the current implementation.

Josef
On Fri, Mar 27, 2015 at 8:01 PM, <[hidden email]> wrote:
> You can compare
>
>     norm.logpdf(x,0,1)
> and
>     norm._logpdf(x,0,1)
>
> to get an estimate on the overhead of the argument checking.

I thought there would be something like that. However, when I try

    y=D.norm._logpdf(x,0,1)

I get an error: TypeError: _logpdf() takes exactly 2 arguments (4 given)

scipy version 0.15.1, anaconda distribution.

thanks,

bb
On Sat, Mar 28, 2015 at 12:13 AM, Brian Blais <[hidden email]> wrote:
> I thought there would be something like that. However, when I try
>
>     y=D.norm._logpdf(x,0,1)
>
> I get an error: TypeError: _logpdf() takes exactly 2 arguments (4 given)

He meant norm._logpdf(x).

--
Robert Kern
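With the corrected call, the three versions can be compared directly. A sketch (timings will vary by machine and scipy version, and `_logpdf` is private API whose signature may change):

```python
import numpy as np
from timeit import timeit
from scipy.stats import norm

def lognormalpdf(x, mn, sig):
    # hand-written log of the normal density
    return -0.5 * np.log(2 * np.pi * sig**2) - (x - mn)**2 / (2.0 * sig**2)

x = np.random.rand(15000)

# _logpdf takes only the already-standardized argument, so for loc=0,
# scale=1 all three should agree numerically
print(np.allclose(norm.logpdf(x, 0, 1), norm._logpdf(x)))        # True
print(np.allclose(norm.logpdf(x, 0, 1), lognormalpdf(x, 0, 1)))  # True

# the private method skips the argument-checking machinery, so it is
# much cheaper per call
for label, f in [("logpdf ", lambda: norm.logpdf(x, 0, 1)),
                 ("_logpdf", lambda: norm._logpdf(x)),
                 ("hand   ", lambda: lognormalpdf(x, 0, 1))]:
    print(label, timeit(f, number=200))
```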
On Sat, Mar 28, 2015 at 7:35 AM, Robert Kern <[hidden email]> wrote:
> He meant norm._logpdf(x).

ah, that makes more sense... and it's faster than my python function.
however, this clearly works only in the case of mu=0, sd=1. for the
normal it's easy to transform, but my goal is to have a fast version of
the logpdf's of the different scipy.stats distributions, where each call
may have *different* distribution parameters. is there a fast
_logpdf-type version for the distributions where you can specify value,
scale, etc.?

thanks,

bb
On Sat, Mar 28, 2015 at 5:01 PM, Brian Blais <[hidden email]> wrote:
> is there a fast _logpdf-type version for the distributions where you
> can specify value, scale, etc.?

The difference in overhead between norm.logpdf() and norm._logpdf() is
mostly just taking care of the location and scale parameters. It's the
same for every distribution, which is why we have it organized this way.
There is perhaps a little more overhead than strictly necessary because
it needs to handle the generic argument manipulation for those
distributions that take shape parameters in addition to location and
scale. If you want to write a bunch of repetitive boilerplate to do the
location and scale manipulation for each distribution separately, that
is likely the only way to improve on the overhead.

--
Robert Kern
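Such a per-distribution fast path might look like the following sketch. `_logpdf` is not public API, so this can break between scipy versions, and it does no validation (it assumes finite x and scale > 0):

```python
import numpy as np
from scipy.stats import norm

def fast_norm_logpdf(x, loc=0.0, scale=1.0):
    # do the loc/scale standardization by hand, then call the
    # distribution-specific kernel directly, skipping argument checks
    return norm._logpdf((np.asarray(x) - loc) / scale) - np.log(scale)

x = np.random.rand(15000)
# matches the public method for valid inputs
print(np.allclose(fast_norm_logpdf(x, 2.0, 0.5), norm.logpdf(x, 2.0, 0.5)))
```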
On Sat, Mar 28, 2015 at 1:14 PM, Robert Kern <[hidden email]> wrote:
> The difference in overhead between norm.logpdf() and norm._logpdf() is
> mostly just taking care of the location and scale parameters.
[snip]

generic loc scale handling would be just

    dist._logpdf((x - loc) / scale) - np.log(scale)

There is also checking whether x is in the domain, which is relevant in
cases with a lower or upper bound, like log-normal, poisson, beta,
besides the generic checking for valid shape parameters.

Josef
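That one-liner generalizes to distributions with shape parameters, which pass straight through to `_logpdf`. A sketch (again private API, and, as noted above, with none of the domain or shape-parameter checking, so out-of-support x or invalid shapes will silently give wrong values instead of nan):

```python
import numpy as np
from scipy.stats import gamma, norm

def fast_logpdf(dist, x, *shapes, loc=0.0, scale=1.0):
    # generic loc/scale handling; shape parameters (e.g. gamma's a)
    # are forwarded unchecked to the distribution-specific kernel
    return dist._logpdf((np.asarray(x) - loc) / scale, *shapes) - np.log(scale)

# x chosen inside the support of both distributions used below
x = np.linspace(0.5, 5.0, 1000)
print(np.allclose(fast_logpdf(norm, x, loc=1.0, scale=2.0),
                  norm.logpdf(x, loc=1.0, scale=2.0)))
print(np.allclose(fast_logpdf(gamma, x, 2.5, loc=0.0, scale=1.5),
                  gamma.logpdf(x, 2.5, loc=0.0, scale=1.5)))
```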