[SciPy-User] speed of logpdf functions in scipy.stats

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[SciPy-User] speed of logpdf functions in scipy.stats

Brian Blais-2
Hello,

I was playing with some MCMC methods which require logpdf for
different distributions, and thought "hey!  I can use the
scipy.stats.distributions!".  Then, when I tested them, they seemed
slow.  Upon comparison, I noticed a huge speed difference between
these functions and my own not-so-cleverly written python only
functions.  Is there a way to get better performance?  Am I doing
something silly here, and creating unneeded objects somewhere?  The
simplest code which shows the issue is below.

thanks,

bb

--
-----------------

             [hidden email]
             http://web.bryant.edu/~bblais

import numpy as np
from scipy.stats import distributions as D

def lognormalpdf(x,mn,sig):
    # 1/sqrt(2*pi*sigma^2)*exp(-x^2/2/sigma^2)
    return -0.5*log(2*np.pi*sig**2)- (x-mn)**2/sig**2/2.0

x=np.random.rand(5)
print lognormalpdf(x,0,1)
print D.norm.logpdf(x,0,1)

x=np.random.rand(15000)

%timeit y=lognormalpdf(x,0,1)
# 10000 loops, best of 3: 66.9 µs per loop

%timeit y=D.norm.logpdf(x,0,1)
# 1000 loops, best of 3: 727 µs per loop
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

josef.pktd
On Fri, Mar 27, 2015 at 7:28 PM, Brian Blais <[hidden email]> wrote:

> Hello,
>
> I was playing with some MCMC methods which require logpdf for
> different distributions, and thought "hey!  I can use the
> scipy.stats.distributions!".  Then, when I tested them, they seemed
> slow.  Upon comparison, I noticed a huge speed difference between
> these functions and my own not-so-cleverly written python only
> functions.  Is there a way to get better performance?  Am I doing
> something silly here, and creating unneeded objects somewhere?  The
> simplest code which shows the issue is below.
>
> thanks,
>
> bb
>
> --
> -----------------
>
>              [hidden email]
>              http://web.bryant.edu/~bblais
>
> import numpy as np
> from scipy.stats import distributions as D
>
> def lognormalpdf(x,mn,sig):
>     # 1/sqrt(2*pi*sigma^2)*exp(-x^2/2/sigma^2)
>     return -0.5*log(2*np.pi*sig**2)- (x-mn)**2/sig**2/2.0
>
> x=np.random.rand(5)
> print lognormalpdf(x,0,1)
> print D.norm.logpdf(x,0,1)
>
> x=np.random.rand(15000)
>
> %timeit y=lognormalpdf(x,0,1)
> # 10000 loops, best of 3: 66.9 µs per loop
>
> %timeit y=D.norm.logpdf(x,0,1)
> # 1000 loops, best of 3: 727 µs per loop

You can compare
norm.logpdf(x,0,1)
and
norm._logpdf(x,0,1)

to get an estimate on the overhead of the argument checking.

The leading underscore method is the distribution specific or generic
implementation that does not include the generic loc scale handling
and argument checking.
If the "private" method is much slower than your implementation, then
it might be inefficiently implemented, or have costly handling of edge
cases.

The overhead of the argument checking and generic loc scale handling
is pretty much unavoidable in the current implementation.


Josef

> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

Brian Blais-2
On Fri, Mar 27, 2015 at 8:01 PM,  <[hidden email]> wrote:
> On Fri, Mar 27, 2015 at 7:28 PM, Brian Blais <[hidden email]> wrote:
> You can compare
> norm.logpdf(x,0,1)
> and
> norm._logpdf(x,0,1)
>
> to get an estimate on the overhead of the argument checking.

I thought there would be something like that.  However, when I try

y=D.norm._logpdf(x,0,1)

I get an error: TypeError: _logpdf() takes exactly 2 arguments (4 given)

scipy version 0.15.1, anaconda distribution.

thanks,

bb

--
-----------------

             [hidden email]
             http://web.bryant.edu/~bblais
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

Robert Kern-2
On Sat, Mar 28, 2015 at 12:13 AM, Brian Blais <[hidden email]> wrote:

>
> On Fri, Mar 27, 2015 at 8:01 PM,  <[hidden email]> wrote:
> > On Fri, Mar 27, 2015 at 7:28 PM, Brian Blais <[hidden email]> wrote:
> > You can compare
> > norm.logpdf(x,0,1)
> > and
> > norm._logpdf(x,0,1)
> >
> > to get an estimate on the overhead of the argument checking.
>
> I thought there would be something like that.  However, when I try
>
> y=D.norm._logpdf(x,0,1)
>
> I get an error: TypeError: _logpdf() takes exactly 2 arguments (4 given)
>
> scipy version 0.15.1, anaconda distribution.

He meant norm._logpdf(x).

--
Robert Kern

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

Brian Blais-2
On Sat, Mar 28, 2015 at 7:35 AM, Robert Kern <[hidden email]> wrote:
>
> He meant norm._logpdf(x).
>

ah, that makes more sense....and it's faster than my python function.
however, this clearly works only in the case of mu=0, sd=1.  for the
normal it's easy to transform, but my goal is to have fast version of
the logpdf's of the different scipy.stats distributions, where each
call may have *different* distribution parameters.  is there a fast
_logpdf-type version for the distributions where you can specify
value, scale, etc...?

thanks,

bb


--
-----------------

             [hidden email]
             http://web.bryant.edu/~bblais
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

Robert Kern-2
On Sat, Mar 28, 2015 at 5:01 PM, Brian Blais <[hidden email]> wrote:

>
> On Sat, Mar 28, 2015 at 7:35 AM, Robert Kern <[hidden email]> wrote:
> >
> > He meant norm._logpdf(x).
> >
>
> ah, that makes more sense....and it's faster than my python function.
> however, this clearly works only in the case of mu=0, sd=1.  for the
> normal it's easy to transform, but my goal is to have fast version of
> the logpdf's of the different scipy.stats distributions, where each
> call may have *different* distribution parameters.  is there a fast
> _logpdf-type version for the distributions where you can specify
> value, scale, etc...?

The difference in overhead between norm.logpdf() and norm._logpdf() is mostly just taking care of the location and scale parameters. It's the same for every distribution, which is why we have it organized this way. There is perhaps a little more overhead than strictly necessary because it needs to handle the generic argument manipulation for those distributions that take shape parameters in addition to location and scale. If you want to write a bunch of repetitive boilerplate to do the location and scale manipulation for each distribution separately, that is likely the only way to improve on the overhead.

--
Robert Kern

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: speed of logpdf functions in scipy.stats

josef.pktd
On Sat, Mar 28, 2015 at 1:14 PM, Robert Kern <[hidden email]> wrote:

> On Sat, Mar 28, 2015 at 5:01 PM, Brian Blais <[hidden email]> wrote:
>>
>> On Sat, Mar 28, 2015 at 7:35 AM, Robert Kern <[hidden email]>
>> wrote:
>> >
>> > He meant norm._logpdf(x).
>> >
>>
>> ah, that makes more sense....and it's faster than my python function.
>> however, this clearly works only in the case of mu=0, sd=1.  for the
>> normal it's easy to transform, but my goal is to have fast version of
>> the logpdf's of the different scipy.stats distributions, where each
>> call may have *different* distribution parameters.  is there a fast
>> _logpdf-type version for the distributions where you can specify
>> value, scale, etc...?
>
> The difference in overhead between norm.logpdf() and norm._logpdf() is
> mostly just taking care of the location and scale parameters. It's the same
> for every distribution, which is why we have it organized this way. There is
> perhaps a little more overhead than strictly necessary because it needs to
> handle the generic argument manipulation for those distributions that take
> shape parameters in addition to location and scale. If you want to write a
> bunch of repetitive boilerplate to do the location and scale manipulation
> for each distribution separately, that is likely the only way to improve on
> the overhead.


generic loc scale handling would be just

dist._logpdf((x - loc) / scale) - np.log(scale)

There is also checking whether x is in the domain, which is relevant
in cases with a lower or upper bound, like log-normal, poisson, beta,
besides the generic checking for valid shape parameters.

Josef


>
> --
> Robert Kern
>
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user