
[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0


[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin
 Hi list,
I'm working on a package that does some complicated Monte Carlo experiments. The package passes around frozen distributions quite a lot. Trying to understand why certain parts were so slow, I did a bit of profiling, and stumbled upon this: 

 > %timeit x = scipy.stats.norm.rvs(size=1000)
> 10000 loops, best of 3: 49.3 µs per loop

> %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> 1000 loops, best of 3: 512 µs per loop

So a 10x penalty when using a frozen dist, even when the size of the simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I cannot replicate this problem on another machine with scipy 0.13.3 and Ubuntu 14.04 (there is a penalty, but it's much smaller).
 
In the profiler, I can see that a lot of time is spent doing string operations (such as expand_tabs) in order to generate the docstrings. In the source, I see that this may depend on a certain -OO flag?

I do realise that instantiating a frozen distribution requires some argument checking and whatnot, but here it looks too expensive. For my package, this amounts to hours spent on ... tab expansion?  

Anyway, I'd like to ask:
(a) is this a known problem? I could not find anything online about this.
(b) Is this going to be fixed in some future version of scipy?
(c) is there a way to fix this with *this* version of scipy using the flag mentioned in the source, and if so, how?
(d) or should I instead manually define my own distribution objects? (It's really convenient for what I'm trying to do to define distributions as objects with methods rvs, logpdf, and so on.)
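(For concreteness, here is a minimal sketch of the kind of hand-rolled object I have in mind, backed directly by numpy.random; the class and its methods are my own illustration, not scipy's:)

```python
import numpy as np

class Normal:
    """Minimal hand-rolled stand-in for a frozen normal distribution."""
    def __init__(self, loc=0., scale=1.):
        self.loc, self.scale = loc, scale

    def rvs(self, size=None, random_state=None):
        # cheap construction: no docstring generation, no generic machinery
        rng = np.random.RandomState(random_state)
        return rng.normal(self.loc, self.scale, size=size)

    def logpdf(self, x):
        # normal log-density, computed directly
        z = (np.asarray(x) - self.loc) / self.scale
        return -0.5 * z ** 2 - np.log(self.scale) - 0.5 * np.log(2. * np.pi)
```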

Many thanks for reading this! :-)
All the best 

 

_______________________________________________
SciPy-User mailing list
[hidden email]
https://mail.scipy.org/mailman/listinfo/scipy-user

Re: Big performance hit when using frozen distributions on scipy 0.16.0

josef.pktd


On Fri, Oct 28, 2016 at 12:53 PM, Nicolas Chopin <[hidden email]> wrote:
 Hi list,
I'm working on a package that does some complicated Monte Carlo experiments. The package passes around frozen distributions quite a lot. Trying to understand why certain parts were so slow, I did a bit of profiling, and stumbled upon this: 

 > %timeit x = scipy.stats.norm.rvs(size=1000)
> 10000 loops, best of 3: 49.3 µs per loop

> %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
> 1000 loops, best of 3: 512 µs per loop

Can you time just the rvs call here, not the instantiation of the frozen distribution?

Frozen distributions now have more overhead in the construction, because a new instance of the distribution is created instead of reusing the global instance as in older scipy versions. That might still have an effect in the µs range.
(The reason was to avoid the possibility of spillover of attributes across instances.)

 

So a 10x penalty when using a frozen dist, even when the size of the simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I cannot replicate this problem on another machine with scipy 0.13.3 and Ubuntu 14.04 (there is a penalty, but it's much smaller).
 
In the profiler, I can see that a lot of time is spent doing string operations (such as expand_tabs) in order to generate the docstrings. In the source, I see that this may depend on a certain -OO flag?

I do realise that instantiating a frozen distribution requires some argument checking and whatnot, but here it looks too expensive. For my package, this amounts to hours spent on ... tab expansion?  

Anyway, I'd like to ask
(a) is this a known problem? I could not find anything online about this.
(b) Is this going to be fixed in some future version of scipy?
(c) is there a way to fix this with *this* version of scipy using the flag mentioned in the source, and if so, how?
(d) or should I instead manually define my own distribution objects? (It's really convenient for what I'm trying to do to define distributions as objects with methods rvs, logpdf, and so on.)

I think we never had any discussion on timing details. Overall, the overhead of scipy.stats.distributions is not small relative to the underlying calculation when that calculation is fast; e.g. using numpy.random directly for rvs is quite a bit faster, when the function is available in numpy.
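For instance (a small sketch; for the normal, numpy exposes the sampler directly):

```python
import numpy as np
from scipy import stats

# numpy.random: a thin wrapper around the C-level sampler
x_np = np.random.RandomState(123).normal(loc=0., scale=1., size=1000)

# scipy.stats: draws from the same distribution, plus argument
# checking and the generic distribution machinery on top
x_sp = stats.norm.rvs(size=1000, random_state=123)
```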

Josef
 

Many thanks for reading this! :-)
All the best 

 


Re: Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin
If I time just the rvs call then I get essentially the same time as with
> x = scipy.stats.norm.rvs(size=1000)  

so yes, it's the initialisation of the frozen distribution that costs so much. And, in my case, it seems it adds up to quite a lot.

So what you're saying is that there was indeed a recent change that makes frozen-dist creation more expensive, so this is "a feature, not a bug"? In that case, I will create my own classes. A pity, but well...

Thanks a lot for your prompt answer
Nicolas       


Re: Big performance hit when using frozen distributions on scipy 0.16.0

Evgeni Burovski
In reply to this post by Nicolas Chopin
On Fri, Oct 28, 2016 at 7:53 PM, Nicolas Chopin <[hidden email]> wrote:

>  Hi list,
> I'm working on a package that does some complicated Monte Carlo experiments.
> The package passes around frozen distributions quite a lot. Trying to
> understand why certain parts were so slow, I did a bit of profiling, and
> stumbled upon this:
>
>  > %timeit x = scipy.stats.norm.rvs(size=1000)
>> 10000 loops, best of 3: 49.3 µs per loop
>
>> %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
>> 1000 loops, best of 3: 512 µs per loop
>
> So a 10x penalty when using a frozen dist, even when the size of the simulated
> vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I cannot
> replicate this problem on another machine with scipy 0.13.3 and Ubuntu 14.04
> (there is a penalty, but it's much smaller).
>
> In the profiler, I can see that a lot of time is spent doing string
> operations (such as expand_tabs) in order to generate the doc. In the
> source, I see that this may depend on a certain -OO flag?
>
> I do realise that instantiating a frozen distribution requires some argument
> checking and what not, but here it looks too expensive. For my package, this
> amounts to hours spent on ... tab expansion?
>
> Anyway, I'd like to ask
> (a) is this a known problem? I could not find anything online about this.
> (b) Is this going to be fixed in some future version of scipy?
> (c) is there a way to fix this with *this* version of scipy using the flag
> mentioned in the source, and if so, how?
> (d) or should I instead manually define my own distribution objects?
> (It's really convenient for what I'm trying to do to define distributions as
> objects with methods rvs, logpdf, and so on.)
>
> Many thanks for reading this! :-)
> All the best


Why are you including the construction time in your timings? Surely,
if you use frozen distributions for some MC work, you're not
recreating frozen instances in hot loops?


In [4]: %timeit norm.rvs(size=100, random_state=123)
The slowest run took 142.68 times longer than the fastest. This could
mean that an intermediate result is being cached.
10000 loops, best of 3: 74.2 µs per loop

In [5]: %timeit dist = norm(); dist.rvs(size=100, random_state=123)
The slowest run took 4.40 times longer than the fastest. This could
mean that an intermediate result is being cached.
1000 loops, best of 3: 796 µs per loop

In [6]: %timeit dist = norm()
The slowest run took 4.89 times longer than the fastest. This could
mean that an intermediate result is being cached.
1000 loops, best of 3: 672 µs per loop
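The practical consequence (an illustrative sketch): construct the frozen instance once and reuse it inside the loop, rather than recreating it per iteration:

```python
from scipy import stats

# pay the construction cost once, outside the hot loop ...
dist = stats.norm(loc=0., scale=1.)
draws = [dist.rvs(size=100, random_state=i) for i in range(50)]

# ... not once per iteration, as in:
# draws = [stats.norm(loc=0., scale=1.).rvs(size=100, random_state=i)
#          for i in range(50)]
```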

> (b) Is this going to be fixed in some future version of scipy?
> (c) is there a way to fix this with *this* version of scipy using this flag
> mentioned in the source, and then how?

You could of course try reverting
https://github.com/scipy/scipy/pull/3245 for your local copy of scipy.
It landed in scipy 0.14, so this is the likely suspect.

Re: Big performance hit when using frozen distributions on scipy 0.16.0

josef.pktd
In reply to this post by Nicolas Chopin


On Fri, Oct 28, 2016 at 1:21 PM, Nicolas Chopin <[hidden email]> wrote:
If I time just the rvs call then I get essentially the same time as with
> x = scipy.stats.norm.rvs(size=1000)  

so yes, it's the initialisation of the frozen distribution that costs so much. And, in my case, it seems it adds up to quite a lot.

So what you're saying is that indeed there was recent change that makes frozen dist creation more expensive? so that's "a feature not a bug"? In that case, I will create my own classes. A pity, but well...

Creating a new instance is a feature. It's still possible that some speedup is available in the implementation, but AFAIR I didn't see anything that would have been obvious (a few µs up or down?).

However, given your description that you pass the frozen instances around, there shouldn't be so much instance creation; otherwise, you could also use the unfrozen global instances of the distributions.
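For example, the frozen and the unfrozen call give the same numbers; only where the parameters are supplied differs (a small sketch):

```python
import numpy as np
from scipy import stats

x = np.linspace(-2., 2., 5)

# frozen: instance creation, then parameter-free method calls
lp_frozen = stats.norm(loc=1., scale=2.).logpdf(x)

# unfrozen: reuse the module-level stats.norm instance, pass loc/scale per call
lp_global = stats.norm.logpdf(x, loc=1., scale=2.)

assert np.allclose(lp_frozen, lp_global)
```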

In general, I avoid scipy.stats.distributions in loops in restricted cases where I don't need the flexibility and input checking, but I don't think it's worth the effort when we would have to replicate most of what's already there.

Josef

 


Re: Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin
In reply to this post by Evgeni Burovski
Yes, as I have just said, I agree that it is the creation of the frozen dist that
explains the difference.

I do need to create a *lot* of frozen distributions; there is no way around that
in what I do. Typically, one run may involve O(10^8) frozen distributions;
for each of these I may either simulate a vector (of size 10^2-10^3), or compute
the log-pdf of a vector of the same size, or both. 


Re: Big performance hit when using frozen distributions on scipy 0.16.0

ralfgommers


On Sat, Oct 29, 2016 at 6:37 AM, Nicolas Chopin <[hidden email]> wrote:
Yes, as I have just said, I agree that it is the creation of the frozen dist that
explains the difference.

I do need to create a *lot* of frozen distributions, there is no way around that
in what I do.

Whatever you can do with frozen distributions you can also do with the regular non-frozen ones, so I doubt that that's true.
 
Typically, one run may involve O(10^8) frozen distributions;
for each of these I may either simulate a vector (of size 10^2-10^3), or compute
the log-pdf of a vector of the same size, or both. 

You haven't explained what's wrong with simply using the rvs() and logpdf() methods from the distribution instances provided in the stats namespace.

Ralf

 


Re: Big performance hit when using frozen distributions on scipy 0.16.0

Charles R Harris
In reply to this post by Nicolas Chopin


On Fri, Oct 28, 2016 at 10:53 AM, Nicolas Chopin <[hidden email]> wrote:
<snip>
 
In the profiler, I can see that a lot of time is spent doing string operations (such as expand_tabs) in order to generate the docstrings. In the source, I see that this may depend on a certain -OO flag?

Did you try running with the -OO flag? Anyone know how well that works?

Chuck



Re: Big performance hit when using frozen distributions on scipy 0.16.0

Nicolas Chopin
hi,
Charles: no, I didn't; I'm not clear on how to use this flag.

Ralf: since you're asking, I may as well give you more details about my stuff. Basically, I'd like to do some basic probabilistic programming, i.e. to give the user the ability to define stochastic models as Python objects; e.g. 

from scipy import stats

class MarkovChain(object):
    "abstract base class"
    def simulate(self, T):
        path = [0.]  # fixed starting point
        for t in range(1, T):
            # M(t, xp) returns the frozen distribution of X_t given X_{t-1} = xp
            path.append(self.M(t, path[t - 1]).rvs())
        return path

class RandomWalk(MarkovChain):
    def __init__(self, sigma=1.):
        self.sigma = sigma
    def M(self, t, xp):
        return stats.norm(loc=xp, scale=self.sigma)

Here, I define a base class for Markov chains, with a simulate method that simulates a trajectory. Then I define a particular (parametric) sub-class, that of Gaussian random walks. 

One part of my package defines an algorithm that takes such a *class* as an argument, generates many possible parameters (above, sigma), and, for each parameter, generates trajectories; sometimes the logpdf or the ppf functions must be computed as well. Of course, I could ask the user to provide as input a function for generating rvs, but then I would also need to ask for a function computing the log-pdf, and so on. 
 
In fact, I have a few ideas (and prototype code) on how to extend frozen distributions so as to do more advanced probabilistic programming, such as: 
* product distributions: 
prod_dist(stats.beta(3, 2), stats.norm(loc=3))
returns an object that corresponds to the distribution of (X,Y), where X~Beta(3,2), Y~N(3,1);
for instance if you apply method rvs, you obtain a [N,2] numpy array
* dict distribution: 
same idea, but returns a record array (or takes a record array for logpdf, etc.)
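(A rough sketch of the prod_dist interface I have in mind; all names are mine and hypothetical:)

```python
import numpy as np
from scipy import stats

class prod_dist:
    """Product of independent 1-d frozen distributions (sketch)."""
    def __init__(self, *dists):
        self.dists = dists

    def rvs(self, size=1, random_state=None):
        # one shared RNG so the marginals get independent draws
        rng = np.random.RandomState(random_state)
        # one column per marginal -> a [size, d] array
        return np.column_stack([d.rvs(size=size, random_state=rng)
                                for d in self.dists])

    def logpdf(self, x):
        # independence: joint log-density is the sum of marginal log-densities
        x = np.asarray(x)
        return sum(d.logpdf(x[..., i]) for i, d in enumerate(self.dists))
```

e.g. prod_dist(stats.beta(3, 2), stats.norm(loc=3)).rvs(size=10) would return a [10, 2] array.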

But I'm not sure there's much interest in extending scipy distributions in this way? 
Best


Re: Big performance hit when using frozen distributions on scipy 0.16.0

Charles R Harris


On Sat, Oct 29, 2016 at 8:51 AM, Nicolas Chopin <[hidden email]> wrote:
hi,
Charles: no, I didn't; I'm not clear on how to use this flag.

It is passed to CPython and produces *.pyo files without docstrings. It probably doesn't do what you want if the docstrings are dynamically generated (I don't know), but it can be checked whether the flag was passed to Python, so it should be possible to make docstring generation depend on it, and it probably should.

<snip>

Chuck



Re: Big performance hit when using frozen distributions on scipy 0.16.0

ralfgommers


On Sun, Oct 30, 2016 at 4:49 AM, Charles R Harris <[hidden email]> wrote:


On Sat, Oct 29, 2016 at 8:51 AM, Nicolas Chopin <[hidden email]> wrote:
hi,
Charles: no, I didn't; I'm not clear on how to use this flag.

It is passed to CPython and produces *.pyo files without docstrings. It probably doesn't do what you want if the docstrings are dynamically generated (I don't know),

That is handled by doing docstring manipulation inside ``if __doc__ is None:``
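A sketch of that idiom (illustrative names, not scipy's actual code):

```python
import sys

def _build_doc(name):
    # stands in for scipy's expensive template substitution / expand_tabs work
    return "%s distribution.\n\nMethods: rvs, pdf, logpdf, cdf" % name

class my_dist_gen:
    """placeholder docstring, replaced below"""

# Under ``python -OO`` docstrings are stripped, so __doc__ is None
# (and sys.flags.optimize == 2); guarding on it skips the string work.
if my_dist_gen.__doc__ is not None:
    my_dist_gen.__doc__ = _build_doc("my_dist")
```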

Ralf
 
but it can be checked whether the flag was passed to Python, so it should be possible to make docstring generation depend on it, and it probably should.

