Speed-up simple function


Speed-up simple function

Giovanni Plantageneto

Hi everybody,

I have some functions in python that perform simple computations (they compute the values of some long polynomials). Since I apply them to rather large arrays (10^5 elements) these functions slow down the script quite a bit. Is there a quick and simple way to speed up these functions?

Thanks,
Andrea
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: Speed-up simple function

Nicolau Werneck
I would suggest you write a small piece of code using either scipy
weave or Cython.

I have been using Cython a lot, and it really pays off. But if you
only have a simple expression such as a polynomial to calculate, and
if something simpler such as weave works for you, it might be the best
choice.

++nic


On Mon, Jan 10, 2011 at 02:59:28PM +0100, [hidden email] wrote:

> <snip>

--
Nicolau Werneck <[hidden email]>          C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck           7B9E D6C4 37BB DA64 6F15
Linux user #460716
"I never claimed infallibility. You have to leave that to experts in that field like the Pope."
-- Edsger Dijkstra


Re: Speed-up simple function

J. David Lee
In reply to this post by Giovanni Plantageneto
Andrea,

Here is an example of scipy.weave that I generally use as a starting point:

from scipy.weave import inline
from scipy.weave.converters import blitz
from pylab import *

def foo(arr, c):
     ret = arr.copy()
     cvars = ['ret', 'c']
     # Using blitz converter gives the following for an array arr:
     #    Narr[0] is length of first dimension
     #    Narr[1] is length of second dimension
     #    arr(a, b) is the element arr[a][b]
     code = """
         int i = Nret[0];
         while(i--) {
             ret(i) += c;
         }
     """
     inline(code, cvars, type_converters = blitz)
     return ret

a = arange(5)
print foo(a, 1)

I generally create all my python objects outside the code so I don't
screw up any reference counting.  One caveat is that you need to run the
code in a single process the first time around or you can end up with
some strange errors.

David

On 01/10/2011 07:59 AM, [hidden email] wrote:

> <snip>


Re: Speed-up simple function

Charles R Harris
In reply to this post by Giovanni Plantageneto


On Mon, Jan 10, 2011 at 6:59 AM, <[hidden email]> wrote:

<snip>


Can you be more specific? If it's just one polynomial you can compute it all in one go. If it is a different polynomial for each array element you probably need to use Cython or some such. And what do you mean by "long" polynomial?

Chuck 



Re: Speed-up simple function

Emanuele Olivetti-3
In reply to this post by Giovanni Plantageneto
On 01/10/2011 02:59 PM, [hidden email] wrote:
> <snip>

Hi,

Could you give us some more details? If the issue is the large number of polynomial
evaluations, I believe numpy is already fast for that. E.g., 10^5 evaluations
of a pretty long polynomial (100 terms):

In [18]: %timeit np.polyval(np.linspace(-1,1,100),np.linspace(0,5,100000))
1 loops, best of 3: 360 ms per loop

Best,

Emanuele
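As an aside, np.polyval evaluates the polynomial with Horner's scheme, so a hand-rolled loop over the coefficients gives the same result. A minimal sketch (the coefficients and sizes below are made up, mirroring the timing example above):

```python
import numpy as np

# Horner evaluation of a polynomial over a large array, equivalent to
# np.polyval(coeffs, x). Coefficients are highest degree first.
coeffs = np.linspace(-1, 1, 100)   # a made-up 100-term polynomial
x = np.linspace(0, 5, 100000)      # 10^5 evaluation points

acc = np.zeros_like(x)
for c in coeffs:
    acc = acc * x + c              # Horner step: acc <- acc*x + c

assert np.allclose(acc, np.polyval(coeffs, x))
```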



Re: Speed-up simple function

Giovanni Plantageneto

Thanks for the quick answers. To be more specific, the slow part is something like this (see comments in the code):

for tstep in xrange(0,end):     # I need this cycle otherwise I run out of memory
        VAR1     = file.variables['VAR1'][tstep,:,:,:] #Read from a NetCDF file
        VAR2     = file.variables['VAR2'][tstep,:,:,:] #Read another variable from the same NetCDF file
        # Use the data read above in two functions. I do also some reshaping to have the correct
        # input for the function. The polynomials computed in the functions are of order 10 at most.
        COMP1  = FUNC1(VAR1,VAR2,np.tile(5000,VAR1.shape),np.reshape(np.repeat(SOMETHING,nrep),VAR1.shape))
        COMP2  = FUNC2(VAR1,COMP1,np.tile(5000,VAR1.shape))
        #Compute an average on the output of previous computations
        RESULT[tstep] = np.average(COMP2,weights=W)

Thanks for any suggestion,
Andrea
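One cheap win, independent of how FUNC1 and FUNC2 are implemented: the np.tile and np.reshape arrays in the loop above do not depend on tstep, so they can be built once outside the loop instead of on every iteration. A rough sketch with toy stand-ins (FUNC1, the shape, and the data below are all invented for illustration; only the hoisting pattern matters):

```python
import numpy as np

def FUNC1(v1, v2, p):
    # Toy stand-in for the real polynomial function.
    return v1 + v2 + 1e-6 * p

shape = (4, 5, 6)                  # shape of one time slice (assumed)
end = 3
W = np.ones(shape)                 # stand-in weights

# Loop-invariant array built once, not rebuilt on every tstep:
PRES = np.tile(5000, shape)

RESULT = np.empty(end)
for tstep in range(end):
    VAR1 = np.random.rand(*shape)  # stands in for the NetCDF read
    VAR2 = np.random.rand(*shape)
    COMP1 = FUNC1(VAR1, VAR2, PRES)
    RESULT[tstep] = np.average(COMP1, weights=W)
```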


----- Start Original Message -----
Sent: Mon, 10 Jan 2011 15:33:13 +0100
From: Emanuele Olivetti <[hidden email]>
To: [hidden email], SciPy Users List <[hidden email]>
Subject: Re: [SciPy-User] Speed-up simple function

> <snip>

----- End Original Message -----

Re: Speed-up simple function

Charles R Harris


On Mon, Jan 10, 2011 at 7:45 AM, <[hidden email]> wrote:

<snip>


This looks terribly over complicated. What are you trying to do and where is the polynomial?

<snip>

Chuck



Re: Speed-up simple function

Giovanni Plantageneto

> >
> This looks terribly over complicated. What are you trying to do and where is
> the polynomial?
>

I see, sorry, I'll try to put it more clearly. My computation is something like this: for each time step, I compute the value of some polynomial on all array elements, then I average the results to obtain a single number:

###############################################

for tstep in xrange(0,end):
       VAR1     = Read from some NetCDF file (~10^5 elements)
       VAR2     = Read from some NetCDF file (~10^5 elements)
       COMP1  = FUNC1(VAR1,VAR2)
       COMP2  = FUNC2(VAR1,COMP1)
       RESULT[tstep] = np.average(COMP2,weights=W)

###############################################

I checked, and the bottleneck really is in the computations done by FUNC1 and FUNC2.
These functions come from the Python seawater library, which provides standard functions for seawater properties. As an example (they are all rather similar, just polynomials), one of the functions is:

###########################################################

def _dens0(S,T):
    """Density of seawater at zero pressure"""

    # --- Define constants ---
    a0 = 999.842594
    a1 =   6.793952e-2
    a2 =  -9.095290e-3
    a3 =   1.001685e-4
    a4 =  -1.120083e-6
    a5 =   6.536332e-9

    b0 =   8.24493e-1
    b1 =  -4.0899e-3
    b2 =   7.6438e-5
    b3 =  -8.2467e-7
    b4 =   5.3875e-9

    c0 =  -5.72466e-3
    c1 =   1.0227e-4
    c2 =  -1.6546e-6

    d0 =   4.8314e-4

    # --- Computations ---
    # Density of pure water
    SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T

    # More temperature polynomials
    RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
    RC = c0 + (c1 + c2*T)*T

    return SMOW + RB*S + RC*(S**1.5) + d0*S*S

####################################################

Re: Speed-up simple function

Nicolau Werneck
Hi. I tried this function on my Atom N450 computer, with two vectors
of 100,000 numbers, and it took only 90 ms. Do you really need it
faster than this? Do you have to apply these calculations to a lot of
different 100k-long vectors, or something like that?...

If you really need to speed that up, you just have to implement this
_dens0 function using weave, Cython or something like that. I can give
you some more info about Cython; write me at my address if you want.

++nic



On Mon, Jan 10, 2011 at 04:47:44PM +0100, [hidden email] wrote:

> <snip>

--
Nicolau Werneck <[hidden email]>          C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck           7B9E D6C4 37BB DA64 6F15
Linux user #460716
"A language that doesn't affect the way you think about programming, is not worth knowing."
-- Alan J. Perlis


Re: Speed-up simple function

Francesc Alted-2
In reply to this post by Giovanni Plantageneto
On Monday 10 January 2011 16:47:44, [hidden email] wrote:
> > This looks terribly over complicated. What are you trying to do and
> > where is the polynomial?
>
> I see, sorry, I'll try to put it more clearly. My computation is
> something like this: for each time step, I compute the value of some
> polynomial on all array elements, then I average the results to
> obtain a single number:
[clip]

As others have said, Cython is a good option.  Perhaps even easier
would be Numexpr, which lets you get nice speed-ups on complex
expressions and, as a plus, lets you use any number of cores your
machine has.

I've made a small demo (attached) on how Numexpr works based on your
polynomial.  Here are my results of running it on a 6-core machine:

Computing with NumPy with 10^5 elements:
time spent: 0.0169
Computing with Numexpr with 10^5 elements:
time spent for 1 threads: 0.0052 speed-up: 3.22
time spent for 2 threads: 0.0028 speed-up: 6.11
time spent for 3 threads: 0.0022 speed-up: 7.82
time spent for 4 threads: 0.0016 speed-up: 10.26
time spent for 5 threads: 0.0014 speed-up: 12.31
time spent for 6 threads: 0.0013 speed-up: 13.35

Cheers,

--
Francesc Alted


test.py (1K) Download Attachment

Re: Speed-up simple function

Nicolau Werneck
That is one excellent suggestion; using multiple cores is probably the
best thing to do. I have tried a basic Cython implementation (because
it's such a basic and interesting problem), but I couldn't yet reach a
speedup of even 2.0.

The big villain in the expression is the 'S**1.5'. Removing that from
my Cython code gives a 12x speedup. Avoiding that exponentiation can
probably be beneficial to other techniques too...

I have recently written some code that uses the rsqrt SSE instruction
for approximate square roots. This is very useful... Wouldn't it be
nice to have that in scipy?

++nic



On Mon, Jan 10, 2011 at 05:52:33PM +0100, Francesc Alted wrote:

> <snip>

> import math
> import numpy as np
> from numpy.testing import assert_array_almost_equal
> import numexpr as ne
> from time import time
>
> N = 1e5  # number of elements in arrays
> NTIMES = 100
>
> def _dens0(S,T, kernel):
>     """Density of seawater at zero pressure"""
>
>     # --- Define constants ---
>     a0 = 999.842594
>     a1 =   6.793952e-2
>     a2 =  -9.095290e-3
>     a3 =   1.001685e-4
>     a4 =  -1.120083e-6
>     a5 =   6.536332e-9
>
>     b0 =   8.24493e-1
>     b1 =  -4.0899e-3
>     b2 =   7.6438e-5
>     b3 =  -8.2467e-7
>     b4 =   5.3875e-9
>
>     c0 =  -5.72466e-3
>     c1 =   1.0227e-4
>     c2 =  -1.6546e-6
>
>     d0 =   4.8314e-4
>
>     # --- Computations ---
>     # Density of pure water
>     if kernel == "numpy":
>         SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T
>         # More temperature polynomials
>         RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
>         RC = c0 + (c1 + c2*T)*T
>         result = SMOW + RB*S + RC*(S**1.5) + d0*S*S
>     elif kernel == "numexpr":
>         SMOW = "a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T"
>         # More temperature polynomials
>         RB = "b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T"
>         RC = "c0 + (c1 + c2*T)*T"
>         poly = "(%s) + (%s)*S + (%s)*(S**1.5) + (d0*S*S)" %  (SMOW, RB, RC)
>         result = ne.evaluate(poly)
>     else:
>         raise ValueError, "Unsupported kernel"
>
>     return result
>
> S = np.random.rand(N)
> R = np.random.rand(N)
>
> print "Computing with NumPy:"
> t0 = time()
> for i in range(NTIMES):
>     r1 = _dens0(S, R, "numpy")
> tnp = (time()-t0) / NTIMES
> print "time spent: %.4f" % tnp
>
> print "Computing with Numexpr with 10^%d elements:" % int(math.log10(N))
> for nthreads in range(1, ne.ncores+1):
>     ne.set_num_threads(nthreads)
>     t0 = time()
>     for i in range(NTIMES):
>         r2 = _dens0(S, R, "numexpr")
>     tne = (time()-t0) / NTIMES
>     print "time spent for %d threads: %.4f" % (nthreads, tne),
>     print "speed-up: %.2f" % (tnp / tne)
>
> assert_array_almost_equal(r1, r2, 15, "Arrays are not equal")



--
Nicolau Werneck <[hidden email]>          C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck           7B9E D6C4 37BB DA64 6F15
Linux user #460716
"C++ is history repeated as tragedy. Java is history repeated as farce."
-- Scott McKay


Re: Speed-up simple function

Charles R Harris


On Mon, Jan 10, 2011 at 10:22 AM, Nicolau Werneck <[hidden email]> wrote:
That is one excellent suggestion, using multiple cores is probably the
best thing to do. I have tried a basic Cython implementation (because
it's such a basic and interesting problem), but I couldn't yet reach a
speedup of even 2.0.

The big villain in the expression is the 'S**1.5'. Removing that from
my Cython code gives a 12x speedup. Avoiding that exponentiation can
be probably beneficial to other techniques too...


The square root itself should be fast so I suspect that a log is being used. What happens if you use S*sqrt(S) instead?

<snip>

Chuck



Re: Speed-up simple function

Nicolau Werneck
Great suggestion. I tried just modifying the original numpy
expression, replacing S**1.5 with S*sqrt(S), and just by doing that
I already got a 2x speedup.

In the Cython version, using S*sqrt(S) gives a 7.3x speedup. Much
better than using exponentiation. Using the approximate rsqrt will
probably bring that closer to 10x.

++nic
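For anyone wanting to verify the rewrite: for non-negative S the two forms agree to floating-point rounding, so it is a safe substitution here (salinity is non-negative). A quick sanity check:

```python
import numpy as np

S = np.random.rand(100000)    # non-negative, like a salinity field
a = S ** 1.5                  # pow-based form
b = S * np.sqrt(S)            # sqrt-based rewrite
assert np.allclose(a, b)
```

Timing the two expressions with `python -m timeit` on your own machine will show how much the pow call costs there; results vary by CPU and numpy build.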



On Mon, Jan 10, 2011 at 10:30:57AM -0700, Charles R Harris wrote:

> <snip>


--
Nicolau Werneck <[hidden email]>          C3CF E29F 5350 5DAA 3705
http://www.lti.pcs.usp.br/~nwerneck           7B9E D6C4 37BB DA64 6F15
Linux user #460716
"I wanted to change the world. But I have found that the only thing one can be sure of changing is oneself."
-- Aldous Huxley


Re: Speed-up simple function

Francesc Alted-2
In reply to this post by Charles R Harris
On Monday 10 January 2011 18:30:57, Charles R Harris wrote:
> On Mon, Jan 10, 2011 at 10:22 AM, Nicolau Werneck <[hidden email]> wrote:
> > <snip>
>
> The square root itself should be fast so I suspect that a log is
> being used. What happens if you use S*sqrt(S) instead?

For the Numexpr case, this does not change anything as this optimization
is already applied by the internal optimizer (to be strict, this is
applied only if the `optimization` flag in `evaluate()` is set to
"aggressive", but this is the default anyway).

--
Francesc Alted

Re: Speed-up simple function

Chris Barker - NOAA Federal
In reply to this post by Nicolau Werneck
On 1/10/2011 8:14 AM, Nicolau Werneck wrote:
> If you really need to speed that up, you just have to implement this
> _dens0 function using weave, Cython or something like that. I can give
> you some more info about Cython, write me at my address if you want.

The way the original is written uses a fair number of temporaries; I'll
bet it could be sped up with pure numpy if that were addressed --
probably not as fast as Cython or numexpr, but maybe easier.

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: Speed-up simple function

Francesc Alted-2
In reply to this post by Nicolau Werneck
On Monday 10 January 2011 18:57:18, Nicolau Werneck wrote:
> Great suggestion. I have tried just modifying the original numpy
> expression replacing the S**1.5 for S*sqrt(S), and just by doing that
> I already got a 2x speedup.
>
> In the Cython version, using S*sqrt(S) gives a 7.3 speedup. Much
> better than using exponentiation. Using the approximate rsqrt will
> probably bring that closer to 10x.

Mmh, for double precision you cannot expect big speed-ups (at least
until the new AVX instruction set is broadly available).  Here is an
estimate of the speed-up you can get by using Numexpr+VML, which uses
SSEx (SSE4 in my case):

>>> x = np.linspace(0,1,1e6)
>>> timeit np.sqrt(x)
100 loops, best of 3: 6.69 ms per loop
>>> timeit ne.evaluate("sqrt(x)")
100 loops, best of 3: 4.37 ms per loop  # only 1.5x speed-up

With single precision things are different:

>>> x = np.linspace(0,1,1e6).astype('f4')
>>> timeit np.sqrt(x)
100 loops, best of 3: 4.61 ms per loop
>>> timeit ne.evaluate("sqrt(x)")
1000 loops, best of 3: 1.83 ms per loop  # 2.5x speed-up

In my opinion, as newer processors carry more cores, multithreading
will become a simpler (and cheaper) option for accelerating this sort
of computation (as well as computations limited by memory bandwidth).
SIMD could help in getting more speed, of course, but as I see it, it
is multithreading that will be key for computational problems in the
near future (present?).

--
Francesc Alted