Hi everybody,

I have some functions in Python that perform simple computations (they compute the values of some long polynomials). Since I apply them to rather large arrays (10^5 elements), these functions slow down the script quite a bit. Is there a quick and simple way to speed up these functions?

Thanks,
Andrea

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
I would suggest you write a small piece of code using either scipy.weave or Cython. I have been using Cython a lot, and it really pays off. But if you only have a simple expression such as a polynomial to calculate, and if something simpler such as weave works for you, it might be the best choice.

++nic

-- Nicolau Werneck
Andrea,
Here is an example of scipy.weave that I generally use as a starting point:

```python
from scipy.weave import inline
from scipy.weave.converters import blitz
from pylab import *

def foo(arr, c):
    ret = arr.copy()
    cvars = ['ret', 'c']
    # Using the blitz converter gives the following for an array arr:
    #   Narr[0]   is the length of the first dimension
    #   Narr[1]   is the length of the second dimension
    #   arr(a, b) is the element arr[a][b]
    code = """
    int i = Nret[0];
    while(i--) {
        ret(i) += c;
    }
    """
    inline(code, cvars, type_converters=blitz)
    return ret

a = arange(5)
print foo(a, 1)
```

I generally create all my Python objects outside the C code so I don't screw up any reference counting. One caveat is that you need to run the code in a single process the first time around, or you can end up with some strange errors.

David
On Mon, Jan 10, 2011 at 6:59 AM, <[hidden email]> wrote:
Can you be more specific? If it's just one polynomial, you can compute it all in one go. If it is a different polynomial for each array element, you probably need to use Cython or some such. And what do you mean by a "long" polynomial?

Chuck
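To illustrate the "all in one go" case: a single fixed polynomial can be evaluated over the whole array in one vectorized call, with no Python-level loop over the 10^5 elements. A minimal sketch (the polynomial and its coefficients here are made up for illustration, not from Andrea's code):

```python
import numpy as np

# Hypothetical polynomial p(x) = 2x^3 - x + 5, coefficients highest degree
# first, as np.polyval expects.
coeffs = [2.0, 0.0, -1.0, 5.0]

# Evaluate p over all 10^5 points in a single vectorized pass.
x = np.linspace(0.0, 1.0, 100000)
y = np.polyval(coeffs, x)
```

If instead each element needs a *different* polynomial, no single `polyval` call applies, which is where Cython comes in.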
On 01/10/2011 02:59 PM, [hidden email] wrote:
Hi,

Could you give us some more details? If the issue is the large number of polynomial evaluations, I believe that is already fast in NumPy. E.g., 10^5 evaluations of a pretty long polynomial (100 terms):

```
In [18]: %timeit np.polyval(np.linspace(-1,1,100), np.linspace(0,5,100000))
1 loops, best of 3: 360 ms per loop
```

Best,

Emanuele
Thanks for the quick answers. To be more specific, the slow part is something like this (see comments in the code):

```python
for tstep in xrange(0, end):  # I need this cycle, otherwise I run out of memory
    VAR1 = file.variables['VAR1'][tstep,:,:,:]  # Read from a NetCDF file
    VAR2 = file.variables['VAR2'][tstep,:,:,:]  # Read another variable from the same NetCDF file
    # Use the data read above in two functions. I also do some reshaping to get the
    # correct input for the functions. The polynomials computed in the functions are
    # of order 10 at most.
    COMP1 = FUNC1(VAR1, VAR2, np.tile(5000, VAR1.shape),
                  np.reshape(np.repeat(SOMETHING, nrep), VAR1.shape))
    COMP2 = FUNC2(VAR1, COMP1, np.tile(5000, VAR1.shape))
    # Compute a weighted average of the output of the previous computations
    RESULT[tstep] = np.average(COMP2, weights=W)
```

Thanks for any suggestion,
Andrea
On Mon, Jan 10, 2011 at 7:45 AM, <[hidden email]> wrote:
This looks terribly overcomplicated. What are you trying to do, and where is the polynomial?

<snip>

Chuck
> This looks terribly overcomplicated. What are you trying to do and where is the polynomial?

I see, sorry, I'll try to put it more clearly. My computation is something like this: for each time step, I compute the value of some polynomials on all array elements, then I average the results to obtain a single number:

```python
for tstep in xrange(0, end):
    VAR1 = ...  # Read from some NetCDF file (~10^5 elements)
    VAR2 = ...  # Read from some NetCDF file (~10^5 elements)
    COMP1 = FUNC1(VAR1, VAR2)
    COMP2 = FUNC2(VAR1, COMP1)
    RESULT[tstep] = np.average(COMP2, weights=W)
```

I checked, and the bottleneck really is in the computations done by FUNC1 and FUNC2. The polynomials are in FUNC1 and FUNC2 (from the python seawater library, which provides some standard functions for seawater properties). As an example (they are all rather similar, just polynomials), one of the functions is:

```python
def _dens0(S, T):
    """Density of seawater at zero pressure"""

    # --- Define constants ---
    a0 = 999.842594
    a1 = 6.793952e-2
    a2 = -9.095290e-3
    a3 = 1.001685e-4
    a4 = -1.120083e-6
    a5 = 6.536332e-9

    b0 = 8.24493e-1
    b1 = -4.0899e-3
    b2 = 7.6438e-5
    b3 = -8.2467e-7
    b4 = 5.3875e-9

    c0 = -5.72466e-3
    c1 = 1.0227e-4
    c2 = -1.6546e-6

    d0 = 4.8314e-4

    # --- Computations ---
    # Density of pure water
    SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T

    # More temperature polynomials
    RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
    RC = c0 + (c1 + c2*T)*T

    return SMOW + RB*S + RC*(S**1.5) + d0*S*S
```
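As a quick, self-contained sanity check that this function vectorizes cleanly over whole arrays and gives physically plausible numbers: at zero salinity and 4 °C only the SMOW term survives, and it should give roughly the maximum density of pure water, about 999.97 kg/m³. The sketch below re-defines the posted function so it is runnable on its own:

```python
import numpy as np

def _dens0(S, T):
    """Density of seawater at zero pressure (the posted function)."""
    a0, a1, a2 = 999.842594, 6.793952e-2, -9.095290e-3
    a3, a4, a5 = 1.001685e-4, -1.120083e-6, 6.536332e-9
    b0, b1, b2, b3, b4 = 8.24493e-1, -4.0899e-3, 7.6438e-5, -8.2467e-7, 5.3875e-9
    c0, c1, c2 = -5.72466e-3, 1.0227e-4, -1.6546e-6
    d0 = 4.8314e-4
    SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T
    RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
    RC = c0 + (c1 + c2*T)*T
    return SMOW + RB*S + RC*(S**1.5) + d0*S*S

# Fresh water (S = 0) at 4 degC: only the SMOW polynomial contributes.
S = np.zeros(5)
T = np.full(5, 4.0)
rho = _dens0(S, T)   # about 999.97 kg/m^3 everywhere
```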
Hi. I tried this function on my Atom N450 computer, with two vectors of 100,000 numbers, and it took only 90 ms. Do you really need it faster than this? Do you have to apply these calculations to a lot of different 100k-long vectors, or something like that?

If you really need to speed that up, you just have to implement this _dens0 function using weave, Cython or something like that. I can give you some more info about Cython; write me at my address if you want.

++nic

-- Nicolau Werneck
On Monday 10 January 2011 16:47:44, [hidden email] wrote:
> I see, sorry, I'll try to put it more clearly. My computation is something like this: for each time step, I compute the value of some polynomial on all array elements, then I average the results to obtain a single number:
[clip]

As others have said, Cython is a good option. Perhaps even easier would be Numexpr, which lets you get nice speed-ups on complex expressions and, as a plus, lets you use any number of cores your machine has.

I've made a small demo (attached) of how Numexpr works, based on your polynomial. Here are my results of running it on a 6-core machine:

```
Computing with NumPy with 10^5 elements:
time spent: 0.0169
Computing with Numexpr with 10^5 elements:
time spent for 1 threads: 0.0052    speed-up: 3.22
time spent for 2 threads: 0.0028    speed-up: 6.11
time spent for 3 threads: 0.0022    speed-up: 7.82
time spent for 4 threads: 0.0016    speed-up: 10.26
time spent for 5 threads: 0.0014    speed-up: 12.31
time spent for 6 threads: 0.0013    speed-up: 13.35
```

Cheers,

-- Francesc Alted
That is one excellent suggestion; using multiple cores is probably the best thing to do. I have tried a basic Cython implementation (because it's such a basic and interesting problem), but I couldn't yet reach a speedup of even 2.0.

The big villain in the expression is the 'S**1.5'. Removing that from my Cython code gives a 12x speedup. Avoiding that exponentiation can probably be beneficial to other techniques too... I have recently written some code that uses the rsqrt SSE instruction for approximate square roots. This is very useful... Wouldn't it be nice to have that in scipy?

++nic

(Quoted from Francesc's message — the attached test.py:)

```python
import math
import numpy as np
from numpy.testing import assert_array_almost_equal
import numexpr as ne
from time import time

N = 1e5       # number of elements in arrays
NTIMES = 100

def _dens0(S, T, kernel):
    """Density of seawater at zero pressure"""

    # --- Define constants ---
    a0 = 999.842594
    a1 = 6.793952e-2
    a2 = -9.095290e-3
    a3 = 1.001685e-4
    a4 = -1.120083e-6
    a5 = 6.536332e-9

    b0 = 8.24493e-1
    b1 = -4.0899e-3
    b2 = 7.6438e-5
    b3 = -8.2467e-7
    b4 = 5.3875e-9

    c0 = -5.72466e-3
    c1 = 1.0227e-4
    c2 = -1.6546e-6

    d0 = 4.8314e-4

    # --- Computations ---
    # Density of pure water
    if kernel == "numpy":
        SMOW = a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T
        # More temperature polynomials
        RB = b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T
        RC = c0 + (c1 + c2*T)*T
        result = SMOW + RB*S + RC*(S**1.5) + d0*S*S
    elif kernel == "numexpr":
        SMOW = "a0 + (a1 + (a2 + (a3 + (a4 + a5*T)*T)*T)*T)*T"
        # More temperature polynomials
        RB = "b0 + (b1 + (b2 + (b3 + b4*T)*T)*T)*T"
        RC = "c0 + (c1 + c2*T)*T"
        poly = "(%s) + (%s)*S + (%s)*(S**1.5) + (d0*S*S)" % (SMOW, RB, RC)
        result = ne.evaluate(poly)
    else:
        raise ValueError, "Unsupported kernel"

    return result

S = np.random.rand(N)
R = np.random.rand(N)

print "Computing with NumPy:"
t0 = time()
for i in range(NTIMES):
    r1 = _dens0(S, R, "numpy")
tnp = (time()-t0) / NTIMES
print "time spent: %.4f" % tnp

print "Computing with Numexpr with 10^%d elements:" % int(math.log10(N))
for nthreads in range(1, ne.ncores+1):
    ne.set_num_threads(nthreads)
    t0 = time()
    for i in range(NTIMES):
        r2 = _dens0(S, R, "numexpr")
    tne = (time()-t0) / NTIMES
    print "time spent for %d threads: %.4f" % (nthreads, tne),
    print "speed-up: %.2f" % (tnp / tne)

assert_array_almost_equal(r1, r2, 15, "Arrays are not equal")
```

-- Nicolau Werneck
On Mon, Jan 10, 2011 at 10:22 AM, Nicolau Werneck <[hidden email]> wrote:
> That is one excellent suggestion, using multiple cores is probably the best thing to do.
>
> The big villain in the expression is the 'S**1.5'. Removing that from my Cython code gives a 12x speedup.

The square root itself should be fast, so I suspect that a log is being used. What happens if you use S*sqrt(S) instead?

<snip>

Chuck
Great suggestion. I have tried just modifying the original numpy expression, replacing S**1.5 with S*sqrt(S), and just by doing that I already got a 2x speedup.

In the Cython version, using S*sqrt(S) gives a 7.3x speedup. Much better than using exponentiation. Using the approximate rsqrt will probably bring that closer to 10x.

++nic
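The 2x observation is easy to reproduce with a throwaway micro-benchmark (timings are of course machine-dependent, so no speed-up factor is asserted here; the point is just that the two forms agree numerically for non-negative inputs while `S*sqrt(S)` avoids the general-purpose `pow`):

```python
import numpy as np
import timeit

# Arbitrary non-negative test data, same size as the arrays in the thread.
S = np.random.uniform(0.0, 40.0, 100000)

# Time the general power against the multiply-plus-sqrt form.
t_pow  = timeit.timeit(lambda: S**1.5,         number=50)
t_sqrt = timeit.timeit(lambda: S * np.sqrt(S), number=50)

# For S >= 0 both expressions compute the same thing, up to rounding.
agree = np.allclose(S**1.5, S * np.sqrt(S))
```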
On Monday 10 January 2011 18:30:57, Charles R Harris wrote:
> The square root itself should be fast so I suspect that a log is being used. What happens if you use S*sqrt(S) instead?

For the Numexpr case, this does not change anything, as this optimization is already applied by the internal optimizer (to be strict, it is applied only if the `optimization` flag in `evaluate()` is set to "aggressive", but that is the default anyway).

-- Francesc Alted
On 1/10/2011 8:14 AM, Nicolau Werneck wrote:
> If you really need to speed that up, you just have to implement this _dens0 function using weave, Cython or something like that.

The way the original is written uses a fair number of temporaries; I'll bet it could be sped up with pure numpy if that was addressed -- probably not as fast as Cython or numexpr, but maybe easier.

-Chris

-- Christopher Barker, Ph.D.
Oceanographer, Emergency Response Division, NOAA/NOS/OR&R
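One way to act on that observation (a sketch, not code from the thread) is to run Horner's scheme through a single reused accumulator with NumPy's `out=` argument, so the SMOW polynomial allocates one buffer instead of a fresh temporary for every `+` and `*`:

```python
import numpy as np

# Coefficients of the pure-water (SMOW) polynomial from the thread,
# lowest degree first: a0 + a1*T + ... + a5*T^5.
a = [999.842594, 6.793952e-2, -9.095290e-3, 1.001685e-4, -1.120083e-6, 6.536332e-9]

def smow_naive(T):
    # Each arithmetic operation allocates a fresh temporary array.
    return a[0] + (a[1] + (a[2] + (a[3] + (a[4] + a[5]*T)*T)*T)*T)*T

def smow_inplace(T):
    # Same Horner recurrence, but reusing a single accumulator buffer.
    acc = np.full_like(T, a[5])
    for c in (a[4], a[3], a[2], a[1], a[0]):
        np.multiply(acc, T, out=acc)   # acc *= T
        np.add(acc, c, out=acc)        # acc += c
    return acc

T = np.linspace(-2.0, 30.0, 100000)
r1 = smow_naive(T)
r2 = smow_inplace(T)
```

Whether the reduced allocation actually wins depends on array size and memory pressure, so it is worth timing on the real data.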
On Monday 10 January 2011 18:57:18, Nicolau Werneck wrote:
> Great suggestion. I have tried just modifying the original numpy expression replacing the S**1.5 for S*sqrt(S), and just by doing that I already got a 2x speedup.
>
> In the Cython version, using S*sqrt(S) gives a 7.3x speedup. Much better than using exponentiation. Using the approximate rsqrt will probably bring that closer to 10x.

Mmh, for double precision you cannot expect big speed-ups (at least until the new AVX instruction set is broadly available). Here is an estimate of the speed-up you can get by using Numexpr+VML, which uses SSEx (SSE4 in my case):

```
>>> x = np.linspace(0,1,1e6)
>>> timeit np.sqrt(x)
100 loops, best of 3: 6.69 ms per loop
>>> timeit ne.evaluate("sqrt(x)")
100 loops, best of 3: 4.37 ms per loop    # only 1.5x speed-up
```

With single precision things are different:

```
>>> x = np.linspace(0,1,1e6).astype('f4')
>>> timeit np.sqrt(x)
100 loops, best of 3: 4.61 ms per loop
>>> timeit ne.evaluate("sqrt(x)")
1000 loops, best of 3: 1.83 ms per loop   # 2.5x speed-up
```

In my opinion, as newer processors get more cores, multithreading will become a simpler (and cheaper) option for accelerating this sort of computation (as well as computations limited by memory bandwidth). SIMD could help in getting more speed, of course, but as I see it, multithreading will be the key for computational problems in the near future (present?).

-- Francesc Alted
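The single- vs double-precision difference is easy to probe on any machine; a rough sketch (absolute numbers will vary, and for bandwidth-bound operations float32 mainly wins by halving the memory traffic, so no speed-up factor is asserted):

```python
import numpy as np
import timeit

x64 = np.linspace(0.0, 1.0, 1000000)   # float64 input
x32 = x64.astype(np.float32)           # float32 input: half the bytes to move

t64 = timeit.timeit(lambda: np.sqrt(x64), number=20)
t32 = timeit.timeit(lambda: np.sqrt(x32), number=20)
```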