help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
I need help speeding up some code I wrote to perform a Runge-Kuta
integration.  I need to do the integration as part of a real-time
control algorithm, so it needs to be fairly fast.
scipy.integrate.odeint does too much error checking to be fast enough.
 My pure Python version was just a little too slow, so I tried coding
it up in Cython.  I have only used Cython once before, so I don't know
if I did it correctly (the .pyx file is attached).

The code runs just fine, but there is almost no speed up.  I think the
core issue is that my dxdt_runge_kuta function gets called about 4000
times per second, so most of my overhead is in the function calls (I
think).  I am running my real-time control algorithm at 500 Hz and I
need at least 2 Runge-Kuta integration steps per real-time step for
numeric stability.  And the Runge-Kuta algorithm needs to evaluate the
derivative 4 times per time step.  So, 500 Hz * 2 * 4 = 4000 calls
per second.
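The call count above follows directly from the structure of the classical fourth-order Runge-Kutta step. The following is a minimal pure-Python sketch. It assumes the input voltage is held constant over each step and that the step size `h` is applied inside the step; the post's `g1`...`g4` may already include it. `rk4_step` and the three-state `dxdt_toy` model are hypothetical stand-ins, because the real model is only in the attachment:

```python
def rk4_step(f, x, u, h):
    """Advance dx/dt = f(x, u) by one classical RK4 step of size h.

    The input u (e.g. the applied voltage) is held constant over the step,
    and f is evaluated exactly four times.
    """
    n = len(x)
    k1 = f(x, u)
    k2 = f([x[i] + 0.5 * h * k1[i] for i in range(n)], u)
    k3 = f([x[i] + 0.5 * h * k2[i] for i in range(n)], u)
    k4 = f([x[i] + h * k3[i] for i in range(n)], u)
    return [x[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
            for i in range(n)]


calls = 0  # counts derivative evaluations


def dxdt_toy(x, volts):
    """Hypothetical 3-state model standing in for the real dxdt function."""
    global calls
    calls += 1
    _, vel, z = x
    return [vel, -2.0 * vel + z, -10.0 * z + volts]


fs = 500       # real-time control rate in Hz
substeps = 2   # RK4 steps per control period
h = 1.0 / (fs * substeps)

state = [0.0, 0.0, 0.0]
for _ in range(fs):            # one simulated second of the control loop
    for _ in range(substeps):
        state = rk4_step(dxdt_toy, state, 1.0, h)

print(calls)  # 4000
```

Each call to `rk4_step` evaluates the derivative four times. One simulated second at 500 Hz with two substeps therefore makes 4000 calls, which matches the estimate above. In pure Python, every one of those calls pays interpreter overhead.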

I also tried coding this up in fortran and using f2py, but I am
getting a type mismatch error I don't understand.  I have a function
that declares its return values as double precision:

double precision function dzdt(x,voltage)

and I declare the variable I want to store the returned value in to
also be double precision:

double precision F,z,vel,accel,zdot1,zdot2,zdot3,zdot4

zdot1 = dzdt(x_prev,volts)

but somehow it is not happy.


My C skills are pretty weak (the longer I use Python, the more C I
forget, and I didn't know that much to start with).  I started looking
into Boost as well as using f2py on C code, but I got stuck.


Can anyone either make my Cython or Fortran approaches work or point
me in a different direction?

Thanks,

Ryan

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

runge_kuta.pyx (3K)
runge_kuta_f.f (2K)

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Jim Vickroy
On 8/3/2012 11:02 AM, Ryan Krauss wrote:
> [clip]
> I also tried coding this up in fortran and using f2py, but I am
> getting a type mismatch error I don't understand.  I have a function
> that declares its return values as double precision:
>
> double precision function dzdt(x,voltage)
>
> and I declare the variable I want to store the returned value in to
> also be double precision:
>
> double precision F,z,vel,accel,zdot1,zdot2,zdot3,zdot4
>
> zdot1 = dzdt(x_prev,volts)
>
> but somehow it is not happy.
> [clip]

I'm not much of a Fortran programmer and I may misunderstand the above, but have you tried adding dzdt to your double precision declaration?

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Pauli Virtanen-3
In reply to this post by Ryan Krauss-2
On 03.08.2012 19:02, Ryan Krauss wrote:
[clip]
> Can anyone either make my Cython or Fortran approaches work or point
> me in a different direction?

Regarding Cython: run

        cython -a runge_kuta.pyx

and check the created HTML file. Slow points are highlighted with yellow.

Regarding this case:

- `cdef`, not `def` for the dxdt_* function

- from libc.math cimport exp

- Do not use small numpy arrays inside loops.
  Use C constructs instead.

- Use @cython.cdivision(True), @cython.boundscheck(False)



PS. Runge-Kutta

--
Pauli Virtanen


Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Pauli Virtanen-3
In reply to this post by Ryan Krauss-2
On 03.08.2012 19:02, Ryan Krauss wrote:
[clip]
> zdot1 = dzdt(x_prev,volts)
>
> but some how it is not happy.

It's Fortran 77. You need to declare

        double precision dzdt

I'd suggest writing Fortran 90 --- no need to bring more F77 code into
existence ;)

--
Pauli Virtanen


Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
In reply to this post by Pauli Virtanen-3
Thanks for the suggestions.

> - Do not use small numpy arrays inside loops.
>   Use C constructs instead.

This is where I ran into trouble with my knowledge of C.  I have
several 3x1 arrays that I need to pass into the dxdt function,
multiply by scalars, and add together.  I don't know how to do that
cleanly in C.  For example:
x_out = x_prev + 1.0/6*(g1 + 2*g2 + 2*g3 + g4)
where x_prev, g1, g2, g3, and g4 are all 3x1.

A little googling led me to valarrays, but I don't know if that is
the best approach or how to use them within Cython.

How would you do basic math on small arrays in pure C?
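In plain C, the usual approach is to skip array objects entirely and write the element-wise loop into a preallocated output. The pure-Python sketch below only mirrors that loop structure for the formula above; in CPython it is not faster. `combine_rk4` is a hypothetical name. A `cdef` function over C arrays or typed memoryviews would contain the same loop body:

```python
def combine_rk4(x_prev, g1, g2, g3, g4, x_out):
    """Write x_prev + (g1 + 2*g2 + 2*g3 + g4)/6 into x_out, element by element.

    Mirrors a C loop such as
        for (i = 0; i < 3; i++)
            x_out[i] = x_prev[i] + (g1[i] + 2*g2[i] + 2*g3[i] + g4[i]) / 6.0;
    No temporary arrays are created; x_out must be preallocated.
    """
    for i in range(len(x_prev)):
        x_out[i] = x_prev[i] + (g1[i] + 2.0 * g2[i] + 2.0 * g3[i] + g4[i]) / 6.0


x_prev = [1.0, 2.0, 3.0]
g1 = [0.0, 3.0, 6.0]
g2 = [3.0, 0.0, 6.0]
g3 = [3.0, 0.0, 6.0]
g4 = [0.0, 3.0, 6.0]
x_out = [0.0, 0.0, 0.0]  # allocated once, reused every step
combine_rk4(x_prev, g1, g2, g3, g4, x_out)
print(x_out)  # [3.0, 3.0, 9.0]
```

The design choice here is to write the result into caller-owned storage. The array expression, by contrast, builds several temporary arrays on every call.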




Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
Fortran, so fast, yet so painful.  Once I got it working, it was 94
times faster than my pure Python version.

Thanks to Jim and Pauli for helping me find my error.  Ironically, I
was thinking like a C programmer.  Just because a Fortran function
declares its return value data type doesn't mean all calling functions
or subroutines will know the data type of the function when they call it.

I am still open to Cython suggestions.  I don't want to bring more F77
code into the world.....


Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sturla Molden-2
In reply to this post by Pauli Virtanen-3

On 03.08.2012 21:05, Pauli Virtanen wrote:
> It's Fortran 77. You need to declare
>
> double precision dzdt
>
> I'd suggest writing Fortran 90 --- no need to bring more F77 code into
> existence ;)
>

With the new typed memoryviews in Cython, there is no need to bring more
Fortran of any sort into existence. ;-)

Sturla



Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sebastian Berg
In reply to this post by Ryan Krauss-2
Hey,

Just to add to what was said previously: isn't float in Cython single
precision? I doubt this was intended here, and it should be replaced with
DTYPE_t everywhere. Other than that, as was already said, np.zeros/np.exp
is bad there...
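Cython's `float` maps to C `float`, which is IEEE single precision on common platforms. Python's `float` and NumPy's `float64` are C `double`. The sketch below uses NumPy's `float32` to mimic what a C `float` variable would store. The constant `J` is taken from the model code quoted later in the thread. It also assumes `DTYPE_t` is a typedef for `np.float64_t`, the usual Cython/NumPy pattern:

```python
import numpy as np

# Constant from the model code; a Python float is a C double.
J = 0.0011767297528720126

J_single = np.float32(J)  # what a C/Cython `float` variable would hold
J_double = np.float64(J)  # what a C/Cython `double` (np.float64_t) holds

print(float(J_double) == J)  # True: double stores the value unchanged
print(float(J_single) == J)  # False: rounded to 24 significant bits
rel_err = abs(float(J_single) - J) / J
print(rel_err)  # nonzero, bounded by 2**-24 (about 6e-8)
# Machine epsilon of each type
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
```

Single-precision rounding like this is usually harmless for one constant. Inside an integrator, however, it accumulates on every step.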

Regards,

Sebastian



Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Jonathan Stickel-5
In reply to this post by Ryan Krauss-2
I am sure properly coded Cython is great, but I really struggled when I
tried to use it. I found that it allows you to write really slow code
without errors or warnings. I found the profiling tools to be only
marginally helpful. So many different ways to do the same thing... which
is the best? All the documentation is nice, but very long and dense.

I am having much more success with f2py (using F90 syntax). Either your
code runs fast, or it simply will not compile (excepting segfault bugs
that can sometimes be difficult to track down). Improved and updated
documentation would be helpful, but otherwise f2py is now what I turn to
when speed is crucial.

My 2 cents. YMMV.

Jonathan



Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sturla Molden-2
On 04.08.2012 21:35, Jonathan Stickel wrote:
> I am sure properly coded Cython is great, but I really struggled when I
> tried to use it.

If you use np.ndarray declarations and array expressions, those will be
slow. To write fast and numpyonic Cython, use typed memoryviews instead,
and write out all loops, since there is no support for array expressions
with these yet. And unfortunately there is a huge lack of documentation
on how to use typed memoryviews efficiently.

Here is an example of how to use Cython as a Fortran killer:

https://github.com/sturlamolden/memview_benchmarks/blob/master/memview.pyx

In this case, the performance with -O2 was just 2.2% slower than "plain
C" with pointer arithmetic.

It is possible to write very fast array code with Cython, but you must
do it right.

For comparison, this is very slow:

https://github.com/sturlamolden/memview_benchmarks/blob/master/cythonized_numpy_2b.pyx

What this means is: for anything but trivial code, the NumPy syntax
is just too slow and should be avoided!

I briefly looked at the Cython code posted in this thread, and it
suffers from all these issues.


Sturla

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sturla Molden-2
In reply to this post by Ryan Krauss-2
Not tested or debugged, but to me it looks like something like this might be what you want.

Sturla


runge_kuta.pyx (2K)

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Gael Varoquaux
In reply to this post by Jonathan Stickel-5
On Sat, Aug 04, 2012 at 01:35:17PM -0600, Jonathan Stickel wrote:
> I am sure properly coded Cython is great, but I really struggled when I
> tried to use it. I found that it allows you to write really slow code
> without errors or warnings. I found the profiling tools to be only
> marginally helpful.

To write fast cython code, compile it with 'cython -a', open the
resulting html file in a web browser. The yellow lines are where the
problems are: click on them and you'll find that they correspond to lines
of Cython code that lead to long and complex C code. Improve your code
(by making sure that it relies on typed variables and fast array access)
until it has no yellow lines.

HTH,

Gael

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
In reply to this post by Sturla Molden-2
Thanks to Sturla for helping me get this working in Cython.

I am trying to compile the code to compare it against fortran for
speed.  I have run into two bugs so far (I mentioned that my C skills
are weak).

The first has to do with the "const trick":
Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void dxdt_runge_kuta(double *x "const double *",
                                 double voltage "const double",
                                 double *dxdt):
    #cdef double J = 0.0011767297528720126 "const double"
    cdef double J = 0.0011767297528720126
    cdef double alpha0 = 4.1396263800000002 "const double"
                                                                 ^
------------------------------------------------------------

runge_kuta_v2.pyx:12:44: Syntax error in C variable declaration

I don't know what the problem is here, so for now I just got rid of
all the "const double" statements. (In case the formatting doesn't
come through, the little error caret ^ points to the space between
the last number and the quote.)

After getting rid of all the "const double" expressions (just to see
if everything else would compile), I got this:
Error compiling Cython file:
------------------------------------------------------------
...
    dxdt[0] = vel
    dxdt[1] = accel
    dxdt[2] = dzdt


def runge_kuta_one_step(double _x[::1], Py_ssize_t factor, double volts,
                                                  ^
------------------------------------------------------------

runge_kuta_v2.pyx:31:34: Expected an identifier or literal

The caret points to the first square bracket.

Thanks,

Ryan



Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
So, I get the same error when I try to compile Sturla's memview.pyx
example.  I think I have too old of a version of cython:

Cython version 0.15.1

Let me look into that...


Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
I upgraded to cython 0.16 and made a bit more progress.

I don't know if this is headed in the right direction or not, but
based on the memview.pyx example I changed
double _x[::1]
to
np.float64_t[::1] _x
and did the same thing with
cdef double out[::1] = np.zeros(3)

I seem to be closer to compiling successfully, but now have this error:
Error compiling Cython file:
------------------------------------------------------------
...

import numpy as np
cimport numpy as np
from libc.math cimport exp, fabs

cdef inline void dxdt_runge_kuta(double *x "const double *",
                               ^
------------------------------------------------------------

runge_kuta_v2.pyx:8:32: Function argument cannot have C name specification
(The caret points to the last "a" in runge_kuta.)

Thanks again,

Ryan


On Mon, Aug 6, 2012 at 9:02 AM, Ryan Krauss <[hidden email]> wrote:

> So, I get the same error when I try to compile Stula's memview.pyx
> example.  I think I have too old of a version of cython:
>
> Cython version 0.15.1
>
> Let me look into that...
>
> On Mon, Aug 6, 2012 at 8:51 AM, Ryan Krauss <[hidden email]> wrote:
>> Thanks to Sturla for helping me get this working in Cython.
>>
>> I am trying to compile the code to compare it against fortran for
>> speed.  I have run into two bugs so far (I mentioned that my C skills
>> are weak).
>>
>> The first has to do with the "const trick":
>> Error compiling Cython file:
>> ------------------------------------------------------------
>> ...
>> cdef inline void dxdt_runge_kuta(double *x "const double *",
>>                                  double voltage "const double",
>>                                  double *dxdt):
>>     #cdef double J = 0.0011767297528720126 "const double"
>>     cdef double J = 0.0011767297528720126
>>     cdef double alpha0 = 4.1396263800000002 "const double"
>>                                                                  ^
>> ------------------------------------------------------------
>>
>> runge_kuta_v2.pyx:12:44: Syntax error in C variable declaration
>>
>> I don't know what the problem is here, so for now I just got rid of
>> all the "const double" statements. (In case the formatting doesn't
>> come through, the little error carrot ^ points to the space between
>> the last number and the quote.
>>
>> After getting rid of all the "const double" expressions (just to see
>> if everything else would compile), I got this:
>> Error compiling Cython file:
>> ------------------------------------------------------------
>> ...
>>     dxdt[0] = vel
>>     dxdt[1] = accel
>>     dxdt[2] = dzdt
>>
>>
>> def runge_kuta_one_step(double _x[::1], Py_ssize_t factor, double volts,
>>                                                   ^
>> ------------------------------------------------------------
>>
>> runge_kuta_v2.pyx:31:34: Expected an identifier or literal
>>
>> The carrot points to the first square bracket.
>>
>> Thanks,
>>
>> Ryan
>>
>>
>> On Sat, Aug 4, 2012 at 6:28 PM, Sturla Molden <[hidden email]> wrote:
>>> Not tested and debugged, but to me it looks like something like this might
>>> be what you want.
>>>
>>> Sturla
>>>
>>>
>>> On 03.08.2012 19:02, Ryan Krauss wrote:
>>>
>>> I need help speeding up some code I wrote to perform a Runge-Kutta
>>> integration.  I need to do the integration as part of a real-time
>>> control algorithm, so it needs to be fairly fast.
>>> scipy.integrate.odeint does too much error checking to be fast enough.
>>>  My pure Python version was just a little too slow, so I tried coding
>>> it up in Cython.  I have only used Cython once before, so I don't know
>>> if I did it correctly (the .pyx file is attached).
>>>
>>> The code runs just fine, but there is almost no speedup.  I think the
>>> core issue is that my dxdt_runge_kuta function gets called about 4000
>>> times per second, so most of my overhead is in the function calls (I
>>> think).  I am running my real-time control algorithm at 500 Hz and I
>>> need at least 2 Runge-Kutta integration steps per real-time step for
>>> numeric stability.  And the Runge-Kutta algorithm needs to evaluate the
>>> derivative 4 times per time step.  So, 500 Hz * 2 * 4 = 4000 calls
>>> per second.
>>>
>>> I also tried coding this up in Fortran and using f2py, but I am
>>> getting a type mismatch error I don't understand.  I have a function
>>> that declares its return value as double precision:
>>>
>>> double precision function dzdt(x,voltage)
>>>
>>> and I declare the variable I want to store the returned value into
>>> also be double precision:
>>>
>>> double precision F,z,vel,accel,zdot1,zdot2,zdot3,zdot4
>>>
>>> zdot1 = dzdt(x_prev,volts)
>>>
>>> but somehow it is not happy.
>>>
>>>
>>> My C skills are pretty weak (the longer I use Python, the more C I
>>> forget, and I didn't know that much to start with).  I started looking
>>> into Boost as well as using f2py on C code, but I got stuck.
>>>
>>>
>>> Can anyone either make my Cython or Fortran approaches work or point
>>> me in a different direction?
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>>
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> [hidden email]
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>>
>>>
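The call count in the original post (500 Hz * 2 * 4 = 4000 calls per second) follows from the classic fourth-order Runge-Kutta step, which evaluates the derivative four times per step. Below is a minimal pure-Python sketch that counts those evaluations over one simulated second. The two-state plant `toy_dxdt` and the constant input `VOLTAGE` are hypothetical placeholders, not the model from the attached .pyx file (which is not shown in this thread).

```python
from typing import Callable, List

State = List[float]
Deriv = Callable[[float, State], State]


def rk4_step(f: Deriv, t: float, x: State, h: float) -> State:
    """One classic fourth-order Runge-Kutta step: exactly four calls to f."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


class CountingDeriv:
    """Wraps a derivative function and counts how often it is called."""

    def __init__(self, f: Deriv) -> None:
        self.f = f
        self.calls = 0

    def __call__(self, t: float, x: State) -> State:
        self.calls += 1
        return self.f(t, x)


VOLTAGE = 1.0  # hypothetical constant input voltage


def toy_dxdt(t: float, x: State) -> State:
    # Hypothetical damped second-order plant, x = [position, velocity].
    pos, vel = x
    return [vel, -4.0 * pos - 0.5 * vel + VOLTAGE]


CONTROL_RATE_HZ = 500  # real-time loop rate from the thread
SUBSTEPS = 2           # RK4 steps per control cycle, as in the thread
h = 1.0 / (CONTROL_RATE_HZ * SUBSTEPS)

f = CountingDeriv(toy_dxdt)
x: State = [0.0, 0.0]
t = 0.0
for _ in range(CONTROL_RATE_HZ):  # one simulated second of control cycles
    for _ in range(SUBSTEPS):
        x = rk4_step(f, t, x, h)
        t += h

print(f.calls)  # 500 * 2 * 4 = 4000
```

The same arithmetic shows why per-call overhead matters so much here: a 500 Hz loop leaves 2 ms per cycle, and with 8 derivative evaluations per cycle each one (call overhead included) has to average well under 250 µs if the controller is to do anything else.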

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sturla Molden-2
In reply to this post by Ryan Krauss-2
On 06.08.2012 15:51, Ryan Krauss wrote:
> Thanks to Sturla for helping me get this working in Cython.
>
> I am trying to compile the code to compare it against Fortran for
> speed.  I have run into two bugs so far (as I mentioned, my C skills
> are weak).
>

Sorry, I should have debugged :(

This one compiles with

    $ python setup.py build_ext

Is this what you wanted?



Sturla



Attachment: runge_kutta.zip (1K)

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
Thanks, Sturla.  That code compiles just fine and will go a long way
toward helping me understand how to use Cython to write fast code for
these kinds of applications.

For many Runge-Kutta steps, your Cython code is 200 times faster than
my pure Python version.  Fortran is still 1.6 times faster than the
Cython version, but the Fortran version is much more work to code up.
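Chaining the two measured ratios gives the overall comparison. A small sketch of that arithmetic, using only the figures reported above (the helper name `chain_speedups` is illustrative):

```python
def chain_speedups(cython_vs_python: float, fortran_vs_cython: float) -> dict:
    """Combine two measured speed ratios into an overall comparison."""
    return {
        # Speed ratios multiply: Fortran relative to pure Python.
        "fortran_vs_python": cython_vs_python * fortran_vs_cython,
        # Cython's throughput expressed as a fraction of Fortran's.
        "cython_fraction_of_fortran": 1.0 / fortran_vs_cython,
    }


result = chain_speedups(cython_vs_python=200.0, fortran_vs_cython=1.6)
print(result)
```

With the reported numbers, Fortran comes out roughly 320 times faster than the pure Python version, and the Cython version delivers about 62.5% (1/1.6) of Fortran's throughput.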

Thanks again,

Ryan

On Mon, Aug 6, 2012 at 6:18 PM, Sturla Molden <[hidden email]> wrote:

> On 06.08.2012 15:51, Ryan Krauss wrote:
>
>> Thanks to Sturla for helping me get this working in Cython.
>>
>> I am trying to compile the code to compare it against Fortran for
>> speed.  I have run into two bugs so far (as I mentioned, my C skills
>> are weak).
>>
>
> Sorry, I should have debugged :(
>
> This one compiles with
>
>    $ python setup.py build_ext
>
> Is this what you wanted?
>
>
>
> Sturla
>
>
>

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Sturla Molden-2
On 07.08.2012 18:37, Ryan Krauss wrote:

> For many Runge-Kutta steps, your Cython code is 200 times faster than
> my pure Python version.  Fortran is still 1.6 times faster than the
> Cython version, but the Fortran version is much more work to code up.

Don't expect anything to be "faster than Fortran" for certain kinds of
numerical work. Cython has some overhead (larger than C and
Fortran), and since it compiles to ANSI C (C89) rather than C99, we cannot
declare restrict pointers. But still, ~75% of Fortran performance is often acceptable!
Another thing you need to look at is "scalability". How much of that
extra runtime is constant due to differences between Cython and f2py?
How much is variable due to the numerical kernel being faster in
Fortran? Will differently sized problems give you the same overhead from
using Cython? It often helps to plot a graph of the performance (mean
and error bars) for various problem sizes, rather than benchmarking at
one single point.
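One way to follow this advice is to time the integrator at several problem sizes, report the mean and spread of repeated runs, and fit a straight line: the intercept roughly estimates the fixed overhead per call (for example crossing the Python/extension boundary) and the slope estimates the per-step cost of the numerical kernel. A minimal standard-library sketch; `integrate_decay` is a hypothetical pure-Python stand-in for whichever implementation (Cython module, f2py module) you want to compare.

```python
import statistics
import timeit
from typing import Callable, Dict, Sequence, Tuple


def integrate_decay(n_steps: int) -> float:
    """Hypothetical workload: RK4 on dx/dt = -x from t=0 to t=1."""
    h = 1.0 / n_steps
    x = 1.0
    for _ in range(n_steps):
        k1 = -x
        k2 = -(x + h / 2 * k1)
        k3 = -(x + h / 2 * k2)
        k4 = -(x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def benchmark(func: Callable[[int], float], sizes: Sequence[int],
              repeat: int = 5) -> Dict[int, Tuple[float, float]]:
    """Return {size: (mean seconds, standard deviation)} for each size."""
    results = {}
    for n in sizes:
        times = timeit.repeat(lambda: func(n), number=1, repeat=repeat)
        results[n] = (statistics.mean(times), statistics.stdev(times))
    return results


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


sizes = [1_000, 10_000, 50_000]
results = benchmark(integrate_decay, sizes)
for n, (mean, sd) in results.items():
    print(f"{n:>7} steps: {mean * 1e3:8.3f} ms +/- {sd * 1e3:.3f} ms")

intercept, slope = fit_line(sizes, [results[n][0] for n in sizes])
print(f"fixed overhead ~ {intercept * 1e6:.1f} us, per step ~ {slope * 1e9:.1f} ns")
```

With noisy timings the fitted intercept can even come out slightly negative, so treat it as a rough estimate; plotting the means with their standard deviations as error bars shows the same information visually.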

Correctness is always more important than speed. That is one thing to
consider too. With Cython we can begin with a tested Python prototype
and optimize along the way, using the Python profiler to pinpoint where
it matters the most. Python, NumPy and Cython will not win the world
championship of being "fastest on the CPU" for simple numerical kernels,
but that is not the idea either. Implementing complex algorithms in
Fortran can be a PITA compared to Python. But Cython helps us in a
straightforward way to speed up Python code and/or interface with C or
C++. Fortran's only real advantage is that it spares us scientists the pointer
arithmetic of C, but Cython's memoryviews do that too.


Sturla

Re: help speeding up a Runge-Kuta algorithm (cython, f2py, ...)

Ryan Krauss-2
I agree.  Thanks again.

On Tue, Aug 7, 2012 at 1:10 PM, Sturla Molden <[hidden email]> wrote:

> On 07.08.2012 18:37, Ryan Krauss wrote:
>
>> For many Runge-Kutta steps, your Cython code is 200 times faster than
>> my pure Python version.  Fortran is still 1.6 times faster than the
>> Cython version, but the Fortran version is much more work to code up.
>
> Don't expect anything to be "faster than Fortran" for certain kinds of
> numerical work. Cython has some overhead (larger than C and
> Fortran), and since it compiles to ANSI C (C89) rather than C99, we cannot
> declare restrict pointers. But still, ~75% of Fortran performance is often acceptable!
> Another thing you need to look at is "scalability". How much of that
> extra runtime is constant due to differences between Cython and f2py?
> How much is variable due to the numerical kernel being faster in
> Fortran? Will differently sized problems give you the same overhead from
> using Cython? It often helps to plot a graph of the performance (mean
> and error bars) for various problem sizes, rather than benchmarking at
> one single point.
>
> Correctness is always more important than speed. That is one thing to
> consider too. With Cython we can begin with a tested Python prototype
> and optimize along the way, using the Python profiler to pinpoint where
> it matters the most. Python, NumPy and Cython will not win the world
> championship of being "fastest on the CPU" for simple numerical kernels,
> but that is not the idea either. Implementing complex algorithms in
> Fortran can be a PITA compared to Python. But Cython helps us in a
> straightforward way to speed up Python code and/or interface with C or
> C++. Fortran's only real advantage is that it spares us scientists the pointer
> arithmetic of C, but Cython's memoryviews do that too.
>
>
> Sturla