[SciPy-User] Should one NOT start with Cython first?

Brian Merchant
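Recently, this question was asked on Stack Overflow: "Cython: when using typed memoryviews, are Cython users supposed to implement their own library of “vector” functions?" <http://stackoverflow.com/questions/28948175/cython-when-using-typed-memoryviews-are-cython-users-supposed-to-implement-the/28948871#28948871>

The current answer suggests that one should not write as much of the program as possible using Cython. Rather, one should start with Python, and then only use Cython for bottlenecks.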

Is this the right mentality to have when using Cython for scientific projects? Why not start Cython first from the get-go?

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: Should one NOT start with Cython first?

Jason Moore
Most code in a typical high-level scientific project is not CPU intensive, so it is often best to have a mixture of Python and Cython to maximize both development speed and performance. Most people tend to follow the pattern: 1) write in Python, 2) profile, 3) rewrite the slow bits in Cython. But that doesn't mean you have to. You can write everything in Cython if you want. One disadvantage of that is that you lose the rapid iterative development you get with pure Python. If only a small portion of your code is Cython, you can usually get by with compiling it only occasionally, as opposed to every time you make a change. One potential reason to write it all in Cython is if you want to share binary forms of your code instead of Python source code.
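A minimal sketch of steps 1 and 2 of that pattern, using only the standard library's cProfile and pstats modules. The functions load_data, slow_kernel and main are hypothetical placeholders, not code from this thread; the idea is simply that the report ranks functions by cumulative time, so the candidate for a Cython rewrite is found by measurement rather than by guessing.

```python
import cProfile
import io
import pstats


def load_data(n):
    # Hypothetical stand-in for setup or I/O work.
    return list(range(n))


def slow_kernel(data):
    # Hypothetical CPU-bound hot spot: a quadratic pairwise loop.
    total = 0
    for x in data:
        for y in data:
            total += x * y
    return total


def main(n=300):
    data = load_data(n)
    return slow_kernel(data)


def profile_report(func, *args):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # show the top 10 entries
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_report(main)
    print(report)
```

Only a function that dominates such a report is usually worth moving to Cython; in this toy case that would be slow_kernel.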


Jason
moorepants.info
+01 530-601-9791

On Mon, Mar 9, 2015 at 10:27 PM, Brian Merchant <[hidden email]> wrote:
> The current answer suggests that one should not write as much of the program as possible using Cython. Rather, one should start with Python, and then only use Cython for bottlenecks.
>
> Is this the right mentality to have when using Cython for scientific projects? Why not start Cython first from the get-go?


Re: Should one NOT start with Cython first?

Sturla Molden-3
In reply to this post by Brian Merchant
"Premature optimization is the root of all evil in computer programming."

Here are my two cents:

- Most code in scientific programming is not CPU bound. You don't get a
faster hard drive or faster network connection from using Cython. This
accounts for the majority (often 80 to 90 %) of the code we write.

- Even if the code is CPU bound the bottleneck might be in a library like
BLAS or LAPACK, for which using Cython might win you nothing.

- Even in the cases where Cython helps, it might not be worth the effort:
Python code which is fast enough is still fast enough, even if Cython code
is faster.

- In scientific computing we usually care more about correctness than
speed. Speed is never important before you have code which is actually
correct.

- Optimizing NumPy code with Numba is often easier than using Cython.

- When programming Cython I often find I want to use C or Fortran instead.

- Bottlenecks are often obscured. Use a profiler to locate them.

- The best way to deal with a bottleneck is very often to use a better
algorithm or data structure, not to move from Python to C or Cython.
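A hypothetical illustration of that last point (both functions are invented for this sketch): a pairwise duplicate check is O(n^2) no matter how fast the compiled loop body is, while switching the data structure to a hash set makes it O(n) on average. Cython would speed up the first version only by a constant factor; the set-based version changes how the cost grows with n.

```python
def has_duplicates_quadratic(values):
    """O(n^2): compare every pair. Compiling this loop keeps it O(n^2)."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicates_set(values):
    """O(n) on average: a hash set remembers every value seen so far."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```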

So should you use Cython from the start? If you are using Cython to
optimize for speed, the answer in my opinion is "no". Make a working Python
prototype and then use a profiler to find the bottlenecks. If you are using
Cython to wrap native code (C, C++, Fortran) then just go ahead and use it
from the beginning.


Regards,
Sturla



Brian Merchant <[hidden email]> wrote:

> Recently, this question was asked on stackoverflow: Cython: when using typed
> memoryviews, are Cython users supposed to implement their own library of
> “vector” functions?
> <http://stackoverflow.com/questions/28948175/cython-when-using-typed-memoryviews-are-cython-users-supposed-to-implement-the/28948871#28948871>
>
> The current answer suggests that one should not write as much of the
> program as possible using Cython. Rather, one should start with Python, and
> then only use Cython for bottlenecks.
>
> Is this the right mentality to have when using Cython for scientific
> projects? Why not start Cython first from the get-go?


Re: Should one NOT start with Cython first?

Sturla Molden-3
Sturla Molden <[hidden email]> wrote:


> - Even in the cases where Cython helps, it might not be worth the effort:
> Python code which is fast enough is still fast enough, even if Cython code
> is faster.
>
> - In scientific computing we usually care more about correctness than
> speed. Speed is never important before you have code which is actually
> correct


"Fast" code which is incorrect is not better than "slow" code which is
correct; on the contrary. We prefer "slow and correct" code to "fast but
incorrect" code. As a consequence of this, you will often be better off
spending your time writing rigorous test cases than optimizing for speed.
And your attitude when writing tests should be to do your best to prove
that your code does not work. Trying to prove your code wrong is a far
more important investment of your own resources than spending time on
micro-optimizations.
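A minimal sketch of that testing attitude, assuming NumPy: keep a slow but obviously correct reference implementation and try to break a faster version by comparing the two on many random inputs, including edge cases such as a window as long as the data. The moving-average functions are hypothetical examples, not code from the thread, and the tolerances passed to np.allclose are assumptions that would need tuning for real data.

```python
import numpy as np


def moving_average_reference(x, w):
    """Slow but obviously correct: average each window explicitly."""
    return np.array([x[i:i + w].mean() for i in range(len(x) - w + 1)])


def moving_average_fast(x, w):
    """Faster cumulative-sum version; subtle enough that it needs testing."""
    c = np.cumsum(np.concatenate(([0.0], x)))
    return (c[w:] - c[:-w]) / w


def check_against_reference(trials=200, seed=0):
    """Try hard to break the fast version with random lengths, windows and values."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        n = int(rng.integers(1, 50))
        w = int(rng.integers(1, n + 1))  # window sizes from 1 up to n inclusive
        x = rng.normal(scale=1e3, size=n)
        expected = moving_average_reference(x, w)
        got = moving_average_fast(x, w)
        assert got.shape == expected.shape, (n, w)
        assert np.allclose(got, expected, rtol=1e-9, atol=1e-9), (n, w)
    return True
```

The same check can be kept unchanged if the fast version is later rewritten in Cython, which is exactly when such tests pay off.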

Another thing which is special for scientific computing is a rather
peculiar meaning of "fast" and "slow": It is not the time to market a
product, nor the CPU time. Rather it is the time you spend working on the
project, from its planning until your paper is published. It might not be
that C is "fast" at all in this context. It is better to spend a short time
writing code and have the computer work for a week, rather than spend a
week writing code and have the computer work for a short time. This is
because a lot of the code we write in science is customized for a
particular project, and will only be used once. What can be working against
you here is your own pride. It is often against our pride to just accept
atrociously slow code and use it, rather than try to improve on it. But put
your pride away and focus on your real goal.

Sturla


Re: Should one NOT start with Cython first?

Chris Barker - NOAA Federal
On Tue, Mar 10, 2015 at 9:30 AM, Sturla Molden <[hidden email]> wrote:
> We prefer "slow and correct" code to "fast but
> incorrect" code. As a consequence of this, you will often be better off
> spending your time writing rigorous test cases than optimizing for speed.

And you'll really want those tests if/when you do optimize for speed.

> Rather it is the time you spend working on the
> project, from its planning until your paper is published.

Exactly -- or even if you aren't an academic, development time is far more valuable than run time.

The key here is that Cython is really great, and can be very easy to use -- but it is not at all cost free. The edit-compile-test cycle is much slower, and it's a lot harder to debug than Python -- so you really don't want to use it unless there is a real gain to doing so.

-Chris




--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]


Re: Should one NOT start with Cython first?

Brian Merchant
Thanks so much everyone for the discussion.

The recommendations for stuff like Numba and "just stick with Python at first" came in really useful as it caused me to bump into this (while learning about Numba): https://jakevdp.github.io/blog/2015/02/24/optimizing-python-with-numpy-and-numba/


I have promised myself to never start writing "Cython-first" in mind again. I'll stick to Python, then do profiling (including line based, if useful), use Numba when I have to, and use Cython to wrap C code if I must.
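A minimal sketch of the "use Numba when I have to" step, assuming NumPy and, optionally, Numba. The try/except fallback is an assumption of this sketch, not something proposed in the thread: if Numba is not installed, njit becomes a no-op and the loop version simply runs as slow pure Python. The pairwise-distance functions are hypothetical examples.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the decorated function on first call
except ImportError:
    def njit(func):
        # Fallback used only when Numba is missing: return the function unchanged.
        return func


def pairwise_distance_sum_numpy(points):
    """Vectorized NumPy version: builds an (n, n, d) temporary array."""
    diff = points[:, None, :] - points[None, :, :]
    # Summing the full (n, n) matrix counts each pair twice, hence the / 2.
    return np.sqrt((diff ** 2).sum(axis=-1)).sum() / 2.0


@njit
def pairwise_distance_sum_loops(points):
    """Explicit loops over pairs i < j: slow in pure Python, the style Numba compiles well."""
    n, d = points.shape
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s = 0.0
            for k in range(d):
                t = points[i, k] - points[j, k]
                s += t * t
            total += np.sqrt(s)
    return total
```

The NumPy version allocates an n-by-n-by-d temporary array while the loop version does not; whether the compiled loop is actually faster depends on n and should be measured, in line with the advice in this thread to profile first.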

On Fri, Mar 13, 2015 at 11:27 AM, Chris Barker <[hidden email]> wrote:
> Exactly -- or even if you aren't an academic, development time is far more valuable than run time.

Re: Should one NOT start with Cython first?

Scott High
Hey All,

First:
> I have promised myself to never start writing "Cython-first" in mind again.
I agree, good choice.

Beyond that, I have a few comments based on my experience using Python/Cython for research in numerical methods.

> - Most code in scientific programming is not CPU bound. You don't get a
> faster hard drive or faster network connection from using Cython. This
> accounts for the majority (often 80 to 90 %) of the code we write.
This is often true, but it is not necessarily an argument against using Cython. (You only mentioned hard drives and network connections, but hitting any level of the memory hierarchy that is not cache can wreck performance.) If you carefully read code that uses NumPy array operations (especially slicing), you will notice that data is often accessed multiple times in the same mathematical operation. The largest speed-ups I have gotten with Cython came not from reducing the number of flops, but from using C-style loops to minimize the number of passes through memory. For the computer scientists in the room: Cython can often let you make better use of the cache. For me a good rule has been: usually don't convert to Cython to reduce the number of flops, but consider converting to better manage memory. There are some serious weasel words in that rule, mostly because it is very hard to judge potential speed-ups by inspecting code, even for experts.
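A hypothetical example of the passes-through-memory point (not from the thread): computing a Euclidean distance with a NumPy expression evaluates x - y, then squares it, then sums, walking the data three times and allocating two temporary arrays, whereas the explicit loop reads each element once. The loop is written in plain Python here so that it runs as-is; it is only competitive once compiled, for example with Cython typed memoryviews, and any real gain depends on array sizes and cache behaviour.

```python
import math

import numpy as np


def distance_numpy(x, y):
    """x - y and ** 2 each allocate a temporary array; .sum() makes a third pass."""
    return np.sqrt(((x - y) ** 2).sum())


def distance_fused(x, y):
    """Single pass: each element is read once and the running sum stays in a local.

    This is the loop structure one would compile with Cython (or Numba);
    in plain Python it is much slower than distance_numpy.
    """
    total = 0.0
    for i in range(x.shape[0]):
        d = x[i] - y[i]
        total += d * d
    return math.sqrt(total)
```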

> The key here is that Cython is really great, and can be very easy to use -- but it is not at all cost free. The edit-compile-test cycle is much slower, and it's a lot harder to debug than Python -- so you really don't want to use it unless there is a real gain to doing so.
If you intend to write the majority of your program in Cython, the compile time can really hurt you, and I am not crazy about the large amount of unreadable code generated. On the other hand, I have found that if you only intend to convert a small number of functions, the IPython Cython cell magic can make development nearly as quick as it would be with pure Python. As far as debugging goes, it is a pain. No way around that one.

> - Optimizing NumPy code with Numba is often easier than using Cython.
I have used Numba and was very much impressed. The problem is that installation can be a nightmare. Depending on your users' systems, they may have to install LLVM (I had to build it from source). This is a very heavy-duty dependency; do not discount it.

> - When programming Cython I often find I want to use C or Fortran instead.
I find that when 'optimizing' Cython I often want to use C (not Fortran!) instead. Usually after converting a function to Cython I find that its performance is good enough. When it is not, I skip the confusing optimization tips for Cython and go straight to C. This is a straightforward process when you already have things expressed with C types.

> - Bottlenecks are often obscured. Use a profiler to locate them.
Yes. I am a huge fan of the IPython line profiler magic.

> - The best way to deal with a bottleneck is very often to use a better
> algorithm or data structure, not to move from Python to C or Cython.
Yes.

 

On Fri, Mar 13, 2015 at 9:19 PM, Brian Merchant <[hidden email]> wrote:
> I have promised myself to never start writing "Cython-first" in mind again. I'll stick to Python, then do profiling (including line based, if useful), use Numba when I have to, and use Cython to wrap C code if I must.

--
-Scott High


Re: Should one NOT start with Cython first?

Matthew Brett
Hi,

On Fri, Mar 13, 2015 at 10:25 PM, Scott High <[hidden email]> wrote:

>> - Optimizing NumPy code with Numba is often easier than using Cython.
>
> I have used Numba and was very much impressed. The problem is that
> installation can be a nightmare. Depending on your users' systems, they may
> have to install LLVM (I had to build it from source). This is a very heavy-duty
> dependency; do not discount it.

Yes, I would think hard before committing to numba.  Maybe because of
dependencies, or some other reason, I believe it is fair to say that
Cython is in much wider use than numba.  For example, I don't know of
any use of numba in the packages I use (or write), but Cython is very
common.

Cheers,

Matthew

Re: Should one NOT start with Cython first?

George Nurser-4
I've found numba very good, and the mailing list helpful.
It's new, and under pretty active development, so I guess it's not surprising that it's presently less used than Cython. But there is, for example, Stephan Hoyer's numbagg:
https://github.com/shoyer/numbagg

Building it yourself may be a pain, but it is very easy to install with

$ conda install numba

if you have the Anaconda distribution. On the Mac, with MacPorts,

$ sudo port install py34-numba (or py27-numba)

works like a charm. Both MacPorts and conda give up-to-date versions.

cheers, George Nurser.

On 15 March 2015 at 00:24, Matthew Brett <[hidden email]> wrote:
> Yes, I would think hard before committing to numba.  Maybe because of
> dependencies, or some other reason, I believe it is fair to say that
> Cython is in much wider use than numba.

Re: Should one NOT start with Cython first?

Sturla Molden-3
On 15/03/15 18:11, George Nurser wrote:

> To build it yourself may be a pain, but, it is very easy to install it with
> conda install numba


Most scientists will use Anaconda or Enthought Canopy. With Anaconda we
do as you said (conda install numba), with Canopy we just click on it in
the graphical package manager.

The rest can install Numba from PyPI:

$ pip install numba

That's how hard it is to install...

Maybe it was a PITA to build at some point in history, but that is all
in the past.



Sturla

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Should one NOT start with Cython first?

Uwe Fechner
Well, even though I like Numba, installing it is a nightmare.

I wrote a lot of code for Numba 0.11, which is not compatible with newer
versions: in newer versions, object-oriented code (JIT-compiled classes)
is no longer supported.

pip install numba does not work easily, not even for the current version
of Numba: you need to have a specific version of LLVM and llvmlite installed
first, which is not easily available for Ubuntu 12.04, which I am using.

We had to write our own installer just to be able to continue to install
Numba 0.11, which is needed for our current code base (see:
https://bitbucket.org/ufechner/freekitesim ).

On 17.03.2015 at 14:28, Sturla Molden wrote:

> Maybe it was a PITA to build at some point in history, but that is all
> in the past.

--
---------------------------------------------
Uwe Fechner, M.Sc.
Delft University of Technology
Faculty of Aerospace Engineering/ Wind Energy
Kluyverweg 1,
2629 HS Delft, The Netherlands
Phone: +31-15-27-88902
