Bottleneck

Keith Goodman
The naming saga [1] continues:

Nanny --> STAT --> DSNA --> Bottleneck

Bottleneck is a collection of fast, NumPy array functions written in Cython.

https://github.com/kwgoodman/bottleneck

I'm almost ready for a first preview release. If anyone could install
the package (directions in the readme) and run the unit tests on Windows,
Mac, or 32-bit Linux, I'd be very interested in the results.

Future plans:

0.1 preview release
0.2 Add a Cython apply_along_axis function so only the 1d case needs
to be coded by hand
0.2 Template the code to expand dtype coverage, make maintainable
0.3 Add more functions
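The 0.2 plan (hand-code only the 1d case, then apply it along any axis) can be sketched in pure Python/NumPy. This is an illustration under assumptions, not Bottleneck's code: `nanmax_1d` and `nanmax` are hypothetical names, the Python loop stands in for what a typed Cython loop would do, and NumPy's own `np.apply_along_axis` plays the role of the planned Cython apply function.

```python
import numpy as np

def nanmax_1d(a):
    # Hypothetical hand-written 1d kernel: the only case coded by hand.
    amax = -np.inf
    allnan = True
    for x in a:
        if x == x:  # False only when x is NaN
            allnan = False
            if x > amax:
                amax = x
    return np.nan if allnan else amax

def nanmax(arr, axis=-1):
    # Generic N-d wrapper: reuse the 1d kernel along the requested axis.
    arr = np.asarray(arr, dtype=np.float64)
    if axis is None:
        return nanmax_1d(arr.ravel())
    return np.apply_along_axis(nanmax_1d, axis, arr)
```

With this split, adding a new reduction only means writing a new 1d kernel; the axis handling is shared.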

Some benchmarks:

>>> bn.benchit(verbose=False)
Bottleneck performance benchmark
    Bottleneck  0.1.0dev
    Numpy       1.5.1
    Scipy       0.8.0
    Speed is numpy (or scipy) time divided by Bottleneck time
    NaN means all NaNs
   Speed   Test                  Shape        dtype    NaN?
   2.4019  median(a, axis=-1)    (500,500)    float64
   2.2668  median(a, axis=-1)    (500,500)    float64  NaN
   4.1235  median(a, axis=-1)    (10000,)     float64
   4.3498  median(a, axis=-1)    (10000,)     float64  NaN
   9.8184  nanmax(a, axis=-1)    (500,500)    float64
   7.9157  nanmax(a, axis=-1)    (500,500)    float64  NaN
   9.2306  nanmax(a, axis=-1)    (10000,)     float64
   8.1635  nanmax(a, axis=-1)    (10000,)     float64  NaN
   6.7218  nanmin(a, axis=-1)    (500,500)    float64
   7.9112  nanmin(a, axis=-1)    (500,500)    float64  NaN
   6.4950  nanmin(a, axis=-1)    (10000,)     float64
   8.0791  nanmin(a, axis=-1)    (10000,)     float64  NaN
  12.3650  nanmean(a, axis=-1)   (500,500)    float64
  42.0738  nanmean(a, axis=-1)   (500,500)    float64  NaN
  12.2769  nanmean(a, axis=-1)   (10000,)     float64
  22.1285  nanmean(a, axis=-1)   (10000,)     float64  NaN
   9.5515  nanstd(a, axis=-1)    (500,500)    float64
  68.9192  nanstd(a, axis=-1)    (500,500)    float64  NaN
   9.2174  nanstd(a, axis=-1)    (10000,)     float64
  26.1753  nanstd(a, axis=-1)    (10000,)     float64  NaN
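The Speed column is the NumPy (or SciPy) time divided by the Bottleneck time. A minimal sketch of computing such a ratio with the standard library's `timeit`, assuming a best-of-N timing scheme (how `bn.benchit` actually times things is not shown here); `speed` is a hypothetical helper:

```python
import timeit
import numpy as np

def speed(ref_func, fast_func, arr, repeat=5, number=10):
    """Return ref time / fast time; values above 1 mean fast_func is faster."""
    # Best-of-`repeat` timing of `number` calls each, to damp timing noise.
    t_ref = min(timeit.repeat(lambda: ref_func(arr), repeat=repeat, number=number))
    t_fast = min(timeit.repeat(lambda: fast_func(arr), repeat=repeat, number=number))
    return t_ref / t_fast

if __name__ == "__main__":
    a = np.random.rand(500, 500)
    # Example pairing only: NaN-aware mean vs plain mean on NaN-free data.
    print(speed(lambda x: np.nanmean(x, axis=-1), lambda x: np.mean(x, axis=-1), a))
```

Taking the minimum over repeats is a common choice because slower runs are usually caused by other activity on the machine, not by the code being timed.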

[1] http://mail.scipy.org/pipermail/scipy-user/2010-November/027553.html
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: Bottleneck

Chris Barker - NOAA Federal
On 11/30/10 11:50 AM, Keith Goodman wrote:
> Bottleneck is a collection of fast, NumPy array functions written in Cython.
>
> https://github.com/kwgoodman/bottleneck
>
> I'm almost ready for a first preview release. If anyone could install
> the package (directions in readme) and run the unit tests on windows
> or mac or 32-bit linux, I'd be very interested in the results.

OK -- tested on Mac OS-X 10.6, Intel, 32 bit Python 2.6.6

1) How necessary is scipy as a dependency? It'd be nice to have these
for numpy-only stuff. As a rule, Scipy is way too inter-meshed as it is
-- I'd love to have more packages that you could easily install and use
without the whole scipy package.

-- off to get scipy installed on this system --

In [6]: scipy.__version__
Out[6]: '0.8.0'

In [2]: bottleneck.test()
Running unit tests for bottleneck
NumPy version 1.5.1
NumPy is installed in
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy
Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1
(Apple Inc. build 5493)]
nose version 0.11.4

WOW! A LOT of these warnings:

Warning: invalid value encountered in divide
(and similar)

But:

Ran 10 tests in 14.709s

OK
Out[7]: <nose.result.TextTestResult run=10 errors=0 failures=0>

So -- looking good!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 1:34 PM, Christopher Barker
<[hidden email]> wrote:

> On 11/30/10 11:50 AM, Keith Goodman wrote:
>> Bottleneck is a collection of fast, NumPy array functions written in Cython.
>>
>> https://github.com/kwgoodman/bottleneck
>>
>> I'm almost ready for a first preview release. If anyone could install
>> the package (directions in readme) and run the unit tests on windows
>> or mac or 32-bit linux, I'd be very interested in the results.
>
> OK -- tested on Mac OS-X 10.6, Intel, 32 bit Python 2.6.6
>
> 1) How necessary is scipy as a dependency? It'd be nice to have these
> for numpy-only stuff. As  a rule, Scipy is way too inter-meshed as it is
> -- I'd love to have more packages that you could easily install and use
> without the whole scipy package.

I use SciPy for benchmarking (scipy.stats.nanmean, nanstd, etc). I
also unit test the moving window functions against a version that uses
scipy.ndimage. But I could make scipy optional in a later release.

> -- off to get scipy installed on this system --
>
> In [6]: scipy.__version__
> Out[6]: '0.8.0'
>
> In [2]: bottleneck.test()
> Running unit tests for bottleneck
> NumPy version 1.5.1
> NumPy is installed in
> /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/numpy
> Python version 2.6.6 (r266:84374, Aug 31 2010, 11:00:51) [GCC 4.0.1
> (Apple Inc. build 5493)]
> nose version 0.11.4
>
> WOW! a LOT of these warnings:
>
> Warning: invalid value encountered in divide
> (and similar)

Yeah, I started getting those too when I upgraded to numpy 1.5.1. Any ideas?

> But:
>
> Ran 10 tests in 14.709s
>
> OK
> Out[7]: <nose.result.TextTestResult run=10 errors=0 failures=0>
>
> So -- looking good!

Thank you so much. Mac OS X: check!

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 1:49 PM, Keith Goodman <[hidden email]> wrote:

>> 1) How necessary is scipy as a dependency? It'd be nice to have these
>> for numpy-only stuff. As  a rule, Scipy is way too inter-meshed as it is
>> -- I'd love to have more packages that you could easily install and use
>> without the whole scipy package.
>
> I use SciPy for benchmarking (scipy.stats.nanmean, nanstd, etc). I
> also unit test the moving window functions against a version that uses
> scipy.ndimage. But I could make scipy optional in a later release.

Oh, wait. I unit test bn.nanstd etc against scipy.stats.nanstd etc. I
could pull those scipy functions into the project but I'd like to make
sure that Bottleneck gives the same result as whatever version of
scipy the user has installed so that they can be confident that
bn.nanstd is a drop-in replacement for scipy.stats.nanstd.
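Testing a fast function as a drop-in replacement comes down to comparing it with the reference on many inputs, including NaN-heavy ones. A sketch under assumptions: `nanstd_1d` is a hypothetical hand-written kernel (two passes over the data, as a compiled loop might do), and NumPy's `np.nanstd` (available in newer NumPy releases) stands in for `scipy.stats.nanstd` as the reference.

```python
import warnings
import numpy as np

def nanstd_1d(a, ddof=0):
    # Hypothetical kernel: first pass for the mean, second for squared deviations.
    total = 0.0
    count = 0
    for x in a:
        if x == x:  # skip NaN
            total += x
            count += 1
    if count - ddof <= 0:
        return np.nan
    mean = total / count
    ssd = 0.0
    for x in a:
        if x == x:
            ssd += (x - mean) ** 2
    return np.sqrt(ssd / (count - ddof))

def check_against_reference(ref=np.nanstd, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        a = rng.standard_normal(int(rng.integers(1, 50)))
        a[rng.random(a.size) < 0.3] = np.nan  # sprinkle NaNs
        for ddof in (0, 1):
            with warnings.catch_warnings():
                # The reference warns on all-NaN input or when ddof >= count.
                warnings.simplefilter("ignore", RuntimeWarning)
                expected = ref(a, ddof=ddof)
            actual = nanstd_1d(a, ddof=ddof)
            np.testing.assert_allclose(actual, expected, rtol=1e-10, equal_nan=True)
```

Because the reference is passed in as a parameter, the same check works against whichever implementation the user has installed.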

Re: Bottleneck

Chris Barker - NOAA Federal
On 11/30/10 1:57 PM, Keith Goodman wrote:
> Oh, wait. I unit test bn.nanstd etc against scipy.stats.nanstd etc. I
> could pull those scipy functions into the project but I'd like to make
> sure that Bottleneck gives the same result as whatever version of
> scipy the user has installed so that they can be confident that
> bn.nanstd is a drop-in replacement for scipy.stats.nanstd.

Fair enough -- but then scipy could be a dependency of only the tests
(which it may well be now).

I'll try to test on PPC soon.

>> WOW! a LOT of these warnings:
>>
>> Warning: invalid value encountered in divide
>> (and similar)
>
> Yeah, I started getting those too when I upgraded to numpy 1.5.1. Any ideas?

I think there was a post about it recently on the numpy list, but I
can't find it now. I suspect something has changed with the default
warnings settings.

-Chris

Re: Bottleneck

Chris Barker - NOAA Federal
On 11/30/10 2:30 PM, Christopher Barker wrote:
>>> WOW! a LOT of these warnings:

>> Yeah, I started getting those too when I upgraded to numpy 1.5.1. Any ideas?
>
> I think there was a post about it recently on the numpy list, but I
> can't find it now.

D'oh! It was your question -- feel free to ignore me now...

-Chris

Re: Bottleneck

Robert Kern-2
In reply to this post by Chris Barker - NOAA Federal
On Tue, Nov 30, 2010 at 16:30, Christopher Barker <[hidden email]> wrote:
> On 11/30/10 1:57 PM, Keith Goodman wrote:

>>> WOW! a LOT of these warnings:
>>>
>>> Warning: invalid value encountered in divide
>>> (and similar)
>>
>> Yeah, I started getting those too when I upgraded to numpy 1.5.1. Any ideas?
>
> I think there was a post about it recently on the numpy list, but I
> can't find it now. I suspect something has changed with the default
> warnings settings.

Importing the ma subpackage used to have the unintentional side effect
of setting the error state to ignore these errors. This was fixed.
Unfortunately, the suggestion to change the intentional default to the
more sensible "warn" rather than "print" was lost in the shuffle.
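The error state described above is controlled by `np.seterr` (global) and the `np.errstate` context manager (local). A short sketch of silencing the "invalid value encountered in divide" message for one block of code, or turning it into an exception; current NumPy defaults the divide and invalid categories to "warn". The arrays and `raises_on_invalid` are illustrative only.

```python
import numpy as np

a = np.array([0.0, 1.0, np.nan])
b = np.array([0.0, 0.0, 1.0])

# Locally silence "invalid value" (0/0) and "divide by zero" (1/0) reports.
with np.errstate(invalid="ignore", divide="ignore"):
    quiet = a / b  # [nan, inf, nan], no message printed or warned

def raises_on_invalid():
    # Make 0/0 a hard error, e.g. inside a unit test.
    try:
        with np.errstate(invalid="raise"):
            np.array([0.0]) / np.array([0.0])
    except FloatingPointError:
        return True
    return False

# np.seterr changes the global state and returns the previous settings,
# so they can be restored afterwards.
old = np.seterr(all="warn")
np.seterr(**old)
```

Scoping the change with `np.errstate` avoids the kind of global side effect that importing `ma` used to have.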

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

Re: Bottleneck

Fabrice Silva-2
In reply to this post by Keith Goodman
Le mardi 30 novembre 2010 à 11:50 -0800, Keith Goodman a écrit :

> The naming saga [1] continues:
>
> Nanny --> STAT --> DSNA --> Bottleneck
> Some benchmarks:
>
> >>> bn.benchit(verbose=False)
> Bottleneck performance benchmark
>     Bottleneck  0.1.0dev
>     Numpy       1.5.1
>     Scipy       0.8.0
I wanted to test Bottleneck on a *really* slow machine (Dell C610,
866 MHz, 256 MB RAM) running Debian unstable, but its NumPy and SciPy
versions are not the newest (NumPy 1.4.1 and SciPy 0.7.2), which prevents
using scipy.nanstd the way you are using it; see the logs.
The benchmark even fails because this function raises an error.

--
Fabrice Silva

Attachment: bn_install.log (5K)
Attachment: bn_test.log (2K)

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 3:49 PM, Fabrice Silva <[hidden email]> wrote:

> Le mardi 30 novembre 2010 à 11:50 -0800, Keith Goodman a écrit :
>> The naming saga [1] continues:
>>
>> Nanny --> STAT --> DSNA --> Bottleneck
>> Some benchmarks:
>>
>> >>> bn.benchit(verbose=False)
>> Bottleneck performance benchmark
>>     Bottleneck  0.1.0dev
>>     Numpy       1.5.1
>>     Scipy       0.8.0
>
> I wanted to test bottleneck on a *really* slow machine (DELL C610,
> 866MHz, 256Mb RAM) running on Debian unstable but numpy and scipy
> versions are not the newest (Numpy 1.4.1 and Scipy 0.7.2) and prevents
> using scipy.nanstd as you are using it, see logs.
> Benchmark even fails due to error raising in this function.

That's a great test!

Could it be that older versions of scipy.stats.nanstd can't handle
negative axes? In case that's the problem, I now add ndim to negative
axes before passing them to scipy.stats.nanstd (latest commit). Care
to try it?
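Adding ndim to a negative axis before calling the reference function can be written as a small shim. A sketch; `normalize_axis` and `call_with_positive_axis` are hypothetical names, not Bottleneck functions, and `np.nanstd` in the check is only an example reference:

```python
import numpy as np

def normalize_axis(axis, ndim):
    """Map a possibly negative axis to its non-negative equivalent."""
    if axis is None:
        return None
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for a {ndim}-d array")
    return axis + ndim if axis < 0 else axis

def call_with_positive_axis(func, arr, axis, **kwargs):
    # Hand the reference function only axes >= 0, for older
    # implementations that reject negative axes.
    arr = np.asarray(arr)
    return func(arr, axis=normalize_axis(axis, arr.ndim), **kwargs)
```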

Re: Bottleneck

Fabrice Silva-2
Le mardi 30 novembre 2010 à 16:13 -0800, Keith Goodman a écrit :
> That's a great test!
>
> Could it be that older version of scipy.stats.nanstd can't handle
> negative axes? In case that's the problem I added ndim to negative
> axes before passing to scipy.stats.nanstd in the latest commit. Care
> to try it?
       
        In [12]: sp.nanstd(a, axis=-1)
        ---------------------------------------------------------------------------
        ValueError                                Traceback (most recent call last)
        /home/fab/<ipython console> in <module>()
        /usr/lib/python2.6/dist-packages/scipy/stats/stats.pyc in nanstd(x, axis, bias)
            302     if axis!=0:
            303         shape = np.arange(x.ndim).tolist()
        --> 304         shape.remove(axis)
            305         shape.insert(0,axis)
            306         x = x.transpose(tuple(shape))
       
        ValueError: list.remove(x): x not in list


In fact, -1 is not in the generated list (line 303).
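The failure can be reproduced outside SciPy: for a 2-d array the list built on line 303 is `[0, 1]`, so removing `-1` raises. A sketch that mirrors the traceback's logic next to a version using `np.moveaxis` (available in newer NumPy releases), which accepts negative axes; both function names are hypothetical:

```python
import numpy as np

def move_axis_to_front_old(x, axis):
    # Mirrors the logic shown in the traceback.
    if axis != 0:
        shape = np.arange(x.ndim).tolist()  # e.g. [0, 1] for a 2-d array
        shape.remove(axis)                  # ValueError when axis is -1
        shape.insert(0, axis)
        x = x.transpose(tuple(shape))
    return x

def move_axis_to_front(x, axis):
    # Same result for valid non-negative axes, and negative axes also work.
    return np.moveaxis(x, axis, 0)
```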

See http://projects.scipy.org/scipy/ticket/1161 (closed), but the fix
has not reached my machine yet...

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 5:09 PM, Fabrice Silva <[hidden email]> wrote:

> Le mardi 30 novembre 2010 à 16:13 -0800, Keith Goodman a écrit :
>> That's a great test!
>>
>> Could it be that older version of scipy.stats.nanstd can't handle
>> negative axes? In case that's the problem I added ndim to negative
>> axes before passing to scipy.stats.nanstd in the latest commit. Care
>> to try it?
>
>        In [12]: sp.nanstd(a, axis=-1)
>        ---------------------------------------------------------------------------
>        ValueError                                Traceback (most recent call last)
>        /home/fab/<ipython console> in <module>()
>        /usr/lib/python2.6/dist-packages/scipy/stats/stats.pyc in nanstd(x, axis, bias)
>            302     if axis!=0:
>            303         shape = np.arange(x.ndim).tolist()
>        --> 304         shape.remove(axis)
>            305         shape.insert(0,axis)
>            306         x = x.transpose(tuple(shape))
>
>        ValueError: list.remove(x): x not in list
>
>
> In fact -1 is not in the generated list (l303)
>
> See http://projects.scipy.org/scipy/ticket/1161 (closed), but the fix
> did not reach my machine by now...

Ha! I filed that ticket. With the latest commit of Bottleneck, I no
longer pass negative indices to scipy.stats.nanstd. But I bet your old
version of scipy.stats.nanstd chokes on axis=None too. I could ravel
and set axis to 0 for axis=None input. If you find that works, I can
make the change.
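Handling axis=None by raveling and reducing along axis 0 might look like this; `reduce_axis_none` is a hypothetical helper, and `np.nanstd` is only used as an example reduction:

```python
import numpy as np

def reduce_axis_none(func, arr, axis, **kwargs):
    # For axis=None, flatten first and reduce along axis 0, so the
    # reference function never has to handle axis=None itself.
    arr = np.asarray(arr)
    if axis is None:
        return func(arr.ravel(), axis=0, **kwargs)
    return func(arr, axis=axis, **kwargs)
```

`ravel` returns a view when it can, so this adds little overhead for contiguous input.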

Re: Bottleneck

Fabrice Silva-2
Le mardi 30 novembre 2010 à 17:24 -0800, Keith Goodman a écrit :
> Ha! I filed that ticket. With the latest commit of Bottleneck, I no
> longer pass negative indices to scipy.stats.nanstd. But I bet your old
> version of scipy.stats.nanstd chokes on axis=None too. I could ravel
> and set axis to 0 for axis=None input. If you find that works, I can
> make the change.

With the (almost) latest commit, the tests pass (all but one, which fails at high
precision), but some benchmarks still need to be changed.

Attachment: bn_testbench.log (4K)
Attachment: grep_res.log (128 bytes)

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 5:42 PM, Fabrice Silva <[hidden email]> wrote:
> Le mardi 30 novembre 2010 à 17:24 -0800, Keith Goodman a écrit :
>> Ha! I filed that ticket. With the latest commit of Bottleneck, I no
>> longer pass negative indices to scipy.stats.nanstd. But I bet your old
>> version of scipy.stats.nanstd chokes on axis=None too. I could ravel
>> and set axis to 0 for axis=None input. If you find that works, I can
>> make the change.
>
> With the (almost) last commit, test is ok (quite, one fails at high
> precision), but some bench still need to be changed

OK, another commit. I hope this one works. Thank you for all the testing.

Re: Bottleneck

Fabrice Silva-2
Le mardi 30 novembre 2010 à 17:55 -0800, Keith Goodman a écrit :
> OK, another commit. I hope this one works. Thank you for all the testing.

I admit I don't see any change in tests and bench.
By the way, axis=None does work on scipy 0.7.2.

Attachment: bn_testbench2.log (4K)

Re: Bottleneck

Keith Goodman
On Tue, Nov 30, 2010 at 6:38 PM, Fabrice Silva <[hidden email]> wrote:
> Le mardi 30 novembre 2010 à 17:55 -0800, Keith Goodman a écrit :
>> OK, another commit. I hope this one works. Thank you for all the testing.
>
> I admit I don't see any change in tests and bench.
> By the way, axis=None does work on scipy 0.7.2.

I admit defeat.

I made another commit. Unit tests should pass. Bench will not pass
(not fair to benchmark against scipy code if I were to wrap
scipy.stats.nanstd in a python layer to take care of negative axes
etc.)

I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
project.
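A minimum-version requirement like "NumPy 1.5.1+" needs numeric comparison, since comparing the strings would put "1.10" before "1.5". A rough sketch with hypothetical helpers `version_tuple` and `check_minimum`; it only looks at leading numeric components, so it is not a full version parser:

```python
import re
import numpy as np

def version_tuple(version):
    # "1.5.1" -> (1, 5, 1); "0.1.0dev" -> (0, 1, 0). Stops at the first
    # component that does not start with digits or has a suffix.
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break
    return tuple(parts)

def check_minimum(name, installed, required):
    if version_tuple(installed) < version_tuple(required):
        raise ImportError(f"{name} {required}+ required, found {installed}")

check_minimum("NumPy", np.__version__, "1.5.1")
```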

Re: Bottleneck

T J-4
On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman <[hidden email]> wrote:
> I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
> 1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
> project.

If SciPy is only used in the benchmarks/tests, then why not make it an
optional benchmark/test that runs only if SciPy is present?
nose.SkipTest should be useful here. I frequently run software on
machines that only have NumPy installed.
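One way to make SciPy a test-only, optional dependency is to skip the SciPy comparison when the import fails. The sketch below uses the standard library's `unittest.skipIf`, which serves the same purpose as `nose.SkipTest`; `move_sum_1d` is a hypothetical trailing moving-window sum, checked against NumPy always and against `scipy.ndimage.convolve1d` only when SciPy is installed.

```python
import unittest
import numpy as np

try:
    import scipy.ndimage as ndimage  # optional: only one test needs it
except ImportError:
    ndimage = None

def move_sum_1d(a, window):
    # Trailing-window sum via cumulative sums; the first window-1 outputs are NaN.
    c = np.cumsum(np.concatenate(([0.0], a)))
    out = np.full(a.shape, np.nan)
    out[window - 1:] = c[window:] - c[:-window]
    return out

class TestMoveSum(unittest.TestCase):
    def test_against_numpy(self):
        # Always runs: NumPy-only reference.
        a = np.arange(10.0)
        expected = np.convolve(a, np.ones(3), mode="valid")
        np.testing.assert_allclose(move_sum_1d(a, 3)[2:], expected)

    @unittest.skipIf(ndimage is None, "SciPy is not installed")
    def test_against_scipy(self):
        a = np.arange(10.0)
        # Centered 3-point sum; its interior equals the trailing sum shifted by one.
        centered = ndimage.convolve1d(a, np.ones(3), mode="constant")
        np.testing.assert_allclose(move_sum_1d(a, 3)[2:], centered[1:-1])
```

A skipped test still counts as a successful run, so NumPy-only machines get a clean result while SciPy users get the extra check.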

Re: Bottleneck

Keith Goodman
On Wed, Dec 1, 2010 at 3:49 PM, T J <[hidden email]> wrote:
> On Tue, Nov 30, 2010 at 7:04 PM, Keith Goodman <[hidden email]> wrote:
>> I bumped the Bottleneck requirements from "NumPy, SciPy" to "NumPy
>> 1.5.1+, SciPy 0.8.0+". I think that is fair to do for a brand new
>> project.
>
> If SciPy is only used in the benchmarks/tests, then why not make it an
> optional benchmark/test that runs only if SciPy is present?
> nose.SkipTest should be useful here.  I frequently run software on
> machines that only have NumPy installed.

Seems like a strange discussion to have on the scipy list :)

I don't want to have a hole in my unit test coverage. But I could copy
over the nan functions in scipy stats. And I guess the benchmark could
use those too. And then skip moving window benchmarks against
scipy.ndimage for those who don't have scipy installed.
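Copying the nan functions into the project could mean keeping a small NumPy-only reference. A sketch of a NaN-skipping mean, `nanmean_ref` (a hypothetical name); it follows the usual zero-out-and-count approach and is not a copy of the scipy.stats source:

```python
import numpy as np

def nanmean_ref(x, axis=0):
    # Zero out NaNs, sum, then divide by the number of non-NaN entries.
    x = np.asarray(x, dtype=np.float64)
    mask = np.isnan(x)
    total = np.where(mask, 0.0, x).sum(axis=axis)
    count = (~mask).sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count  # all-NaN slices give 0/0 -> nan
```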

Re: Bottleneck

Chris Barker - NOAA Federal
On 12/1/10 4:09 PM, Keith Goodman wrote:
>> I frequently run software on
>> machines that only have NumPy installed.
>
> Seems like a strange discussion to have on the scipy list :)

True -- and yet I didn't have scipy on this machine yet, either...

> I don't want to have a hole in my unit test coverage. But I could copy
> over the nan functions in scipy stats. And I guess the benchmark could
> use those too. And then skip moving window benchmarks against
> scipy.ndimage for those who don't have scipy installed.

I'd vote to have unit tests that don't require scipy, but I think it's
fine that the benchmarks do -- that's kind of the point of them --
comparing bottleneck to the raw scipy functions.

-Chris

Re: Bottleneck

Keith Goodman
On Wed, Dec 1, 2010 at 4:19 PM, Christopher Barker
<[hidden email]> wrote:

> On 12/1/10 4:09 PM, Keith Goodman wrote:
>>> I frequently run software on
>>> machines that only have NumPy installed.
>>
>> Seems like a strange discussion to have on the scipy list :)
>
> True -- and yet I didn't have scipy on this machine yet, either...
>
>> I don't want to have a hole in my unit test coverage. But I could copy
>> over the nan functions in scipy stats. And I guess the benchmark could
>> use those too. And then skip moving window benchmarks against
>> scipy.ndimage for those who don't have scipy installed.
>
> I'd vote to have unit tests that don't require scipy, but I think it's
> fine that the benchmarks do -- that's kind of the point of them --
> comparing bottleneck to the raw scipy functions.

Well, now I have a most requested feature. OK, I'll do it for 0.2.