numpy.histogram is slow

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

numpy.histogram is slow

Chris Weisiger-2
My use case is displaying camera image data to the user as it is
streamed to us; this includes a histogram showing the distribution of
intensities in the image. Thus I have a 512x512 array of pixel data
(unsigned 16-bit ints) that I need to generate a histogram for.
Unfortunately, numpy.histogram takes a significant amount of time --
about 15ms per call. That's over 60% of the cost of showing an image
to the user, which means that I can't quite display data as quickly as
it comes in. So I'm looking for some faster option.

My searches turned up numpy.bincount, which is nice and zippy, but
unfortunately omits bins where the total count is 0. This makes sense
considering that otherwise it would always generate a length-N array
where N is the maximum value in the input, but it doesn't work for my
purposes. Are there any better options?

-Chris
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Zachary Pincus-2
On Oct 16, 2012, at 4:04 PM, Chris Weisiger wrote:

> My use case is displaying camera image data to the user as it is
> streamed to us; this includes a histogram showing the distribution of
> intensities in the image. Thus I have a 512x512 array of pixel data
> (unsigned 16-bit ints) that I need to generate a histogram for.
> Unfortunately, numpy.histogram takes a significant amount of time --
> about 15ms per call. That's over 60% of the cost of showing an image
> to the user, which means that I can't quite display data as quickly as
> it comes in. So I'm looking for some faster option.
>
> My searches turned up numpy.bincount, which is nice and zippy, but
> unfortunately omits bins where the total count is 0. This makes sense
> considering that otherwise it would always generate a length-N array
> where N is the maximum value in the input, but it doesn't work for my
> purposes. Are there any better options?

Uh, no? Bincount doesn't omit bins below the maximum value in the input, even if the count is zero:
In [205]: numpy.bincount([5,5,10])
Out[205]: array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1])

Perhaps you mean that bincount omits bins above the maximum value in the input, but below the maximum *possible* value of the input? That's what the minlength parameter was added for in numpy 1.6. So if you don't have this version, either upgrade, or manually zero-pad the bincount output:

bins = numpy.bincount([5,5,10])
padded = numpy.zeros(32, dtype=numpy.uint8)
padded[:len(bins)] = bins

That should be pretty quick.

Zach
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Chris Weisiger-2
On Tue, Oct 16, 2012 at 1:12 PM, Zachary Pincus <[hidden email]> wrote:
>
> Uh, no? Bincount doesn't omit bins below the maximum value in the input, even if the count is zero:
> In [205]: numpy.bincount([5,5,10])
> Out[205]: array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1])

Apologies, my test code was misleading me. You're right, and bincount
ought to be able to do what I want it to do. Sorry for wasting
everyone's time, and thanks for the correction.

-Chris
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Jerome Kieffer
In reply to this post by Chris Weisiger-2
On Tue, 16 Oct 2012 13:04:24 -0700
Chris Weisiger <[hidden email]> wrote:

> My use case is displaying camera image data to the user as it is
> streamed to us; this includes a histogram showing the distribution of
> intensities in the image. Thus I have a 512x512 array of pixel data
> (unsigned 16-bit ints) that I need to generate a histogram for.
> Unfortunately, numpy.histogram takes a significant amount of time --
> about 15ms per call. That's over 60% of the cost of showing an image
> to the user, which means that I can't quite display data as quickly as
> it comes in. So I'm looking for some faster option.

I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
It is much faster than the one provided by numpy:
4ms vs 25ms in your case on my computer
https://github.com/kif/pyFAI/blob/master/src/histogram.pyx

HTH
--
Jérôme Kieffer
On-Line Data analysis / Software Group
ISDD / ESRF
tel +33 476 882 445
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Chris Weisiger-2
On Thu, Oct 18, 2012 at 12:26 AM, Jerome Kieffer <[hidden email]> wrote:
>
> I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
> It is much faster than the one provided by numpy:
> 4ms vs 25ms in your case on my computer
> https://github.com/kif/pyFAI/blob/master/src/histogram.pyx

Interesting. Is there any particular reason why this code could not be
integrated into Numpy itself? A factor-of-6 improvement in speed on
multi-processor machines is significant.

>
> HTH
> --
> Jérôme Kieffer

-Chris
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

David Warde-Farley-3
On Thu, Oct 18, 2012 at 4:06 PM, Chris Weisiger <[hidden email]> wrote:

> On Thu, Oct 18, 2012 at 12:26 AM, Jerome Kieffer <[hidden email]> wrote:
>>
>> I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
>> It is much faster than the one provided by numpy:
>> 4ms vs 25ms in your case on my computer
>> https://github.com/kif/pyFAI/blob/master/src/histogram.pyx
>
> Interesting. Is there any particular reason why this code could not be
> integrated into Numpy itself? A factor-of-6 improvement in speed on
> multi-processor machines is significant.

I don't know if we have the build infrastructure to support OpenMP
robustly across platforms in NumPy yet. That said, it is something I'd
like to see eventually.

David
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Sturla Molden-2
On 19.10.2012 22:43, David Warde-Farley wrote:

> I don't know if we have the build infrastructure to support OpenMP
> robustly across platforms in NumPy yet. That said, it is something I'd
> like to see eventually.

But in the meantime, Cython code can use Python threads. They are pure
OS threads too.


Sturla




_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Sturla Molden-2
In reply to this post by Jerome Kieffer
On 18.10.2012 09:26, Jerome Kieffer wrote:

> I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
> It is much faster than the one provided by numpy:
> 4ms vs 25ms in your case on my computer
> https://github.com/kif/pyFAI/blob/master/src/histogram.pyx

Is there a reason why you set cdivision to True in a code that has no
integer division?

Also:

Cython prange scales badly unless you do a lot of work on each
iteration. That is, each iteration of a prange loop does a barrier
synchronization through an OpenMP flush. Don't use it the way you do
here. A Cython prange loop is not nearly as cheap as a C loop with
"#pragma omp parallel for". If you really want to use OpenMP, let your
Cython code call C code.

NumPy does not have a build system for OpenMP. Python threads works fine
too. It takes some more coding, but if you use closures in Cython it
will not be nearly as difficult as the "Java threads" coding style.


Sturla








_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Jerome Kieffer
On Mon, 22 Oct 2012 13:59:23 +0200
Sturla Molden <[hidden email]> wrote:

> On 18.10.2012 09:26, Jerome Kieffer wrote:
>
> > I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel.
> > It is much faster than the one provided by numpy:
> > 4ms vs 25ms in your case on my computer
> > https://github.com/kif/pyFAI/blob/master/src/histogram.pyx
>
> Is there a reason why you set cdivision to True in a code that has no
> integer division?

No... I would say this is legacy code. Basically I am (was) interested in the
(weighted histogram)/(unwgeighted histogram). This part has been
removed from the code.
I re-implemented histogram because I needed faster execution but the
implementation in Cython is not optimal, as you mentionned (large
storage because there are no atomic add in cython resulting in speed up
that don't scale). I also moved away from histogram as I needed more
precision.
 
> Cython prange scales badly unless you do a lot of work on each
> iteration. That is, each iteration of a prange loop does a barrier
> synchronization through an OpenMP flush. Don't use it the way you do
> here. A Cython prange loop is not nearly as cheap as a C loop with
> "#pragma omp parallel for". If you really want to use OpenMP, let your
> Cython code call C code.

I totally agree ... this is why I changed the algorithm to be able to
implement it in OpenCL (using pyopencl). OpenCL on the CPU is much
faster than cython and almost as dynamic as python when using pyopencl.

Cheers,

--
Jérôme Kieffer
Data analysis unit - ESRF
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: numpy.histogram is slow

Sturla Molden-2
On 23.10.2012 07:30, Jerome Kieffer wrote:

> I totally agree ... this is why I changed the algorithm to be able to
> implement it in OpenCL (using pyopencl). OpenCL on the CPU is much
> faster than cython and almost as dynamic as python when using pyopencl.

Yes, OpenCL is very cool, as are GLSL for OpenGL graphics. :)

As OpenCL and GLSL codes are plain text, compiled at runtime, we
preferably need a language that are good at text processing for using
them efficiently. And what that means, is that OpenCL and GLSL make it
possible for Python to beat the performance of C at number crunching and
3D computer graphics :-)

If I were to design a system like NumPy today, I would seriously
consider just using Python and OpenCL -- and no C.



Sturla
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user