My use case is displaying camera image data to the user as it is
streamed to us; this includes a histogram showing the distribution of intensities in the image. Thus I have a 512x512 array of pixel data (unsigned 16-bit ints) that I need to generate a histogram for. Unfortunately, numpy.histogram takes a significant amount of time -- about 15ms per call. That's over 60% of the cost of showing an image to the user, which means that I can't quite display data as quickly as it comes in. So I'm looking for some faster option. My searches turned up numpy.bincount, which is nice and zippy, but unfortunately omits bins where the total count is 0. This makes sense considering that otherwise it would always generate a length-N array where N is the maximum value in the input, but it doesn't work for my purposes. Are there any better options? -Chris _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Oct 16, 2012, at 4:04 PM, Chris Weisiger wrote:
> My use case is displaying camera image data to the user as it is > streamed to us; this includes a histogram showing the distribution of > intensities in the image. Thus I have a 512x512 array of pixel data > (unsigned 16-bit ints) that I need to generate a histogram for. > Unfortunately, numpy.histogram takes a significant amount of time -- > about 15ms per call. That's over 60% of the cost of showing an image > to the user, which means that I can't quite display data as quickly as > it comes in. So I'm looking for some faster option. > > My searches turned up numpy.bincount, which is nice and zippy, but > unfortunately omits bins where the total count is 0. This makes sense > considering that otherwise it would always generate a length-N array > where N is the maximum value in the input, but it doesn't work for my > purposes. Are there any better options? Uh, no? Bincount doesn't omit bins below the maximum value in the input, even if the count is zero: In [205]: numpy.bincount([5,5,10]) Out[205]: array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1]) Perhaps you mean that bincount omits bins above the maximum value in the input, but below the maximum *possible* value of the input? That's what the minlength parameter was added for in numpy 1.6. So if you don't have this version, either upgrade, or manually zero-pad the bincount output: bins = numpy.bincount([5,5,10]) padded = numpy.zeros(32, dtype=numpy.uint8) padded[:len(bins)] = bins That should be pretty quick. Zach _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, Oct 16, 2012 at 1:12 PM, Zachary Pincus <[hidden email]> wrote:
> > Uh, no? Bincount doesn't omit bins below the maximum value in the input, even if the count is zero: > In [205]: numpy.bincount([5,5,10]) > Out[205]: array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1]) Apologies, my test code was misleading me. You're right, and bincount ought to be able to do what I want it to do. Sorry for wasting everyone's time, and thanks for the correction. -Chris _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Chris Weisiger-2
On Tue, 16 Oct 2012 13:04:24 -0700
Chris Weisiger <[hidden email]> wrote: > My use case is displaying camera image data to the user as it is > streamed to us; this includes a histogram showing the distribution of > intensities in the image. Thus I have a 512x512 array of pixel data > (unsigned 16-bit ints) that I need to generate a histogram for. > Unfortunately, numpy.histogram takes a significant amount of time -- > about 15ms per call. That's over 60% of the cost of showing an image > to the user, which means that I can't quite display data as quickly as > it comes in. So I'm looking for some faster option. I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel. It is much faster than the one provided by numpy: 4ms vs 25ms in your case on my computer https://github.com/kif/pyFAI/blob/master/src/histogram.pyx HTH -- Jérôme Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Thu, Oct 18, 2012 at 12:26 AM, Jerome Kieffer <[hidden email]> wrote:
> > I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel. > It is much faster than the one provided by numpy: > 4ms vs 25ms in your case on my computer > https://github.com/kif/pyFAI/blob/master/src/histogram.pyx Interesting. Is there any particular reason why this code could not be integrated into Numpy itself? A factor-of-6 improvement in speed on multi-processor machines is significant. > > HTH > -- > Jérôme Kieffer -Chris _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Thu, Oct 18, 2012 at 4:06 PM, Chris Weisiger <[hidden email]> wrote:
> On Thu, Oct 18, 2012 at 12:26 AM, Jerome Kieffer <[hidden email]> wrote: >> >> I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel. >> It is much faster than the one provided by numpy: >> 4ms vs 25ms in your case on my computer >> https://github.com/kif/pyFAI/blob/master/src/histogram.pyx > > Interesting. Is there any particular reason why this code could not be > integrated into Numpy itself? A factor-of-6 improvement in speed on > multi-processor machines is significant. I don't know if we have the build infrastructure to support OpenMP robustly across platforms in NumPy yet. That said, it is something I'd like to see eventually. David _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On 19.10.2012 22:43, David Warde-Farley wrote:
> I don't know if we have the build infrastructure to support OpenMP > robustly across platforms in NumPy yet. That said, it is something I'd > like to see eventually. But in the meantime, Cython code can use Python threads. They are pure OS threads too. Sturla _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Jerome Kieffer
On 18.10.2012 09:26, Jerome Kieffer wrote:
> I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel. > It is much faster than the one provided by numpy: > 4ms vs 25ms in your case on my computer > https://github.com/kif/pyFAI/blob/master/src/histogram.pyx Is there a reason why you set cdivision to True in a code that has no integer division? Also: Cython prange scales badly unless you do a lot of work on each iteration. That is, each iteration of a prange loop does a barrier synchronization through an OpenMP flush. Don't use it the way you do here. A Cython prange loop is not nearly as cheap as a C loop with "#pragma omp parallel for". If you really want to use OpenMP, let your Cython code call C code. NumPy does not have a build system for OpenMP. Python threads works fine too. It takes some more coding, but if you use closures in Cython it will not be nearly as difficult as the "Java threads" coding style. Sturla _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Mon, 22 Oct 2012 13:59:23 +0200
Sturla Molden <[hidden email]> wrote: > On 18.10.2012 09:26, Jerome Kieffer wrote: > > > I implemented a 1D and 2D histogram, weighted and unweighted using cython (>=0.17) in parallel. > > It is much faster than the one provided by numpy: > > 4ms vs 25ms in your case on my computer > > https://github.com/kif/pyFAI/blob/master/src/histogram.pyx > > Is there a reason why you set cdivision to True in a code that has no > integer division? No... I would say this is legacy code. Basically I am (was) interested in the (weighted histogram)/(unwgeighted histogram). This part has been removed from the code. I re-implemented histogram because I needed faster execution but the implementation in Cython is not optimal, as you mentionned (large storage because there are no atomic add in cython resulting in speed up that don't scale). I also moved away from histogram as I needed more precision. > Cython prange scales badly unless you do a lot of work on each > iteration. That is, each iteration of a prange loop does a barrier > synchronization through an OpenMP flush. Don't use it the way you do > here. A Cython prange loop is not nearly as cheap as a C loop with > "#pragma omp parallel for". If you really want to use OpenMP, let your > Cython code call C code. I totally agree ... this is why I changed the algorithm to be able to implement it in OpenCL (using pyopencl). OpenCL on the CPU is much faster than cython and almost as dynamic as python when using pyopencl. Cheers, -- Jérôme Kieffer Data analysis unit - ESRF _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On 23.10.2012 07:30, Jerome Kieffer wrote:
> I totally agree ... this is why I changed the algorithm to be able to > implement it in OpenCL (using pyopencl). OpenCL on the CPU is much > faster than cython and almost as dynamic as python when using pyopencl. Yes, OpenCL is very cool, as are GLSL for OpenGL graphics. :) As OpenCL and GLSL codes are plain text, compiled at runtime, we preferably need a language that are good at text processing for using them efficiently. And what that means, is that OpenCL and GLSL make it possible for Python to beat the performance of C at number crunching and 3D computer graphics :-) If I were to design a system like NumPy today, I would seriously consider just using Python and OpenCL -- and no C. Sturla _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
Free forum by Nabble | Edit this page |