Dear Scipy-Users,
When changing some algorithms from dense to sparse matrices, I stumbled over a discrepancy between the sparse version of dot and numpy's dot. None of the open issues mentioning 'dot' apply.

The data source is a square matrix with N=1470, originally created by the constructor of scipy.sparse.csc_matrix from a dense matrix (you can recreate it by converting the matrix to dense and then back to csc). I compute the dot product of the matrix with its own transpose, and some of the resulting entries (exactly 28) differ from the np.dot results by amounts on the order of 10^5. I saved the csc matrix to an npz file and attached it. A code sample to reproduce the effect:

    import numpy as np

    sparsecsc = np.load('sparsecsc.npz')
    K = sparsecsc['K'][()]
    K[249].dot(K[:,251]).A        # gives -9.61216512e+08
    np.dot(K[249].A, K[:,251].A)  # gives -9.61150976e+08

I located the diverging cells using the equations behind np.allclose():

    def close(a, b, rtol=1e-05, atol=1e-08):
        # symmetric version of the np.allclose() tolerance test
        c = np.absolute(a-b) <= (atol + rtol * np.absolute(b))
        d = np.absolute(b-a) <= (atol + rtol * np.absolute(a))
        return c | d

    c = close(K.dot(K.T).A, np.dot(K.A, K.T.A))
    c = ~c
    c.nonzero()

This gives the index pairs of the 28 diverging cells. To get the diverging values themselves:

    K.dot(K.T).A[c]
    np.dot(K.A, K.T.A)[c]
    # or the derived differences:
    K.dot(K.T).A[c] - np.dot(K.A, K.T.A)[c]  # average of them: 541257

Does anyone have an idea what's going on here?

Regards,
Sebastian

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Attachment: sparsecsc.npz (26K)
On Fri, 27 Sep 2013 07:12:06 +0000
Wagner Sebastian <[hidden email]> wrote:

> When changing some algorithms from dense matrices to sparse matrices I stumbled over a discrepancy between the sparse-version of dot and numpy's dot. All open issues mentioning 'dot' do not apply.

Floating point addition is not associative: the order in which you sum matters (in silico), while it does not on paper.

--
Jérôme Kieffer
On-Line Data analysis / Software Group
ISDD / ESRF
tel +33 476 882 445
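The point about summation order can be illustrated with a minimal sketch in plain Python. The helper name naive_sum and the three sample values are assumptions chosen for illustration; they are not taken from the attached matrix. An explicit loop is used so that the order of the additions is exactly what the code shows.

```python
def naive_sum(values):
    """Add the values strictly left to right, rounding after each addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, summed in two different orders.
forward = naive_sum([1e16, 1.0, -1e16])    # the 1.0 is rounded away first
reordered = naive_sum([1e16, -1e16, 1.0])  # the large terms cancel first

print(forward, reordered)

# Swapping the operands of a single addition changes nothing;
# only the grouping of several additions does.
print(1e16 + 1.0 == 1.0 + 1e16)
```

A sparse product skips the stored zeros and a dense routine may add the terms in blocks, so the two can plausibly visit the same terms in different orders. Whether that accounts for the full size of the differences reported above is not something this sketch can settle.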
On 09/27/2013 09:31 AM, Jerome Kieffer wrote:
> On Fri, 27 Sep 2013 07:12:06 +0000
> Wagner Sebastian <[hidden email]> wrote:
>
>> When changing some algorithms from dense matrices to sparse matrices I stumbled over a discrepancy between the sparse-version of dot and numpy's dot. All open issues mentioning 'dot' do not apply.
>
> Floating point addition is not associative: the order in which you sum
> matters (in silico), while it does not on paper.

If that is the problem, this is a detailed description:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Best,
Emanuele
In reply to this post by Wagner Sebastian
So, I can't get rid of this issue, because this error does not lie within scipy or numpy but in the design of floating point arithmetic?
At the end of the calculation (after 3 more operations), only about two per mille of the 1470 values agree; all the others differ from the third digit on (judging by the median of the difference).

Is there any way to know which of the two results is "more correct"?

Regards,
Sebastian

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Emanuele Olivetti
Sent: Friday, 27 September 2013 09:37
To: [hidden email]
Subject: Re: [SciPy-User] Sparse dot differs from numpy dot by 10^5

On 09/27/2013 09:31 AM, Jerome Kieffer wrote:
> On Fri, 27 Sep 2013 07:12:06 +0000
> Wagner Sebastian <[hidden email]> wrote:
>
>> When changing some algorithms from dense matrices to sparse matrices I stumbled over a discrepancy between the sparse-version of dot and numpy's dot. All open issues mentioning 'dot' do not apply.
>
> Floating point addition is not associative: the order in which you
> sum matters (in silico), while it does not on paper.

If that is the problem, this is a detailed description:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Best,
Emanuele
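One way to approach the question of which result is "more correct" is to compare both against a reference computed without intermediate rounding. The following is only a sketch under stated assumptions: exact_dot and naive_dot are hypothetical helpers, not scipy or numpy functions, and the vectors are made up. It relies on Python's fractions.Fraction, which represents every float exactly, so the only rounding happens in the final conversion back to float.

```python
from fractions import Fraction

def exact_dot(x, y):
    """Dot product with exact rational arithmetic; rounds once at the end."""
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)

def naive_dot(x, y):
    """Dot product accumulated left to right in ordinary floats."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

# Hypothetical data: the same terms in two different orders.
x1, y1 = [1e8, 1.0, -1e8, 1.0], [1e8, 1.0, 1e8, 1.0]
x2, y2 = [1e8, -1e8, 1.0, 1.0], [1e8, 1e8, 1.0, 1.0]

reference = exact_dot(x1, y1)
for label, value in (("order 1", naive_dot(x1, y1)),
                     ("order 2", naive_dot(x2, y2))):
    # The candidate with the smaller absolute error is the more correct one.
    print(label, value, "error:", abs(value - reference))
```

Exact arithmetic is slow, but it would only have to be run on the row and column pairs of the 28 diverging cells, with the matrix entries converted to Python floats first.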
On Fri, 27 Sep 2013 08:21:44 +0000
Wagner Sebastian <[hidden email]> wrote:

> So, I can't get rid of this issue, because this error does not lie within scipy or numpy but in the design of floating point arithmetic?
> At the end of the calculation (after 3 more operations) only two per mille out of 1470 values do not differ, all others do differ from the third digit on (median of the difference).
>
> Is there any chance to know which one of the two results is "more correct"?

Perform the operation using Kahan summation (see the article, or Wikipedia). It is "more" correct.

Cheers,
--
Jérôme Kieffer
On-Line Data analysis / Software Group
ISDD / ESRF
tel +33 476 882 445
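The Kahan summation suggested above can be sketched in a few lines of plain Python. The function names (naive_sum, kahan_sum, kahan_dot) and the sample values are assumptions for illustration; neither numpy nor scipy is claimed to work this way internally.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan (compensated) summation: track the rounding error of each
    addition in a separate correction term and feed it back in."""
    total = 0.0
    comp = 0.0                   # negated low-order part lost so far
    for v in values:
        y = v - comp             # re-apply what was lost in earlier steps
        t = total + y            # low-order bits of y may be lost here
        comp = (t - total) - y   # what was actually added, minus y
        total = t
    return total

def kahan_dot(x, y):
    """Dot product whose terms are accumulated with kahan_sum."""
    return kahan_sum(a * b for a, b in zip(x, y))

# Hypothetical data: ten tiny terms that a naive sum drops entirely.
values = [1.0] + [1e-16] * 10
print(naive_sum(values))   # every 1e-16 is below half an ulp of 1.0
print(kahan_sum(values))
print(math.fsum(values))   # accurate summation from the standard library
```

Compensation reduces the accumulated rounding error but does not remove it: when large terms cancel almost completely, Kahan summation can still lose digits, and in a dot product each individual product is rounded before it is summed. The result is best read as a better estimate, not an exact reference.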