[SciPy-User] Sparse dot differs from numpy dot by 10^5

Wagner Sebastian
Dear SciPy users,

While converting some algorithms from dense to sparse matrices, I stumbled over a discrepancy between the sparse version of dot and NumPy's dot. None of the open issues mentioning 'dot' apply.

The data source is a square matrix with N=1470, originally created by passing a dense matrix to the scipy.sparse.csc_matrix constructor (you can recreate it by converting the matrix to dense and then back to CSC). I compute the product of the matrix with its own transpose, and some of the entries (exactly 28) differ from the np.dot result by an absolute amount on the order of 10^5.
I saved the CSC matrix to an .npz file and attached it. Code to reproduce the effect:

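import numpy as np
# The .npz holds the csc_matrix as a pickled 0-d object array,
# so NumPy >= 1.16.3 needs allow_pickle=True to load it
sparsecsc = np.load('sparsecsc.npz', allow_pickle=True)
K = sparsecsc['K'][()]  # [()] unwraps the 0-d object array
K[249].dot(K[:, 251]).toarray()                # gives -9.61216512e+08
np.dot(K[249].toarray(), K[:, 251].toarray())  # gives -9.61150976e+08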
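The attached file is not reproduced in this archive. As a hedged sketch on synthetic data (a random, well-scaled float64 matrix, which is an assumption and not the matrix from the report), the same sparse-versus-dense comparison looks like this. With data like this the two products typically agree far more tightly than rtol=1e-05, so the sketch shows only the mechanics of the comparison and does not reproduce the reported effect. The size n and the 90 % sparsity are arbitrary choices.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n = 300  # hypothetical size; the attached matrix is 1470 x 1470
dense = rng.standard_normal((n, n))
dense[rng.random((n, n)) < 0.9] = 0.0  # zero out roughly 90 % of entries (arbitrary)
K = sp.csc_matrix(dense)  # built from a dense matrix, as in the report

sparse_product = K.dot(K.T).toarray()  # sparse matrix product, densified
dense_product = np.dot(dense, dense.T)  # dense product via NumPy

# Largest disagreement, scaled by the largest entry of the product
max_rel = np.max(np.abs(sparse_product - dense_product)) / np.max(np.abs(dense_product))
print(np.allclose(sparse_product, dense_product), max_rel)
```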

I located the diverging entries using the formula behind np.allclose():
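def close(a, b, rtol=1e-05, atol=1e-08):
    # np.allclose's tolerance test, applied in both argument orders
    c = np.absolute(a - b) <= (atol + rtol * np.absolute(b))
    d = np.absolute(b - a) <= (atol + rtol * np.absolute(a))
    return c | d

c = close(K.dot(K.T).toarray(), np.dot(K.toarray(), K.T.toarray()))
c = ~c  # True where the two products diverge
c.nonzero()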

This gives 28 index pairs whose entries diverge. To get the diverging values themselves:

K.dot(K.T).toarray()[c]
np.dot(K.toarray(), K.T.toarray())[c]
# or the differences between them:
K.dot(K.T).toarray()[c] - np.dot(K.toarray(), K.T.toarray())[c]  # average difference: 541257
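The hand-written close() applies the np.allclose tolerance test in both argument orders. As a sketch, the same mask can be obtained from np.isclose, and a relative difference is often easier to read than the raw absolute difference. The helpers close_via_isclose and relative_difference are hypothetical names introduced here; the two values are the ones quoted above for entry (249, 251).

```python
import numpy as np

def close(a, b, rtol=1e-05, atol=1e-08):
    """Symmetric tolerance test, as written in the message above."""
    c = np.absolute(a - b) <= (atol + rtol * np.absolute(b))
    d = np.absolute(b - a) <= (atol + rtol * np.absolute(a))
    return c | d

def close_via_isclose(a, b, rtol=1e-05, atol=1e-08):
    # np.isclose(a, b) checks |a - b| <= atol + rtol * |b|, so OR-ing both
    # argument orders gives the same mask as close() for finite inputs
    return (np.isclose(a, b, rtol=rtol, atol=atol)
            | np.isclose(b, a, rtol=rtol, atol=atol))

def relative_difference(a, b):
    """Elementwise |a - b| / max(|a|, |b|), defined as 0 where both are 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.maximum(np.abs(a), np.abs(b))
    return np.divide(np.abs(a - b), denom,
                     out=np.zeros_like(denom), where=denom > 0)

# The single entry quoted at the top of the thread
sparse_val = np.array([-9.61216512e+08])
dense_val = np.array([-9.61150976e+08])
print(sparse_val - dense_val)                      # [-65536.]
print(relative_difference(sparse_val, dense_val))  # about 6.8e-05, above rtol=1e-05
print(close(sparse_val, dense_val), close_via_isclose(sparse_val, dense_val))
```

Expressed relatively, this entry differs by roughly 7e-05, which is why it fails the rtol=1e-05 test.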

Does anyone have an idea what's going on here?

Regards,
Sebastian

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user

Attachment: sparsecsc.npz (26K)

Re: Sparse dot differs from numpy dot by 10^5

Jerome Kieffer
On Fri, 27 Sep 2013 07:12:06 +0000
Wagner Sebastian <[hidden email]> wrote:

> When changing some algorithms from dense matrices to sparse matrices I stumbled over a discrepancy between the sparse-version of dot and numpy's dot. All open issues mentioning 'dot' do not apply.

Floating-point addition is not associative: the order in which you sum
matters in silico, even though it does not on paper.
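The point about summation order can be shown in a few lines of plain Python. Strictly speaking, IEEE 754 addition is commutative (a + b == b + a); what fails is associativity, so regrouping the same terms can change the rounded result. The values below are illustrative only and are not taken from the attached matrix.

```python
# Commutativity holds for IEEE 754 addition ...
a, b, c = 0.1, 0.2, 0.3
assert a + b == b + a

# ... but associativity does not: the grouping changes the rounding
left = (a + b) + c
right = a + (b + c)
print(left, right)  # 0.6000000000000001 0.6

# A small term added to a large one can vanish completely
big = 1e16
print((big + 1.0) - big)  # 0.0
print((big - big) + 1.0)  # 1.0
```

A sparse product and a dense (BLAS) product generally visit the nonzero terms in different orders, so each can round differently.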

--
Jérôme Kieffer
On-Line Data analysis / Software Group
ISDD / ESRF
tel +33 476 882 445

Re: Sparse dot differs from numpy dot by 10^5

Emanuele Olivetti
On 09/27/2013 09:31 AM, Jerome Kieffer wrote:
> On Fri, 27 Sep 2013 07:12:06 +0000
> Wagner Sebastian <[hidden email]> wrote:
>
>> When changing some algorithms from dense matrices to sparse matrices I stumbled over a discrepancy between the sparse-version of dot and numpy's dot. All open issues mentioning 'dot' do not apply.
> Floating-point addition is not associative: the order in which you sum
> matters in silico, even though it does not on paper.
>

If that is the problem, here is a detailed description (Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic"):
  http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Best,

Emanuele


Re: Sparse dot differs from numpy dot by 10^5

Wagner Sebastian
So I can't get rid of this issue, because the error lies not within SciPy or NumPy but in the design of floating-point arithmetic?
At the end of the calculation (after three more operations), only two per mille of the 1470 values agree; all others differ from the third digit on (judging by the median of the differences).

Is there any way to know which of the two results is "more correct"?

Regards,
Sebastian
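One way to check which of two results is closer to the true value, for a single entry, is to recompute that entry with no intermediate rounding. This is a sketch using a technique not suggested in the thread (exact rational arithmetic with Python's fractions module); exact_dot and errors_against_exact are hypothetical helper names. "Exact" here means exact for the stored float64 inputs, not for whatever quantities those inputs were computed from.

```python
from fractions import Fraction

import numpy as np

def exact_dot(x, y):
    """Dot product of two float64 vectors in exact rational arithmetic.

    Every float64 converts to a Fraction exactly, so the only rounding
    happens in the final float() conversion.
    """
    total = Fraction(0)
    for a, b in zip(np.asarray(x, dtype=np.float64).tolist(),
                    np.asarray(y, dtype=np.float64).tolist()):
        total += Fraction(a) * Fraction(b)
    return float(total)

def errors_against_exact(x, y, *candidates):
    """Absolute error of each candidate result against the exact dot product."""
    ref = exact_dot(x, y)
    return ref, [abs(c - ref) for c in candidates]

# Toy vectors (hypothetical) where summation order matters
x = np.array([1e16, 1.0, -1e16])
y = np.ones(3)
left_to_right = (1e16 + 1.0) - 1e16  # 0.0
reordered = (1e16 - 1e16) + 1.0      # 1.0
ref, errs = errors_against_exact(x, y, left_to_right, reordered)
print(ref, errs)  # 1.0 [1.0, 0.0]
```

For the attached matrix this would presumably be applied to one row and one column, for example K[249].toarray().ravel() and K[:, 251].toarray().ravel(); that call is untested here because the attachment is not available. The approach is slow, but a single row of length 1470 is small.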


Re: Sparse dot differs from numpy dot by 10^5

Jerome Kieffer
On Fri, 27 Sep 2013 08:21:44 +0000
Wagner Sebastian <[hidden email]> wrote:

> So I can't get rid of this issue, because the error lies not within SciPy or NumPy but in the design of floating-point arithmetic?
> At the end of the calculation (after three more operations), only two per mille of the 1470 values agree; all others differ from the third digit on (judging by the median of the differences).
>
> Is there any way to know which of the two results is "more correct"?

Perform the operation using Kahan summation (see the article, or Wikipedia).
Its result is "more" correct.
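Kahan summation, as suggested above, can be sketched for a single dot product as follows. This is a minimal pure-Python sketch, not a SciPy or NumPy API; kahan_dot and naive_dot are hypothetical helpers. Only the additions are compensated, so each product a*b is still rounded once, and math.fsum serves as a correctly rounded reference for the sum.

```python
import math

import numpy as np

def kahan_dot(x, y):
    """Dot product of two 1-D arrays, accumulated with Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running compensation: low-order bits lost by the previous addition
    for a, b in zip(np.asarray(x, dtype=np.float64).tolist(),
                    np.asarray(y, dtype=np.float64).tolist()):
        term = a * b - comp
        t = total + term
        comp = (t - total) - term  # (t - total) is what was actually added
        total = t
    return total

def naive_dot(x, y):
    """Left-to-right dot product with plain float additions, for comparison."""
    total = 0.0
    for a, b in zip(np.asarray(x, dtype=np.float64).tolist(),
                    np.asarray(y, dtype=np.float64).tolist()):
        total += a * b
    return total

# One large term followed by many tiny ones: plain summation loses every tiny term
x = np.array([1.0] + [1e-16] * 10_000)
y = np.ones_like(x)
print(naive_dot(x, y))              # 1.0
print(kahan_dot(x, y))              # close to 1 + 1e-12
print(math.fsum((x * y).tolist()))  # correctly rounded reference sum
```

Applied to the matrix, the same helper would take one row and one column as dense 1-D arrays; it runs in a Python loop, so it suits spot checks of individual entries rather than the full 1470 x 1470 product.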
Cheers,
