Hi,
I'm trying to calculate correlation coefficients and looking at the np.corrcoef function. It has bias and ddof arguments, however when I try different values of ddof with test data the results are always the same, i.e., changing ddof has no effect. From some back-of-the-envelope algebra I reckon the n/(n-ddof) normalisations should get cancelled out when calculating correlation coefficients from a covariance matrix, and therefore the ddof (and bias) arguments to np.corrcoef are redundant.

I'd be very grateful if someone could verify this is true or tell me if I've missed something.

Thanks,
Alistair

--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: [hidden email]
Tel: +44 (0)1865 287721

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
It does change for me, though very little....

    import numpy as np

    # y is a nonlinear function of x, so r is well away from 0 and 1
    x = np.random.randn(50)
    y = x * x * x * x
    for ddof in range(20):
        print("ddof = {}; r = {:.20f}".format(ddof, np.corrcoef(x, y, ddof=ddof)[0, 1]))

    ddof = 0; r = 0.27115960925626320099
    ddof = 1; r = 0.27115960925626320099
    ddof = 2; r = 0.27115960925626314548
    ddof = 3; r = 0.27115960925626320099
    ddof = 4; r = 0.27115960925626320099
    ddof = 5; r = 0.27115960925626314548
    ddof = 6; r = 0.27115960925626320099
    ddof = 7; r = 0.27115960925626320099
    ddof = 8; r = 0.27115960925626320099
    ddof = 9; r = 0.27115960925626320099
    ddof = 10; r = 0.27115960925626314548
    ddof = 11; r = 0.27115960925626320099
    ddof = 12; r = 0.27115960925626320099
    ddof = 13; r = 0.27115960925626320099
    ddof = 14; r = 0.27115960925626314548
    ddof = 15; r = 0.27115960925626314548
    ddof = 16; r = 0.27115960925626314548
    ddof = 17; r = 0.27115960925626320099
    ddof = 18; r = 0.27115960925626320099
    ddof = 19; r = 0.27115960925626320099

Cheers
Sasha

2015-03-10 11:55 GMT-04:00 Alistair Miles <[hidden email]>:
In reply to this post by Alistair Miles
Alistair Miles <[hidden email]> wrote:
> I'm trying to calculate correlation coefficients and looking at the
> np.corrcoef function. It has bias and ddof arguments, however when I try
> different values of ddof with test data the results are always the same,
> i.e., changing ddof has no effect. From some back-of-the-envelope algebra I
> reckon the n/(n-ddof) normalisations should get cancelled out when
> calculating correlation coefficients from a covariance matrix, and
> therefore the ddof (and bias) arguments to np.corrcoef are redundant.
>
> I'd be very grateful if someone could verify this is true or tell me if
> I've missed something.

You are right. It should cancel out or np.corrcoef would be wrong. The sample size does not go into the Pearson product-moment correlation.

Sturla
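The cancellation can be written out in a few lines. This is only a sketch of the algebra: S_xy is shorthand introduced here for the centred sum of cross-products (it is not a symbol used in the thread), and n > ddof is assumed so that the normalisation is positive.

```latex
\begin{align}
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
c_{xy} = \frac{S_{xy}}{n - \mathrm{ddof}}, \\
r_{xy} &= \frac{c_{xy}}{\sqrt{c_{xx}\, c_{yy}}}
        = \frac{S_{xy} / (n - \mathrm{ddof})}{\sqrt{S_{xx} S_{yy}} / (n - \mathrm{ddof})}
        = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.
\end{align}
```

The factor 1/(n - ddof) appears once in the numerator and once in the denominator, so the final expression does not depend on ddof or on bias.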
In reply to this post by Oleksandr Huziy
Oleksandr Huziy <[hidden email]> wrote:
> It does change for me, though very little....

Probably rounding error.
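The two distinct values in the output quoted earlier differ by about 5.6e-17, which is one unit in the last place for a double of that magnitude, so rounding looks like a plausible explanation. A minimal sketch of how one might check this, assuming the correlation is formed from np.cov the way the thread describes (the helper name corr_from_cov is made up for this example and is not NumPy API):

```python
import numpy as np


def corr_from_cov(x, y, ddof):
    """Pearson r from a covariance matrix normalised by (n - ddof).

    Hypothetical helper for illustration only.
    """
    c = np.cov(x, y, ddof=ddof)
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = rng.standard_normal(50)
y = x ** 4

# One correlation estimate per ddof value, as in the experiment above.
values = np.array([corr_from_cov(x, y, ddof) for ddof in range(20)])

# Largest disagreement between any two ddof settings.
spread = values.max() - values.min()
print("spread across ddof:", spread)
```

If the spread stays within a few units in the last place, the differences carry no statistical meaning and comparing results with np.allclose rather than == is the safer habit.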
In reply to this post by Sturla Molden-3
Hi,
On Tue, Mar 10, 2015 at 9:27 AM, Sturla Molden <[hidden email]> wrote:
> Alistair Miles <[hidden email]> wrote:
>
>> I'm trying to calculate correlation coefficients and looking at the
>> np.corrcoef function. It has bias and ddof arguments, however when I try
>> different values of ddof with test data the results are always the same,
>> i.e., changing ddof has no effect. From some back-of-the-envelope algebra I
>> reckon the n/(n-ddof) normalisations should get cancelled out when
>> calculating correlation coefficients from a covariance matrix, and
>> therefore the ddof (and bias) arguments to np.corrcoef are redundant.
>>
>> I'd be very grateful if someone could verify this is true or tell me if
>> I've missed something.
>
> You are right. It should cancel out or np.corrcoef would be wrong. The
> sample size does not go into the Pearson product-moment correlation.

Oh dear - that's embarrassing.

https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

I guess we should deprecate the 'bias' and 'ddof' input arguments asap.

Cheers,

Matthew
Thanks for the responses, glad to know I'm not going crazy.
Cheers,
Alistair.

On Tuesday, 10 March 2015, Matthew Brett <[hidden email]> wrote:
> Hi,
In reply to this post by Matthew Brett
On 10/03/15 21:12, Matthew Brett wrote:
> Oh dear - that's embarrassing.
>
> https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
>
> I guess we should deprecate the 'bias' and 'ddof' input arguments asap.

It is an unfortunate consequence of implementing np.corrcoef on top of np.cov.

np.corrcoef should not be computed with np.cov because it just adds additional rounding error to the result.

https://github.com/numpy/numpy/blob/32e23a1d52a05d3a56f693010eaf8d96826db75f/numpy/lib/function_base.py

Sturla
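One way to read the point about not going through np.cov is that the (n - ddof) division can be skipped entirely. The following is a rough sketch under that assumption, not the NumPy implementation: corrcoef_direct is a hypothetical name, and the function assumes real-valued data with observations in rows (the rowvar=False layout).

```python
import numpy as np


def corrcoef_direct(X):
    """Correlation matrix of the columns of X, without forming a covariance.

    Hypothetical helper: rows are observations, columns are variables.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # Unnormalised sums of cross-products; no division by (n - ddof) occurs.
    S = centred.T @ centred
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
R = corrcoef_direct(X)
print(R)
```

The result should agree with np.corrcoef(X, rowvar=False) to within floating-point tolerance; whether it is actually more accurate has not been measured here.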
On Tue, Mar 10, 2015 at 7:21 PM, Sturla Molden <[hidden email]> wrote:
> On 10/03/15 21:12, Matthew Brett wrote:
>
>> Oh dear - that's embarrassing.
>>
>> https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
>>
>> I guess we should deprecate the 'bias' and 'ddof' input arguments asap.
>
> It is an unfortunate consequence of implementing np.corrcoef on top of
> np.cov.

Except we should have realized that bias / ddof cancels and therefore should not have implemented the bias / ddof input arguments (or passed them to cov in the function).

> np.corrcoef should not be computed with np.cov because it just adds
> additional rounding error to the result.

What algorithm do you think we should use to minimize rounding error?

Cheers,

Matthew
On 11/03/15 03:56, Matthew Brett wrote:
>> np.corrcoef should not be computed with np.cov because it just adds
>> additional rounding error to the result.
>
> What algorithm do you think we should use to minimize rounding error?

I was not actually thinking about that. I just thought we could reuse some of the code from np.cov to avoid the redundant division and multiplications.

But since you asked, to minimize rounding error there is a two-pass method which can be used for both cov and corrcoef. Cf. this Matlab code:

http://home.online.no/~pjacklam/matlab/software/util/statutil/covmat.m

This would be very easy to use in NumPy.

Another method which is less known is to use the SVD. It can also be used to compute the corrcoef. Here for real values and rowvar=False:

    import numpy as np

    def cov(X, ddof):
        nx, p = X.shape
        mean = X.mean(axis=0)
        CX = X - mean[None, :]          # centre each column
        # Thin SVD of the scaled, centred data: CX / sqrt(nx - ddof) = u * s * pc
        u, s, pc = np.linalg.svd(CX / np.sqrt(nx - ddof), full_matrices=False)
        s2 = s**2
        # pc.T * diag(s2) * pc; scaling the columns of pc.T also works when nx < p
        return np.dot(pc.T * s2, pc)

Sturla
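The SVD approach can be carried over to the correlation matrix, as the message suggests. A minimal sketch, assuming real-valued data with observations in rows; corrcoef_svd is a hypothetical name chosen for this example, and the (n - ddof) scaling is left out because it cancels in the ratio:

```python
import numpy as np


def corrcoef_svd(X):
    """Correlation matrix of the columns of X via the SVD of the centred data.

    Hypothetical helper: rows are observations, columns are variables.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    # Thin SVD: centred = U @ diag(s) @ vt
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Scatter matrix centred.T @ centred = vt.T @ diag(s**2) @ vt
    scatter = (vt.T * s**2) @ vt
    d = np.sqrt(np.diag(scatter))
    return scatter / np.outer(d, d)


rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))
R_svd = corrcoef_svd(X)
R_ref = np.corrcoef(X, rowvar=False)
print(np.abs(R_svd - R_ref).max())
```

The comparison against np.corrcoef only shows that the two routes agree to rounding; it says nothing about which one is closer to the exact value.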
In reply to this post by Matthew Brett
Hi,
On Tue, Mar 10, 2015 at 1:12 PM, Matthew Brett <[hidden email]> wrote:
> Hi,
>
> On Tue, Mar 10, 2015 at 9:27 AM, Sturla Molden <[hidden email]> wrote:
>> Alistair Miles <[hidden email]> wrote:
>>
>>> I'm trying to calculate correlation coefficients and looking at the
>>> np.corrcoef function. It has bias and ddof arguments, however when I try
>>> different values of ddof with test data the results are always the same,
>>> i.e., changing ddof has no effect. From some back-of-the-envelope algebra I
>>> reckon the n/(n-ddof) normalisations should get cancelled out when
>>> calculating correlation coefficients from a covariance matrix, and
>>> therefore the ddof (and bias) arguments to np.corrcoef are redundant.
>>>
>>> I'd be very grateful if someone could verify this is true or tell me if
>>> I've missed something.
>>
>> You are right. It should cancel out or np.corrcoef would be wrong. The
>> sample size does not go into the Pearson product-moment correlation.
>
> Oh dear - that's embarrassing.
>
> https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
>
> I guess we should deprecate the 'bias' and 'ddof' input arguments asap.

https://github.com/numpy/numpy/pull/5675

Cheers,

Matthew