Dear Scipy Community, I am looking at the scipy.stats.linregress code and see: # average sum of squares: ssxm, ssxym, ssyxm, ssym = np.cov(x, y, bias=1).flat r_num = ssxym r_den = np.sqrt(ssxm*ssym) if r_den == 0.0: r = 0.0 else: r = r_num / r_den if (r > 1.0): r = 1.0 # from numerical error if the slope is negative, the correlation factor R is -1, not one so one should add: if (r < -1.0): r = -1.0 # from numerical error or did I completely mis-understood the code ? Cheers, -- Jérôme Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
This parameter is R**2, the square of R. You are computing it using the squares of the residuals, so everything should be positive. If you get r negative, something has gone terribly wrong. On Feb 12, 2013 8:59 AM, "Jerome Kieffer" <[hidden email]> wrote:
_______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, 12 Feb 2013 11:55:31 +0100
Daπid <[hidden email]> wrote: > This parameter is R**2, the square of R. You are computing it using the > squares of the residuals, so everything should be positive. If you get r > negative, something has gone terribly wrong. In [3]: scipy.stats.linregress([1,2,3],[3,2,1]) Out[3]: (-1.0, 4.0, -1.0, 9.0031759825137669e-11, 0.0) I am not very confident in my knowledge in statistics but here R = -1 and it does not look like an error. I implemented a method to perform thousands of linear regression and few of them returned -1.00001 (or so) which later gave NaN as stderr. so either the test should be: if (r*r > 1.0): r = r/abs(r) or: if (r > 1.0): r = 1 elif (r < -1.0): r = -1 Cheers, -- Jérôme Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, Feb 12, 2013 at 7:19 AM, Jerome Kieffer <[hidden email]> wrote:
> On Tue, 12 Feb 2013 11:55:31 +0100 > Daπid <[hidden email]> wrote: > >> This parameter is R**2, the square of R. You are computing it using the >> squares of the residuals, so everything should be positive. If you get r >> negative, something has gone terribly wrong. > > In [3]: scipy.stats.linregress([1,2,3],[3,2,1]) > Out[3]: (-1.0, 4.0, -1.0, 9.0031759825137669e-11, 0.0) > > I am not very confident in my knowledge in statistics but here R = -1 and it does not look like an error. > I implemented a method to perform thousands of linear regression and few of them returned -1.00001 (or so) which later gave NaN as stderr. > > so either the test should be: > if (r*r > 1.0): r = r/abs(r) > or: > if (r > 1.0): r = 1 > elif (r < -1.0): r = -1 The correlation coefficient that is reported is the signed correlation, the docstring has an example that takes the square to get R_squared. I don't know whether it's the R_squared, I need to check. But because r is signed we should have the check r<-1 then -1 Josef > > Cheers, > > -- > Jérôme Kieffer > On-Line Data analysis / Software Group > ISDD / ESRF > tel +33 476 882 445 > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Jerome Kieffer
On 12 February 2013 13:19, Jerome Kieffer <[hidden email]> wrote:
> I am not very confident in my knowledge in statistics but here R = -1 and it does not look like an error. > I implemented a method to perform thousands of linear regression and few of them returned -1.00001 (or so) which later gave NaN as stderr. You are absolutely right, my mistake. I got the formula confused in my head, I should have looked it up. Indeed, the covariance can be negative, giving you the r<0. David. _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by josef.pktd
On Tue, Feb 12, 2013 at 7:37 AM, <[hidden email]> wrote:
> On Tue, Feb 12, 2013 at 7:19 AM, Jerome Kieffer <[hidden email]> wrote: >> On Tue, 12 Feb 2013 11:55:31 +0100 >> Daπid <[hidden email]> wrote: >> >>> This parameter is R**2, the square of R. You are computing it using the >>> squares of the residuals, so everything should be positive. If you get r >>> negative, something has gone terribly wrong. >> >> In [3]: scipy.stats.linregress([1,2,3],[3,2,1]) >> Out[3]: (-1.0, 4.0, -1.0, 9.0031759825137669e-11, 0.0) >> >> I am not very confident in my knowledge in statistics but here R = -1 and it does not look like an error. >> I implemented a method to perform thousands of linear regression and few of them returned -1.00001 (or so) which later gave NaN as stderr. >> >> so either the test should be: >> if (r*r > 1.0): r = r/abs(r) >> or: >> if (r > 1.0): r = 1 >> elif (r < -1.0): r = -1 > > The correlation coefficient that is reported is the signed > correlation, the docstring has an example that takes the square to get > R_squared. > > I don't know whether it's the R_squared, I need to check. >>> sm.OLS([1,2,3],sm.add_constant([3.1,2,1])).fit().rsquared 0.99924471299093653 >>> stats.linregress([3.1,2,1], [1,2,3])[2] -0.99962228516121865 >>> stats.linregress([3.1,2,1], [1,2,3])[2]**2 0.99924471299093676 Josef Josef > > But because r is signed we should have the check r<-1 then -1 > > Josef > >> >> Cheers, >> >> -- >> Jérôme Kieffer >> On-Line Data analysis / Software Group >> ISDD / ESRF >> tel +33 476 882 445 >> _______________________________________________ >> SciPy-User mailing list >> [hidden email] >> http://mail.scipy.org/mailman/listinfo/scipy-user SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, Feb 12, 2013 at 7:46 AM, <[hidden email]> wrote:
> On Tue, Feb 12, 2013 at 7:37 AM, <[hidden email]> wrote: >> On Tue, Feb 12, 2013 at 7:19 AM, Jerome Kieffer <[hidden email]> wrote: >>> On Tue, 12 Feb 2013 11:55:31 +0100 >>> Daπid <[hidden email]> wrote: >>> >>>> This parameter is R**2, the square of R. You are computing it using the >>>> squares of the residuals, so everything should be positive. If you get r >>>> negative, something has gone terribly wrong. >>> >>> In [3]: scipy.stats.linregress([1,2,3],[3,2,1]) >>> Out[3]: (-1.0, 4.0, -1.0, 9.0031759825137669e-11, 0.0) >>> >>> I am not very confident in my knowledge in statistics but here R = -1 and it does not look like an error. >>> I implemented a method to perform thousands of linear regression and few of them returned -1.00001 (or so) which later gave NaN as stderr. >>> >>> so either the test should be: >>> if (r*r > 1.0): r = r/abs(r) >>> or: >>> if (r > 1.0): r = 1 >>> elif (r < -1.0): r = -1 >> >> The correlation coefficient that is reported is the signed >> correlation, the docstring has an example that takes the square to get >> R_squared. >> >> I don't know whether it's the R_squared, I need to check. > >>>> sm.OLS([1,2,3],sm.add_constant([3.1,2,1])).fit().rsquared > 0.99924471299093653 > >>>> stats.linregress([3.1,2,1], [1,2,3])[2] > -0.99962228516121865 >>>> stats.linregress([3.1,2,1], [1,2,3])[2]**2 > 0.99924471299093676 > > Josef > Josef >> >> But because r is signed we should have the check r<-1 then -1 Volunteers for a pull request ? and for checking that the test suite has a case with negative r, so this doesn't get changed by accident. Josef >> >> Josef >> >>> >>> Cheers, >>> >>> -- >>> Jérôme Kieffer >>> On-Line Data analysis / Software Group >>> ISDD / ESRF >>> tel +33 476 882 445 >>> _______________________________________________ >>> SciPy-User mailing list >>> [hidden email] >>> http://mail.scipy.org/mailman/listinfo/scipy-user SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, 12 Feb 2013 07:54:38 -0500
[hidden email] wrote: > Volunteers for a pull request ? > and for checking that the test suite has a case with negative r, so > this doesn't get changed by accident. I can ... but I am not used to the tests in scipy -- Jérôme Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, Feb 12, 2013 at 10:33 AM, Jerome Kieffer <[hidden email]> wrote:
> On Tue, 12 Feb 2013 07:54:38 -0500 > [hidden email] wrote: > >> Volunteers for a pull request ? >> and for checking that the test suite has a case with negative r, so >> this doesn't get changed by accident. > > I can ... but I am not used to the tests in scipy Usually I just have to search the test folder by function. most of the linregress tests are from an old benchmark test suite, checking mainly difficult cases. This https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_stats.py#L736 is a test that I added. Similar we could add one with a negative coefficient. statsmodels OLS is verified against other packages, so we can get the correct test results from there. Thank you, Josef > > > -- > Jérôme Kieffer > On-Line Data analysis / Software Group > ISDD / ESRF > tel +33 476 882 445 > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, 12 Feb 2013 10:48:33 -0500
[hidden email] wrote: > On Tue, Feb 12, 2013 at 10:33 AM, Jerome Kieffer <[hidden email]> wrote: > > On Tue, 12 Feb 2013 07:54:38 -0500 > > [hidden email] wrote: > > > >> Volunteers for a pull request ? > >> and for checking that the test suite has a case with negative r, so > >> this doesn't get changed by accident. > > > > I can ... but I am not used to the tests in scipy > > Usually I just have to search the test folder by function. > the patch for the code is trivial... but get a valid test-case (currently breaking) is not trivial, especially that numpy.cov is casting to double precision (I have seen R<-1 only with single precision arithmetic) I am preparing a pull-request. Cheers, -- Jérôme Kieffer On-Line Data analysis / Software Group ISDD / ESRF tel +33 476 882 445 _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Tue, Feb 12, 2013 at 11:45 AM, Jerome Kieffer <[hidden email]> wrote:
> On Tue, 12 Feb 2013 10:48:33 -0500 > [hidden email] wrote: > >> On Tue, Feb 12, 2013 at 10:33 AM, Jerome Kieffer <[hidden email]> wrote: >> > On Tue, 12 Feb 2013 07:54:38 -0500 >> > [hidden email] wrote: >> > >> >> Volunteers for a pull request ? >> >> and for checking that the test suite has a case with negative r, so >> >> this doesn't get changed by accident. >> > >> > I can ... but I am not used to the tests in scipy >> >> Usually I just have to search the test folder by function. >> > > the patch for the code is trivial... > but get a valid test-case (currently breaking) is not trivial, especially that numpy.cov is casting to double We should have a test for negative r, as backwards compatibility check for the future, but if it's too difficult to get one with r<-1, then we can leave two lines untested. I don't remember if I have ever seen a case with abs(r)>1. Josef > > I am preparing a pull-request. > > Cheers, > -- > Jérôme Kieffer > On-Line Data analysis / Software Group > ISDD / ESRF > tel +33 476 882 445 > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
Free forum by Nabble | Edit this page |