[SciPy-User] Least squares speed

[SciPy-User] Least squares speed

Nathaniel Smith
Here are several ways to solve the least squares problem XB = Y:

scipy.linalg.lstsq(x, y)
np.linalg.lstsq(x, y)
np.dot(scipy.linalg.pinv(x), y)
np.dot(np.linalg.pinv(x), y)

From the documentation, I would expect these to be ordered by speed,
fastest up top and slowest at the bottom. It turns out they *are*
ordered by speed, except, on my system (linux x86-64, numpy 1.7.1,
scipy 0.12.0, ubuntu-installed atlas), it's exactly backwards, with
very substantial differences (more than a factor of 20!):

# Typical smallish problem for me:
In [40]: x = np.random.randn(780, 2)

In [41]: y = np.random.randn(780, 32 * 275)

In [42]: %timeit scipy.linalg.lstsq(x, y)
1 loops, best of 3: 494 ms per loop

In [43]: %timeit np.linalg.lstsq(x, y)
1 loops, best of 3: 356 ms per loop

In [44]: %timeit np.dot(scipy.linalg.pinv(x), y)
10 loops, best of 3: 62.2 ms per loop

In [45]: %timeit np.dot(np.linalg.pinv(x), y)
10 loops, best of 3: 23.2 ms per loop

Is this expected? I'm particularly confused about why scipy's lstsq
should be almost 40% slower than numpy's. (And shouldn't we have a
fast one-function way to solve least squares problems?)

Any reason not to use the last option? Is it as numerically stable as lstsq?

Is there any other version that's even faster or even more numerically stable?

-n

Re: Least squares speed

josef.pktd
On Tue, Oct 1, 2013 at 5:24 PM, Nathaniel Smith <[hidden email]> wrote:

> Here are several ways to solve the least squares problem XB = Y:
[clip]
> Any reason not to use the last option? Is it as numerically stable as lstsq?
>
> Is there any other version that's even faster or even more numerically stable?

If you have a very long x, then using the normal equations is faster for univariate y.
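
[Editorial note: a minimal sketch of the normal-equations approach Josef
mentions, not part of the original message; the tall-x shape is just an
illustration.]

import numpy as np

# Tall, thin design matrix: many observations, few parameters.
x = np.random.randn(100000, 5)
y = np.random.randn(100000)

# Normal equations: solve (x'x) b = (x'y). x'x is only 5x5, so this is
# cheap, but cond(x'x) = cond(x)**2, so accuracy suffers when x is
# ill-conditioned.
b = np.linalg.solve(np.dot(x.T, x), np.dot(x.T, y))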

There are many different ways to calculate pinv
https://github.com/scipy/scipy/pull/289

np.linalg.lstsq breaks on rank-deficient x, IIRC, and uses different
LAPACK functions than scipy's lstsq.
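
[Editorial note: a quick, hypothetical way to construct a rank-deficient
x for testing; the two lstsq implementations, and different versions of
them, handle this case differently.]

import numpy as np
import scipy.linalg

x_rd = np.random.randn(780, 3)
x_rd[:, 2] = x_rd[:, 0] + x_rd[:, 1]   # exactly rank 2 by construction
y = np.random.randn(780)

# scipy.linalg.lstsq reports the effective rank it detected.
b, res, rank, sv = scipy.linalg.lstsq(x_rd, y)
print(rank)   # expect 2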

In very badly scaled cases (the worst NIST case), scipy's pinv was a bit
more accurate than numpy's, but that may just be different defaults.
numpy's pinv was also faster than scipy's in the cases that I tried.

There is only a single NIST case that can fail with numpy's pinv
defaults.

(What about using qr or cho_solve?)

There are lots of different ways to solve this, and I never figured out a
ranking across the different criteria: speed, precision, robustness to
near-singularity.

Josef



Re: Least squares speed

Pauli Virtanen-3
In reply to this post by Nathaniel Smith
02.10.2013 00:24, Nathaniel Smith wrote:
[clip]
> Is this expected? I'm particularly confused at why scipy's lstsq
> should be almost 40% slower than numpy's. (And shouldn't we have a
> fast one-function way to solve least squares problems?)

It checks whether there are NaNs in the input arrays.

Also, it uses xGELSS, whereas the numpy one uses xGELSD, so the
algorithm called in LAPACK is also different.

There's probably no reason why they should be different.
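
[Editorial note: later SciPy releases (0.17+) expose the driver choice
directly through the lapack_driver keyword, which did not exist when this
thread was written. A sketch comparing the two drivers Pauli mentions:]

import numpy as np
import scipy.linalg

x = np.random.randn(780, 2)
y = np.random.randn(780, 32 * 275)

# 'gelsd' is the divide-and-conquer SVD driver that numpy calls;
# 'gelss' is the plain SVD driver scipy.linalg.lstsq used at the time.
b_d, res_d, rank_d, sv_d = scipy.linalg.lstsq(x, y, lapack_driver='gelsd')
b_s, res_s, rank_s, sv_s = scipy.linalg.lstsq(x, y, lapack_driver='gelss')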

--
Pauli Virtanen


Re: Least squares speed

josef.pktd
In reply to this post by josef.pktd
On Tue, Oct 1, 2013 at 5:45 PM,  <[hidden email]> wrote:

[clip]
> There is only a single NIST case that can fail with numpy's pinv
> defaults.


For example, from when I was playing with precision problems, see
"Addendum 2: scipy.linalg versus numpy.linalg":
http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html
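
[Editorial note: a sketch of making the truncation defaults explicit.
The rcond keyword is numpy's; scipy's pinv has used cond/rcond and, in
recent releases, atol/rtol for the same cutoff.]

import numpy as np

x = np.random.randn(780, 2)

# np.linalg.pinv zeroes singular values below rcond * max(singular value);
# passing rcond explicitly makes the truncation policy visible.
xp = np.linalg.pinv(x, rcond=1e-15)

# Essentially the same thing by hand, via the SVD pinv computes internally:
u, s, vt = np.linalg.svd(x, full_matrices=False)
s_inv = np.where(s > 1e-15 * s.max(), 1.0 / s, 0.0)
xp_manual = np.dot(vt.T, s_inv[:, None] * u.T)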

Josef


Re: Least squares speed

Nathaniel Smith
In reply to this post by josef.pktd
On Tue, Oct 1, 2013 at 10:45 PM,  <[hidden email]> wrote:

[clip]
> (What about using qr or cho_solve?)
>
> There are lots of different ways to solve this, and I never figured out a
> ranking across the different criteria: speed, precision, robustness to
> near-singularity.

Amazingly enough, in this case doing a full rank-revealing QR,
including calculating the rank via condition numbers of submatrices, is
tied for the fastest method:

In [66]: %timeit q, r, p = scipy.linalg.qr(x, mode="economic",
pivoting=True); [np.linalg.cond(r[:i, :i]) for i in xrange(1,
r.shape[0])]; np.linalg.solve(r[:, p], np.dot(q.T, y))
10 loops, best of 3: 22 ms per loop

Also tied for fastest with np.linalg.pinv (above) are the direct
method, cho_solve, and a non-rank-revealing QR with lower-triangular
backsolve:

In [70]: %timeit np.linalg.solve(np.dot(x.T, x), np.dot(x.T, y))
10 loops, best of 3: 21.4 ms per loop

In [71]: %timeit c = scipy.linalg.cho_factor(np.dot(x.T, x));
scipy.linalg.cho_solve(c, np.dot(x.T, y))
10 loops, best of 3: 21.2 ms per loop

In [72]: %timeit q, r = scipy.linalg.qr(x, mode="economic");
scipy.linalg.solve_triangular(r, np.dot(q.T, y))
10 loops, best of 3: 22.4 ms per loop

But AFAIK QR is the gold standard for precision on the kinds of linear
regression problems I care about (and definitely has better numerical
stability than the versions that involve dot(x.T, x)), and in my case
I actually want to detect and reject ill-conditioned problems rather
than impose some secondary disambiguating constraint, so... RRQR seems
to be the obvious choice.
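
[Editorial note: a sketch of the In [66] approach spelled out as a
function. The max_cond cutoff is an arbitrary placeholder. Note also that
the r[:, p] trick in the one-liner applies the pivot on the wrong side in
general; it happens to give the right answer when the permutation is its
own inverse (always true with only two columns, as here), so the sketch
below undoes the pivot explicitly.]

import numpy as np
import scipy.linalg

def rrqr_lstsq(x, y, max_cond=1e8):
    # Rank-revealing (column-pivoted) QR: x[:, p] == q @ r.
    q, r, p = scipy.linalg.qr(x, mode="economic", pivoting=True)
    # Reject ill-conditioned problems outright rather than regularize.
    if np.linalg.cond(r) > max_cond:
        raise np.linalg.LinAlgError("design matrix is ill-conditioned")
    # Solve r @ b_p = q.T @ y, then undo the column pivot: b[p] = b_p.
    b_p = scipy.linalg.solve_triangular(r, np.dot(q.T, y))
    b = np.empty_like(b_p)
    b[p] = b_p
    return b

The cost should be dominated by the np.dot(q.T, y) product, so this
should land in the same range as the one-liner above.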

Not sure if there's anything that could or should be done to make
these trade-offs more obvious to people less obsessive than me...

-n

Re: Least squares speed

Matthew Brett
Hi,

On Wed, Oct 2, 2013 at 12:58 PM, Nathaniel Smith <[hidden email]> wrote:

[clip]
> Not sure if there's anything that could or should be done to make
> these trade-offs more obvious to people less obsessive than me...

How about a tutorial page or a notebook? I am quite sure that there
will be an audience for that level of obsessiveness :). Me, for example,
on a day when I'm trying to do a lot of modeling.

Cheers,

Matthew

Re: Least squares speed

Charles R Harris
In reply to this post by Nathaniel Smith



On Wed, Oct 2, 2013 at 1:58 PM, Nathaniel Smith <[hidden email]> wrote:
[clip]
> Not sure if there's anything that could or should be done to make
> these trade-offs more obvious to people less obsessive than me...


Numpy's lstsq computes the sum of the squared residuals, which probably adds significant time. It also applies Householder reflections to the rhs, which keeps the full dimensions of the rhs with the transformed residuals at the bottom, reduces the problem to a bidiagonal least squares problem, and solves that. Supposedly the bidiagonal method is sometimes superior to the truncated SVD (as used in pinv), but I'll bet it is more expensive. Keeping the full rhs dimensions will also be slower than a matrix multiplication that reduces the dimensions to match the solution dimensions, i.e., 780 rows preserved vs 2 after reduction. Omitting the sum of the squared residuals does speed things up somewhat.

Current:

In [3]: %timeit np.linalg.lstsq(x, y)
1 loops, best of 3: 355 ms per loop

No sum of squared residuals:

In [3]: %timeit np.linalg.lstsq(x, y)
1 loops, best of 3: 268 ms per loop

Personally, I don't much care for the transformed residuals and always end up computing the residuals at the actual data points when I need them.
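
[Editorial note: a minimal sketch of computing residuals at the data
points, as opposed to relying on the transformed sum of squares that
lstsq returns.]

import numpy as np

x = np.random.randn(780, 2)
y = np.random.randn(780, 32 * 275)

b, ss, rank, sv = np.linalg.lstsq(x, y)
resid = y - np.dot(x, b)           # residuals at the actual data points
# Column-wise sums of squares; this matches the `ss` returned by lstsq
# when x has full rank and more rows than columns.
ss_direct = np.sum(resid ** 2, axis=0)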

I think the moral of the story is: know your problem and choose an appropriate method. For well-conditioned problems where you aren't concerned about the transformed residuals, QR is probably the best. The np.linalg.lstsq method is probably a bit safer for naive least squares.

Chuck


Re: Least squares speed

Charles R Harris



On Fri, Oct 18, 2013 at 11:04 PM, Charles R Harris <[hidden email]> wrote:



[clip]
> I think the moral of the story is: know your problem and choose an
> appropriate method. For well-conditioned problems where you aren't
> concerned about the transformed residuals, QR is probably the best. The
> np.linalg.lstsq method is probably a bit safer for naive least squares.


And if you don't care about the residuals but want the bidiagonal method, qr is safe for dimension reduction.

In [2]: def mylstsq(x, y):
   ...:     # Reduce to a small square system: q is 780x2, r is 2x2.
   ...:     q, r = np.linalg.qr(x, mode='reduced')
   ...:     # Project y onto the column span of x; the least squares
   ...:     # solution of r b = q'y equals that of x b = y.
   ...:     y = np.dot(q.T, y)
   ...:     return np.linalg.lstsq(r, y)
   ...:

In [3]: x = np.random.randn(780, 2)

In [4]: y = np.random.randn(780, 32 * 275)

In [5]: %timeit mylstsq(x, y)
100 loops, best of 3: 15.8 ms per loop

The 'reduced' mode is new in numpy 1.8.0.

Chuck


