[SciPy-User] question: OLS with sparse design and dense parameter

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[SciPy-User] question: OLS with sparse design and dense parameter

josef.pktd
(sunday night quest for another sparse recipe)

This is a sparse algorithm question:

scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case? 

suppose I want to do estimate a standard linear model by OLS
y = x * b,  find b = argmin(y - x*b)

x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, 
but we don't have a small number of dominant singular values, the solution of the linear system will be dense.

the basic parts of my minimum example

nobsi, n_groups, k_vars = 10, 2000, 3

#nobsi, n_groups, k_vars = 2, 3, 3


xi = np.arange(nobsi)[:,None]**np.arange(3) #vander

x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()

(balanced kron structure is just to make a quick example)


x_sp.shape

(20000, 6000)


first try


xpx = x_sp.T.dot(x_sp) # sparse

xpy = x_sp.T.dot(y) # dense


p_dense = np.linalg.solve(xpx.toarray(), xpy)

p_sp1 = sparse.linalg.spsolve(xpx, xpy)

p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]


timing: 15.6659998894 0.00500011444092 0.00600004196167


Is this too small to worry, and anything works, or is there a recommendation?

definitely don't use dense.


Thanks,


Josef


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: question: OLS with sparse design and dense parameter

Nathaniel Smith

I guess the classic solution for ols are sparse cholesky or sparse qr. scikits.sparse has a cholmod wrapper, and could be modified to have a suitesparseqr wrapper relatively easily.

On 28 Apr 2014 03:31, <[hidden email]> wrote:
(sunday night quest for another sparse recipe)

This is a sparse algorithm question:

scipy.sparse.linalg has a choice of linear system solvers, but I have not much idea about the relative merits. Is there a recommendation for the following case? 

suppose I want to do estimate a standard linear model by OLS
y = x * b,  find b = argmin(y - x*b)

x is sparse, with blocks, medium large (20000 by 2000) but if it works quite a bit larger, 
but we don't have a small number of dominant singular values, the solution of the linear system will be dense.

the basic parts of my minimum example

nobsi, n_groups, k_vars = 10, 2000, 3

#nobsi, n_groups, k_vars = 2, 3, 3


xi = np.arange(nobsi)[:,None]**np.arange(3) #vander

x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()

(balanced kron structure is just to make a quick example)


x_sp.shape

(20000, 6000)


first try


xpx = x_sp.T.dot(x_sp) # sparse

xpy = x_sp.T.dot(y) # dense


p_dense = np.linalg.solve(xpx.toarray(), xpy)

p_sp1 = sparse.linalg.spsolve(xpx, xpy)

p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]


timing: 15.6659998894 0.00500011444092 0.00600004196167


Is this too small to worry, and anything works, or is there a recommendation?

definitely don't use dense.


Thanks,


Josef


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: question: OLS with sparse design and dense parameter

Severin Holzer-Graf
I needed SuiteSparseQR for my work and added the backslash function of
it to scikits.sparse.
The modification is available here: https://github.com/duga3/scikits-sparse

2014-04-28 8:42 GMT+02:00 Nathaniel Smith <[hidden email]>:

> I guess the classic solution for ols are sparse cholesky or sparse qr.
> scikits.sparse has a cholmod wrapper, and could be modified to have a
> suitesparseqr wrapper relatively easily.
>
> On 28 Apr 2014 03:31, <[hidden email]> wrote:
>>
>> (sunday night quest for another sparse recipe)
>>
>> This is a sparse algorithm question:
>>
>> scipy.sparse.linalg has a choice of linear system solvers, but I have not
>> much idea about the relative merits. Is there a recommendation for the
>> following case?
>>
>> suppose I want to do estimate a standard linear model by OLS
>> y = x * b,  find b = argmin(y - x*b)
>>
>> x is sparse, with blocks, medium large (20000 by 2000) but if it works
>> quite a bit larger,
>> but we don't have a small number of dominant singular values, the solution
>> of the linear system will be dense.
>>
>> the basic parts of my minimum example
>>
>> nobsi, n_groups, k_vars = 10, 2000, 3
>>
>> #nobsi, n_groups, k_vars = 2, 3, 3
>>
>>
>> xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
>>
>> x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
>>
>> (balanced kron structure is just to make a quick example)
>>
>>
>> x_sp.shape
>>
>> (20000, 6000)
>>
>>
>> first try
>>
>>
>> xpx = x_sp.T.dot(x_sp) # sparse
>>
>> xpy = x_sp.T.dot(y) # dense
>>
>>
>> p_dense = np.linalg.solve(xpx.toarray(), xpy)
>>
>> p_sp1 = sparse.linalg.spsolve(xpx, xpy)
>>
>> p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
>>
>>
>> timing: 15.6659998894 0.00500011444092 0.00600004196167
>>
>>
>> Is this too small to worry, and anything works, or is there a
>> recommendation?
>>
>> definitely don't use dense.
>>
>>
>> Thanks,
>>
>>
>> Josef
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: question: OLS with sparse design and dense parameter

josef.pktd
On Mon, Apr 28, 2014 at 2:45 AM, Severin Holzer-Graf
<[hidden email]> wrote:

> I needed SuiteSparseQR for my work and added the backslash function of
> it to scikits.sparse.
> The modification is available here: https://github.com/duga3/scikits-sparse
>
> 2014-04-28 8:42 GMT+02:00 Nathaniel Smith <[hidden email]>:
>> I guess the classic solution for ols are sparse cholesky or sparse qr.
>> scikits.sparse has a cholmod wrapper, and could be modified to have a
>> suitesparseqr wrapper relatively easily.
>>
>> On 28 Apr 2014 03:31, <[hidden email]> wrote:
>>>
>>> (sunday night quest for another sparse recipe)
>>>
>>> This is a sparse algorithm question:
>>>
>>> scipy.sparse.linalg has a choice of linear system solvers, but I have not
>>> much idea about the relative merits. Is there a recommendation for the
>>> following case?
>>>
>>> suppose I want to do estimate a standard linear model by OLS
>>> y = x * b,  find b = argmin(y - x*b)
>>>
>>> x is sparse, with blocks, medium large (20000 by 2000) but if it works
>>> quite a bit larger,
>>> but we don't have a small number of dominant singular values, the solution
>>> of the linear system will be dense.
>>>
>>> the basic parts of my minimum example
>>>
>>> nobsi, n_groups, k_vars = 10, 2000, 3
>>>
>>> #nobsi, n_groups, k_vars = 2, 3, 3
>>>
>>>
>>> xi = np.arange(nobsi)[:,None]**np.arange(3) #vander
>>>
>>> x_sp = sparse.kron(sparse.eye(n_groups), xi).tocsr()
>>>
>>> (balanced kron structure is just to make a quick example)
>>>
>>>
>>> x_sp.shape
>>>
>>> (20000, 6000)
>>>
>>>
>>> first try
>>>
>>>
>>> xpx = x_sp.T.dot(x_sp) # sparse
>>>
>>> xpy = x_sp.T.dot(y) # dense
>>>
>>>
>>> p_dense = np.linalg.solve(xpx.toarray(), xpy)
>>>
>>> p_sp1 = sparse.linalg.spsolve(xpx, xpy)
>>>
>>> p_sp2 = sparse.linalg.lsmr(x_sp, y)[0]
>>>
>>>
>>> timing: 15.6659998894 0.00500011444092 0.00600004196167
>>>
>>>
>>> Is this too small to worry, and anything works, or is there a
>>> recommendation?
>>>
>>> definitely don't use dense.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Josef

Thanks for the answers,

I would like to stick to scipy for now.
I don't know if sparse usage in statsmodels will get heavy enough to
require additional optional dependencies.
At least not any time soon.

scipy.sparse has several solvers for linear systems and one of them
should work well enough for my purposes (traditional
statistics/econometrics instead of "Big Data"), for now.

Josef


>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> [hidden email]
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user