combine two sparse matrices

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

combine two sparse matrices

Robin-62
Hi,

I was wondering what the most (memory) efficient way of combining two
sparse matrices would be.

I am constructing a very large sparse matrix, but due to the temporary
memory required to calculate the entries I am doing it in blocks, with
the computation of each block done in a forked child process. This
returns a sparse matrix of the same dimensions as the full one, but
with a smaller number of entries. I would like to add the entries from
the block result to the 'master' copy. I can be sure that there will
be no overlap in the position of entries (ie no matrix position will
be in both sides).

What is the most memory efficient way of combining these? I noticed +=
isn't implemented, but it's not clear how that would work anyway. The
best I have done so far is adding two lil_matices (the block is
created as an lil-matrix for fancy indexing) A = A + Apartial, but as
the master copy grows this means I think that I will need double the
final memory requirement for A (to add the last block). Is there a
better way of doing this?

Also, what are the memory requirements for the conversions (.tocsc,
.tocsr etc.)? Will that mean I need double the memory anyway?

Thanks,

Robin
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

Robin-62
On Sat, May 3, 2008 at 12:07 PM, Robin <[hidden email]> wrote:

> Hi,
>
>  I was wondering what the most (memory) efficient way of combining two
>  sparse matrices would be.
>
>  I am constructing a very large sparse matrix, but due to the temporary
>  memory required to calculate the entries I am doing it in blocks, with
>  the computation of each block done in a forked child process. This
>  returns a sparse matrix of the same dimensions as the full one, but
>  with a smaller number of entries. I would like to add the entries from
>  the block result to the 'master' copy. I can be sure that there will
>  be no overlap in the position of entries (ie no matrix position will
>  be in both sides).
>
>  What is the most memory efficient way of combining these? I noticed +=
>  isn't implemented, but it's not clear how that would work anyway. The
>  best I have done so far is adding two lil_matices (the block is
>  created as an lil-matrix for fancy indexing) A = A + Apartial, but as
>  the master copy grows this means I think that I will need double the
>  final memory requirement for A (to add the last block). Is there a
>  better way of doing this?
>
>  Also, what are the memory requirements for the conversions (.tocsc,
>  .tocsr etc.)? Will that mean I need double the memory anyway?

After reading some more about the different sparse matrix types I am
now trying having both master and block matrices as dok, passing back
dict(Apartial) since cPickle can't pickle dok matrices, and then doing
A.update(Apartial).
This seems to be a bit slower (which is OK) but also the memory used
seems to be growing a little fast - it seems as though something is
not being released (I do del Apartial after the update but I have a
feeling it might be sticking around) and I'm worried the machine will
fall over again before it's finished.

I'd still appreciate some advice as to the best way to do this. I
thought of directly appending to the lists in the lil_matrix, but I
would then have to sort them again and I wasn't sure if the object
array could take resizing like this.

Robin
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

Nathan Bell-4
On Sat, May 3, 2008 at 10:16 AM, Robin <[hidden email]> wrote:
>  >
>  >  I was wondering what the most (memory) efficient way of combining two
>  >  sparse matrices would be.
>  >
>
>  I'd still appreciate some advice as to the best way to do this. I
>  thought of directly appending to the lists in the lil_matrix, but I
>  would then have to sort them again and I wasn't sure if the object
>  array could take resizing like this.

If you're worried about speed or memory consumption, then you should
avoid lil_matrix and dok_matrix.

The fastest and most memory efficient approach uses coo_matrix:

row = array([2,3,1,4])
col = array([4,2,3,1])
data = array([5,5,5,5])
A = coo_matrix( (data,(row,col)), shape=(5,5)).tocsr()

The conversion to CSR forces a copy, however CSR and COO are so much
more memory efficient than LIL/DOK that it shouldn't matter.

--
Nathan Bell [hidden email]
http://graphics.cs.uiuc.edu/~wnbell/
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

Robin-62
On Sat, May 3, 2008 at 4:24 PM, Nathan Bell <[hidden email]> wrote:

>
>  If you're worried about speed or memory consumption, then you should
>  avoid lil_matrix and dok_matrix.
>
>  The fastest and most memory efficient approach uses coo_matrix:
>
>  row = array([2,3,1,4])
>  col = array([4,2,3,1])
>  data = array([5,5,5,5])
>  A = coo_matrix( (data,(row,col)), shape=(5,5)).tocsr()
>
>  The conversion to CSR forces a copy, however CSR and COO are so much
>  more memory efficient than LIL/DOK that it shouldn't matter.

Thanks - I was just thinking about COO matrices. The problem is I have
to build it up in parts - I know lists are more efficient to extend
than arrays so I could store the data, row, col as lists and extend
with each chunk - then I think I can construct the COO matrix from the
list. I was worried that the lists would stay around though - even if
deleted so then I wouldn't have enough memory to do the COO-CSC
conversion.

Perhaps I could add another layer of indirection with another fork()
though to make sure the lists are cleaned up and just pass back the
COO matrix - hopefully the COO matrix will pickle OK.

Thanks,

Robin
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

David Warde-Farley-2
On 3-May-08, at 11:47 AM, Robin wrote:

> Perhaps I could add another layer of indirection with another fork()
> though to make sure the lists are cleaned up and just pass back the
> COO matrix - hopefully the COO matrix will pickle OK.

You probably don't want to use pickle, both save and load will be  
pretty slow.  I would call .tofile() on each of yourmatrix.row,  
yourmatrix.col and yourmatrix.data.

Cheers,
David
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

Anne Archibald
In reply to this post by Robin-62
2008/5/3 Robin <[hidden email]>:

> Hi,
>
>  I was wondering what the most (memory) efficient way of combining two
>  sparse matrices would be.
>
>  I am constructing a very large sparse matrix, but due to the temporary
>  memory required to calculate the entries I am doing it in blocks, with
>  the computation of each block done in a forked child process. This
>  returns a sparse matrix of the same dimensions as the full one, but
>  with a smaller number of entries. I would like to add the entries from
>  the block result to the 'master' copy. I can be sure that there will
>  be no overlap in the position of entries (ie no matrix position will
>  be in both sides).
>
>  What is the most memory efficient way of combining these? I noticed +=
>  isn't implemented, but it's not clear how that would work anyway. The
>  best I have done so far is adding two lil_matices (the block is
>  created as an lil-matrix for fancy indexing) A = A + Apartial, but as
>  the master copy grows this means I think that I will need double the
>  final memory requirement for A (to add the last block). Is there a
>  better way of doing this?
>
>  Also, what are the memory requirements for the conversions (.tocsc,
>  .tocsr etc.)? Will that mean I need double the memory anyway?

Sparse matrices, in any of the formats implemented in numpy/scipy, are
quite inefficient when dealing with subblocks. Every single entry in
the block must have at the least a four-byte index stored alongside
it. This need not necessarily be a problem, but it seems to be for
you. Perhaps you would be better off working with the blocks,
represented as dense arrays, directly? Or if the direct approach is
too cumbersome, you could write a quick(*) subclass that lets you
index the block as if it were part of a larger matrix.

(*) Actually I'm not sure just how quick it would be, but overriding
__getitem__ and __setitem__ plus the usual initialization stuff ought
to do the job.

Anne
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: combine two sparse matrices

Nathan Bell-4
On Sat, May 3, 2008 at 4:21 PM, Anne Archibald
<[hidden email]> wrote:
>
>  Sparse matrices, in any of the formats implemented in numpy/scipy, are
>  quite inefficient when dealing with subblocks. Every single entry in
>  the block must have at the least a four-byte index stored alongside
>  it. This need not necessarily be a problem, but it seems to be for
>  you.

FWIW there's now a BSR (block CSR) format in scipy.sparse for matrices
with dense sub-blocks.

--
Nathan Bell [hidden email]
http://graphics.cs.uiuc.edu/~wnbell/
_______________________________________________
SciPy-user mailing list
[hidden email]
http://projects.scipy.org/mailman/listinfo/scipy-user