# combine two sparse matrices

7 messages
Open this post in threaded view
|

## combine two sparse matrices

 Hi, I was wondering what the most (memory) efficient way of combining two sparse matrices would be. I am constructing a very large sparse matrix, but due to the temporary memory required to calculate the entries I am doing it in blocks, with the computation of each block done in a forked child process. This returns a sparse matrix of the same dimensions as the full one, but with a smaller number of entries. I would like to add the entries from the block result to the 'master' copy. I can be sure that there will be no overlap in the position of entries (ie no matrix position will be in both sides). What is the most memory efficient way of combining these? I noticed += isn't implemented, but it's not clear how that would work anyway. The best I have done so far is adding two lil_matices (the block is created as an lil-matrix for fancy indexing) A = A + Apartial, but as the master copy grows this means I think that I will need double the final memory requirement for A (to add the last block). Is there a better way of doing this? Also, what are the memory requirements for the conversions (.tocsc, .tocsr etc.)? Will that mean I need double the memory anyway? Thanks, Robin _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: combine two sparse matrices

 On Sat, May 3, 2008 at 12:07 PM, Robin <[hidden email]> wrote: > Hi, > >  I was wondering what the most (memory) efficient way of combining two >  sparse matrices would be. > >  I am constructing a very large sparse matrix, but due to the temporary >  memory required to calculate the entries I am doing it in blocks, with >  the computation of each block done in a forked child process. This >  returns a sparse matrix of the same dimensions as the full one, but >  with a smaller number of entries. I would like to add the entries from >  the block result to the 'master' copy. I can be sure that there will >  be no overlap in the position of entries (ie no matrix position will >  be in both sides). > >  What is the most memory efficient way of combining these? I noticed += >  isn't implemented, but it's not clear how that would work anyway. The >  best I have done so far is adding two lil_matices (the block is >  created as an lil-matrix for fancy indexing) A = A + Apartial, but as >  the master copy grows this means I think that I will need double the >  final memory requirement for A (to add the last block). Is there a >  better way of doing this? > >  Also, what are the memory requirements for the conversions (.tocsc, >  .tocsr etc.)? Will that mean I need double the memory anyway? After reading some more about the different sparse matrix types I am now trying having both master and block matrices as dok, passing back dict(Apartial) since cPickle can't pickle dok matrices, and then doing A.update(Apartial). This seems to be a bit slower (which is OK) but also the memory used seems to be growing a little fast - it seems as though something is not being released (I do del Apartial after the update but I have a feeling it might be sticking around) and I'm worried the machine will fall over again before it's finished. I'd still appreciate some advice as to the best way to do this. I thought of directly appending to the lists in the lil_matrix, but I would then have to sort them again and I wasn't sure if the object array could take resizing like this. Robin _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: combine two sparse matrices

 On Sat, May 3, 2008 at 10:16 AM, Robin <[hidden email]> wrote: >  > >  >  I was wondering what the most (memory) efficient way of combining two >  >  sparse matrices would be. >  > > >  I'd still appreciate some advice as to the best way to do this. I >  thought of directly appending to the lists in the lil_matrix, but I >  would then have to sort them again and I wasn't sure if the object >  array could take resizing like this. If you're worried about speed or memory consumption, then you should avoid lil_matrix and dok_matrix. The fastest and most memory efficient approach uses coo_matrix: row = array([2,3,1,4]) col = array([4,2,3,1]) data = array([5,5,5,5]) A = coo_matrix( (data,(row,col)), shape=(5,5)).tocsr() The conversion to CSR forces a copy, however CSR and COO are so much more memory efficient than LIL/DOK that it shouldn't matter. -- Nathan Bell [hidden email] http://graphics.cs.uiuc.edu/~wnbell/_______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: combine two sparse matrices

 On Sat, May 3, 2008 at 4:24 PM, Nathan Bell <[hidden email]> wrote: > >  If you're worried about speed or memory consumption, then you should >  avoid lil_matrix and dok_matrix. > >  The fastest and most memory efficient approach uses coo_matrix: > >  row = array([2,3,1,4]) >  col = array([4,2,3,1]) >  data = array([5,5,5,5]) >  A = coo_matrix( (data,(row,col)), shape=(5,5)).tocsr() > >  The conversion to CSR forces a copy, however CSR and COO are so much >  more memory efficient than LIL/DOK that it shouldn't matter. Thanks - I was just thinking about COO matrices. The problem is I have to build it up in parts - I know lists are more efficient to extend than arrays so I could store the data, row, col as lists and extend with each chunk - then I think I can construct the COO matrix from the list. I was worried that the lists would stay around though - even if deleted so then I wouldn't have enough memory to do the COO-CSC conversion. Perhaps I could add another layer of indirection with another fork() though to make sure the lists are cleaned up and just pass back the COO matrix - hopefully the COO matrix will pickle OK. Thanks, Robin _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: combine two sparse matrices

 On 3-May-08, at 11:47 AM, Robin wrote: > Perhaps I could add another layer of indirection with another fork() > though to make sure the lists are cleaned up and just pass back the > COO matrix - hopefully the COO matrix will pickle OK. You probably don't want to use pickle, both save and load will be   pretty slow.  I would call .tofile() on each of yourmatrix.row,   yourmatrix.col and yourmatrix.data. Cheers, David _______________________________________________ SciPy-user mailing list [hidden email] http://projects.scipy.org/mailman/listinfo/scipy-user