Hi all,
I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.
For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.
Any help, hints, or information would be greatly appreciated :) Cheers, Per _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
You can try spectral graph partioning, computing the Fiedler with arpack,
that only needs matrix-vector product: http://mathworld.wolfram.com/FiedlerVector.html http://en.wikipedia.org/wiki/Algebraic_connectivity HTH, Gael On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote: > Hi all, > I work with large sparse matrices which consists of several disconnected > components and I only need one of these. So far I have succesfully been > using scipy.sparse.csgraph.connected_components to dig out the component I > needed. This algorithm does however require the entire sparse matrix as > input, which sometimes is too large to fit in memory. > For this reason I wondered whether there existed a matrix-free version > of connected_components, or another algorithm achieving the same, where I > would only need to provide a function calculating the sparse matrix-vector > product. > Any help, hints, or information would be greatly appreciated :) > Cheers, > Per -- Gael Varoquaux Researcher, INRIA Parietal Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-79-68 http://gael-varoquaux.info http://twitter.com/GaelVaroquaux _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
Thanks for the reply.
This approach looks like it could do the job, although it appaers to be significantly more computationally demanding than connected_components, due to the repeated need to find the Fiedler vector of very large systems. But I will give it a go! :)
Per
On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux <[hidden email]> wrote: You can try spectral graph partioning, computing the Fiedler with arpack, _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
I am not sure if this is what you are looking for, but an alternative
approach would be to transform the matrix into a graph, and then get the connected components using, for eample, networkx. You can find a complete description about the topic in: "Complex networks: Structure and dynamics" S. Boccaletti et al. (2006) Phys. Reports. Sorry that I cannot provide anything more concise. David. On Thu, Aug 2, 2012 at 8:07 AM, Per Nielsen <[hidden email]> wrote: > Thanks for the reply. > > This approach looks like it could do the job, although it appaers to be > significantly more computationally demanding than connected_components, due > to the repeated need to find the Fiedler vector of very large systems. But I > will give it a go! :) > > Per > > > On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux > <[hidden email]> wrote: >> >> You can try spectral graph partioning, computing the Fiedler with arpack, >> that only needs matrix-vector product: >> http://mathworld.wolfram.com/FiedlerVector.html >> http://en.wikipedia.org/wiki/Algebraic_connectivity >> >> HTH, >> >> Gael >> >> On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote: >> > Hi all, >> >> > I work with large sparse matrices which consists of several disconnected >> > components and I only need one of these. So far I have succesfully been >> > using scipy.sparse.csgraph.connected_components to dig out the component >> > I >> > needed. This algorithm does however require the entire sparse matrix as >> > input, which sometimes is too large to fit in memory. >> >> > For this reason I wondered whether there existed a matrix-free version >> > of connected_components, or another algorithm achieving the same, where >> > I >> > would only need to provide a function calculating the sparse >> > matrix-vector >> > product. >> >> > Any help, hints, or information would be greatly appreciated :) >> >> > Cheers, >> > Per >> >> >> -- >> Gael Varoquaux >> Researcher, INRIA Parietal >> Laboratoire de Neuro-Imagerie Assistee par Ordinateur >> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France >> Phone: ++ 33-1-69-08-79-68 >> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux >> _______________________________________________ >> SciPy-User mailing list >> [hidden email] >> http://mail.scipy.org/mailman/listinfo/scipy-user > > > > _______________________________________________ > SciPy-User mailing list > [hidden email] > http://mail.scipy.org/mailman/listinfo/scipy-user > SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.
Thank you for your reply and the reference. Per
On Thu, Aug 2, 2012 at 12:50 PM, Daπid <[hidden email]> wrote: I am not sure if this is what you are looking for, but an alternative _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Thu, Aug 02, 2012 at 02:15:07PM +0200, Per Nielsen wrote:
> I have looked into the functions provided by the networkX package, > however, as far as I have been able to understand both networkX and scipy > employ similar methods for finding the connected components. Thus I still > need the full sparse matrix (or graph which I think is similar in memory > usage) in memory, which I would like to avoid. But you can easily recode them to replace the indexing by a function call that would return the list of neighbors for a given node. That way you don't have to store the graph, you can generate it on the fly. G _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Per Nielsen
On Thu, Aug 2, 2012 at 6:15 AM, Per Nielsen <[hidden email]> wrote:
I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application. <snip> Chuck _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Gael Varoquaux
You are completely right and the networkX code seems simple enough so that I can actually manage it. Now I just need to optimize my graph/matrix function as it calculates the entire graph/matrix MUCH faster than going all the rows/columns individually.
Thanks. Per
On Thu, Aug 2, 2012 at 2:36 PM, Gael Varoquaux <[hidden email]> wrote:
_______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
In reply to this post by Charles R Harris
I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested. Per _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
On Mon, Aug 6, 2012 at 3:33 AM, Per Nielsen <[hidden email]> wrote:
Well, it's in Cython ;) It is set up to use integer indices as node labels and all components are extracted as lists of integers. It's also pretty old and needs documentation, so I'll do that and clean it up a bit. I'll be on travel for the next few days, so unless matters are pressing, I won't get to it until the weekend. Chuck _______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
Ah hehe. I have had a hard time optimizing my graph neighbor extraction code, so right now it is actually to slow to be of any use. I will try to find a different approach and I dont want you spending any of your time cleaning code, but thank you very much for the offer :)
Per
On Tue, Aug 7, 2012 at 3:41 AM, Charles R Harris <[hidden email]> wrote:
_______________________________________________ SciPy-User mailing list [hidden email] http://mail.scipy.org/mailman/listinfo/scipy-user |
Free forum by Nabble | Edit this page |