Matrix-free version of connected_components

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Matrix-free version of connected_components

Per Nielsen
Hi all,

I work with large sparse matrices which consists of several disconnected components and I only need one of these. So far I have succesfully been using scipy.sparse.csgraph.connected_components to dig out the component I needed. This algorithm does however require the entire sparse matrix as input, which sometimes is too large to fit in memory.

For this reason I wondered whether there existed a matrix-free version of connected_components, or another algorithm achieving the same, where I would only need to provide a function calculating the sparse matrix-vector product.

Any help, hints, or information would be greatly appreciated :)

Cheers,
Per





_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Gael Varoquaux
You can try spectral graph partioning, computing the Fiedler with arpack,
that only needs matrix-vector product:
http://mathworld.wolfram.com/FiedlerVector.html
http://en.wikipedia.org/wiki/Algebraic_connectivity

HTH,

Gael

On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
> Hi all,

> I work with large sparse matrices which consists of several disconnected
> components and I only need one of these. So far I have succesfully been
> using scipy.sparse.csgraph.connected_components to dig out the component I
> needed. This algorithm does however require the entire sparse matrix as
> input, which sometimes is too large to fit in memory.

> For this reason I wondered whether there existed a matrix-free version
> of connected_components, or another algorithm achieving the same, where I
> would only need to provide a function calculating the sparse matrix-vector
> product.

> Any help, hints, or information would be greatly appreciated :)

> Cheers,
> Per


--
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  ++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Per Nielsen
Thanks for the reply.

This approach looks like it could do the job, although it appaers to be significantly more computationally demanding than connected_components, due to the repeated need to find the Fiedler vector of very large systems. But I will give it a go! :)

Per

On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux <[hidden email]> wrote:
You can try spectral graph partioning, computing the Fiedler with arpack,
that only needs matrix-vector product:
http://mathworld.wolfram.com/FiedlerVector.html
http://en.wikipedia.org/wiki/Algebraic_connectivity

HTH,

Gael

On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
> Hi all,

> I work with large sparse matrices which consists of several disconnected
> components and I only need one of these. So far I have succesfully been
> using scipy.sparse.csgraph.connected_components to dig out the component I
> needed. This algorithm does however require the entire sparse matrix as
> input, which sometimes is too large to fit in memory.

> For this reason I wondered whether there existed a matrix-free version
> of connected_components, or another algorithm achieving the same, where I
> would only need to provide a function calculating the sparse matrix-vector
> product.

> Any help, hints, or information would be greatly appreciated :)

> Cheers,
> Per


--
    Gael Varoquaux
    Researcher, INRIA Parietal
    Laboratoire de Neuro-Imagerie Assistee par Ordinateur
    NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
    Phone:  <a href="tel:%2B%2B%2033-1-69-08-79-68" value="+33169087968">++ 33-1-69-08-79-68
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Daπid
I am not sure if this is what you are looking for, but an alternative
approach would be to transform the matrix into a graph, and then get
the connected components using, for eample, networkx. You can find a
complete description about the topic in: "Complex networks: Structure
and dynamics" S. Boccaletti et al. (2006) Phys. Reports. Sorry that I
cannot provide anything more concise.


David.

On Thu, Aug 2, 2012 at 8:07 AM, Per Nielsen <[hidden email]> wrote:

> Thanks for the reply.
>
> This approach looks like it could do the job, although it appaers to be
> significantly more computationally demanding than connected_components, due
> to the repeated need to find the Fiedler vector of very large systems. But I
> will give it a go! :)
>
> Per
>
>
> On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux
> <[hidden email]> wrote:
>>
>> You can try spectral graph partioning, computing the Fiedler with arpack,
>> that only needs matrix-vector product:
>> http://mathworld.wolfram.com/FiedlerVector.html
>> http://en.wikipedia.org/wiki/Algebraic_connectivity
>>
>> HTH,
>>
>> Gael
>>
>> On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
>> > Hi all,
>>
>> > I work with large sparse matrices which consists of several disconnected
>> > components and I only need one of these. So far I have succesfully been
>> > using scipy.sparse.csgraph.connected_components to dig out the component
>> > I
>> > needed. This algorithm does however require the entire sparse matrix as
>> > input, which sometimes is too large to fit in memory.
>>
>> > For this reason I wondered whether there existed a matrix-free version
>> > of connected_components, or another algorithm achieving the same, where
>> > I
>> > would only need to provide a function calculating the sparse
>> > matrix-vector
>> > product.
>>
>> > Any help, hints, or information would be greatly appreciated :)
>>
>> > Cheers,
>> > Per
>>
>>
>> --
>>     Gael Varoquaux
>>     Researcher, INRIA Parietal
>>     Laboratoire de Neuro-Imagerie Assistee par Ordinateur
>>     NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>>     Phone:  ++ 33-1-69-08-79-68
>>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Per Nielsen
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.

Thank you for your reply and the reference.

Per

On Thu, Aug 2, 2012 at 12:50 PM, Daπid <[hidden email]> wrote:
I am not sure if this is what you are looking for, but an alternative
approach would be to transform the matrix into a graph, and then get
the connected components using, for eample, networkx. You can find a
complete description about the topic in: "Complex networks: Structure
and dynamics" S. Boccaletti et al. (2006) Phys. Reports. Sorry that I
cannot provide anything more concise.


David.

On Thu, Aug 2, 2012 at 8:07 AM, Per Nielsen <[hidden email]> wrote:
> Thanks for the reply.
>
> This approach looks like it could do the job, although it appaers to be
> significantly more computationally demanding than connected_components, due
> to the repeated need to find the Fiedler vector of very large systems. But I
> will give it a go! :)
>
> Per
>
>
> On Wed, Aug 1, 2012 at 3:18 PM, Gael Varoquaux
> <[hidden email]> wrote:
>>
>> You can try spectral graph partioning, computing the Fiedler with arpack,
>> that only needs matrix-vector product:
>> http://mathworld.wolfram.com/FiedlerVector.html
>> http://en.wikipedia.org/wiki/Algebraic_connectivity
>>
>> HTH,
>>
>> Gael
>>
>> On Wed, Aug 01, 2012 at 02:15:51PM +0200, Per Nielsen wrote:
>> > Hi all,
>>
>> > I work with large sparse matrices which consists of several disconnected
>> > components and I only need one of these. So far I have succesfully been
>> > using scipy.sparse.csgraph.connected_components to dig out the component
>> > I
>> > needed. This algorithm does however require the entire sparse matrix as
>> > input, which sometimes is too large to fit in memory.
>>
>> > For this reason I wondered whether there existed a matrix-free version
>> > of connected_components, or another algorithm achieving the same, where
>> > I
>> > would only need to provide a function calculating the sparse
>> > matrix-vector
>> > product.
>>
>> > Any help, hints, or information would be greatly appreciated :)
>>
>> > Cheers,
>> > Per
>>
>>
>> --
>>     Gael Varoquaux
>>     Researcher, INRIA Parietal
>>     Laboratoire de Neuro-Imagerie Assistee par Ordinateur
>>     NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>>     Phone:  <a href="tel:%2B%2B%2033-1-69-08-79-68" value="+33169087968">++ 33-1-69-08-79-68
>>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
>> _______________________________________________
>> SciPy-User mailing list
>> [hidden email]
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
>
> _______________________________________________
> SciPy-User mailing list
> [hidden email]
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Gael Varoquaux
On Thu, Aug 02, 2012 at 02:15:07PM +0200, Per Nielsen wrote:
>    I have looked into the functions provided by the networkX package,
>    however, as far as I have been able to understand both networkX and scipy
>    employ similar methods for finding the connected components. Thus I still
>    need the full sparse matrix (or graph which I think is similar in memory
>    usage) in memory, which I would like to avoid.

But you can easily recode them to replace the indexing by a function call
that would return the list of neighbors for a given node. That way you
don't have to store the graph, you can generate it on the fly.

G
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Charles R Harris
In reply to this post by Per Nielsen


On Thu, Aug 2, 2012 at 6:15 AM, Per Nielsen <[hidden email]> wrote:
I have looked into the functions provided by the networkX package, however, as far as I have been able to understand both networkX and scipy employ similar methods for finding the connected components. Thus I still need the full sparse matrix (or graph which I think is similar in memory usage) in memory, which I would like to avoid.

Thank you for your reply and the reference.

I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.

<snip>

Chuck


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Per Nielsen
In reply to this post by Gael Varoquaux
You are completely right and the networkX code seems simple enough so that I can actually manage it. Now I just need to optimize my graph/matrix function as it calculates the entire graph/matrix MUCH faster than going all the rows/columns individually.

Thanks.
Per

On Thu, Aug 2, 2012 at 2:36 PM, Gael Varoquaux <[hidden email]> wrote:
On Thu, Aug 02, 2012 at 02:15:07PM +0200, Per Nielsen wrote:
>    I have looked into the functions provided by the networkX package,
>    however, as far as I have been able to understand both networkX and scipy
>    employ similar methods for finding the connected components. Thus I still
>    need the full sparse matrix (or graph which I think is similar in memory
>    usage) in memory, which I would like to avoid.

But you can easily recode them to replace the indexing by a function call
that would return the list of neighbors for a given node. That way you
don't have to store the graph, you can generate it on the fly.

G
_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user


_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Per Nielsen
In reply to this post by Charles R Harris

I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.


I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested.

Per

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Charles R Harris


On Mon, Aug 6, 2012 at 3:33 AM, Per Nielsen <[hidden email]> wrote:

I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.


I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested.

Well, it's in Cython ;) It is set up to use integer indices as node labels and all components are extracted as lists of integers. It's also pretty old and needs documentation, so I'll do that and clean it up a bit. I'll be on travel for the next few days, so unless matters are pressing, I won't get to it until the weekend.

Chuck

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user
Reply | Threaded
Open this post in threaded view
|

Re: Matrix-free version of connected_components

Per Nielsen
Ah hehe. I have had a hard time optimizing my graph neighbor extraction code, so right now it is actually to slow to be of any use. I will try to find a different approach and I dont want you spending any of your time cleaning code, but thank you very much for the offer :)

Per

On Tue, Aug 7, 2012 at 3:41 AM, Charles R Harris <[hidden email]> wrote:


On Mon, Aug 6, 2012 at 3:33 AM, Per Nielsen <[hidden email]> wrote:

I'm not sure what your application is, but if you just need connected components and have an easy way to find neighbors, then unionfind will partition the set for you. Although the common version doesn't make it easy to extract them, I have an implementation that keeps the connected nodes in a circular list for just that application.


I would very like to have copy of your algorithm, it might be easier to modify than the networkX code as Gael suggested.

Well, it's in Cython ;) It is set up to use integer indices as node labels and all components are extracted as lists of integers. It's also pretty old and needs documentation, so I'll do that and clean it up a bit. I'll be on travel for the next few days, so unless matters are pressing, I won't get to it until the weekend.

Chuck

_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user



_______________________________________________
SciPy-User mailing list
[hidden email]
http://mail.scipy.org/mailman/listinfo/scipy-user