# [SciPy-User] Conversion to graph adjacency list Classic List Threaded 8 messages Open this post in threaded view
|

## [SciPy-User] Conversion to graph adjacency list

 Hi, Scipy library provides C-implementation for some classical graph algorithms like Kruskal or connected component finding. Nevertheless, there is an efficiency problem: sparse graphs are usually given by list of edges or adjacency list (and not adjacency matrix). And in order to run the Scipy routines, we have to convert our graphs to the compressed SCR format (dense array is not well suited for graph with a lot of vertices, say more than 20,000). And sometimes, after execution we have to convert back. This causes the routine spending most of the execution time in conversion and retro-conversion tasks. So my question is: does Scipy provide a C-routine for converting and back converting to edge/adjacency list format? _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Conversion to graph adjacency list

 On Fri, Dec 7, 2018 at 10:51 PM <[hidden email]> wrote:>> Hi,>> Scipy library provides C-implementation for some classical graph algorithms like Kruskal or connected component finding. Nevertheless, there is an efficiency problem: sparse graphs are usually given by list of edges or adjacency list (and not adjacency matrix). And in order to run the Scipy routines, we have to convert our graphs to the compressed SCR format (dense array is not well suited for graph with a lot of vertices, say more than 20,000). And sometimes, after execution we have to convert back. This causes the routine spending most of the execution time in conversion and retro-conversion tasks.>> So my question is: does Scipy provide a C-routine for converting and back converting to edge/adjacency list format?How are you currently doing the conversion? The most efficient method is probably to convert from the CSR format to COO, which provides two parallel arrays giving the row and column indices of the edges. These can be simply zip()ed together to get a list of edge tuples.|30> import numpy as np|31> Nnodes = 20000|32> edges = []|33> for i in range(Nnodes):...>     js = np.random.permutation(Nnodes)[:int(np.sqrt(Nnodes))]...>     edges.extend([(i, j) for j in js])...>     # Construct a CSR matrix of the graph, reasonably efficiently.|35> from scipy import sparse|36> row_ind, col_ind = np.transpose(edges)|37> Gcsr = sparse.csr_matrix((np.ones_like(row_ind), (row_ind, col_ind)), shape=(Nnodes,...>  Nnodes))# Now convert back to a list of edge tuples.|39> Gcoo = Gcsr.tocoo()|40> coo_edges = zip(Gcoo.row, Gcoo.col)|41> set(edges) == set(coo_edges)True--Robert Kern _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Conversion to graph adjacency list

 Thanks Robert for your hepfull response, now I better understand how csr conversion works. I'm benchmarking some graph libraries. My edges-to-csr conversion was hand-written in Python, so very slow! Now, with your code, execution time is 7 times faster! Here is the code, perhaps we can optimize more: # ============================================= from time import clock from sys import stderr, stdin from scipy.sparse.csgraph import minimum_spanning_tree from scipy.sparse import csr_matrix import numpy as np # ------------ Scipy Code --------------------------- def wedges2adj(edges, n):         G=[[]for _ in range(n)]     for a, b, w in edges:         G[a].append((b,w))         G[b].append((a,w))     return G def wedges2scr(edges, n):     G=wedges2adj(edges, n)     indptr=     cnt=0           for line in G:         cnt+=len(line)         indptr.append(cnt)     data=[]     indices=[]     for i in range(n):         for (j,w) in G[i]:             data.append(w)             indices.append(j)                 return [data, indptr, indices] def csr2wedges(Mcsr, shape):     n, p=shape     k=0     edges=[]     data, cumul, cols  = Mcsr.data,Mcsr.indptr, Mcsr.indices       for i in range(n):         for j in range(cumul[i+1]-cumul[i]):             edges.append((i, cols[k], data[k]))             k+=1     return edges def kruskal_scipy(edges, n):     data, indptr, indices=wedges2scr(edges, n)     csr=csr_matrix((data, indices, indptr), shape=(n, n))         tree=minimum_spanning_tree(csr)     edges=csr2wedges(tree, (n,n))     return int(sum((round(w)) for (a,b,w) in edges)) # °°°°°°°°°°°°°°°°°°°°°°° def wedges2scr_FAST(wedges, n):     row_ind, col_ind, costs = np.transpose(wedges)     return csr_matrix((costs, (row_ind, col_ind)),     shape=(n,n)) def kruskal_scipy_FAST(wedges, n):         csr=wedges2scr_FAST(wedges, n)     tree=minimum_spanning_tree(csr).tocoo()     return int(round(sum(w for (i,j,w) in zip(tree.row, tree.col, tree.data)))) # ------------ Benchmark --------------------------- def go(solver, L):     global duration     N=len(L)     k=1     solution=[]     while k
Open this post in threaded view
|

## Re: Conversion to graph adjacency list

 Hmm. I'm not an expert here, but did you try using the "networkx" package?On Sat, Dec 8, 2018 at 7:36 AM <[hidden email]> wrote: Thanks Robert for your hepfull response, now I better understand how csr conversion works. I'm benchmarking some graph libraries. My edges-to-csr conversion was hand-written in Python, so very slow! Now, with your code, execution time is 7 times faster! Here is the code, perhaps we can optimize more: # ============================================= from time import clock from sys import stderr, stdin from scipy.sparse.csgraph import minimum_spanning_tree from scipy.sparse import csr_matrix import numpy as np # ------------ Scipy Code --------------------------- def wedges2adj(edges, n):     G=[[]for _ in range(n)]     for a, b, w in edges:         G[a].append((b,w))         G[b].append((a,w))     return G def wedges2scr(edges, n):     G=wedges2adj(edges, n)     indptr=     cnt=0     for line in G:         cnt+=len(line)         indptr.append(cnt)     data=[]     indices=[]     for i in range(n):         for (j,w) in G[i]:             data.append(w)             indices.append(j)     return [data, indptr, indices] def csr2wedges(Mcsr, shape):     n, p=shape     k=0     edges=[]     data, cumul, cols  = Mcsr.data,Mcsr.indptr, Mcsr.indices      for i in range(n):         for j in range(cumul[i+1]-cumul[i]):             edges.append((i, cols[k], data[k]))             k+=1     return edges def kruskal_scipy(edges, n):     data, indptr, indices=wedges2scr(edges, n)     csr=csr_matrix((data, indices, indptr), shape=(n, n))          tree=minimum_spanning_tree(csr)     edges=csr2wedges(tree, (n,n))     return int(sum((round(w)) for (a,b,w) in edges)) # °°°°°°°°°°°°°°°°°°°°°°° def wedges2scr_FAST(wedges, n):     row_ind, col_ind, costs = np.transpose(wedges)     return csr_matrix((costs, (row_ind, col_ind)),     shape=(n,n)) def kruskal_scipy_FAST(wedges, n):        csr=wedges2scr_FAST(wedges, n)     tree=minimum_spanning_tree(csr).tocoo()     return int(round(sum(w for (i,j,w) in zip(tree.row, tree.col, tree.data)))) # ------------ Benchmark --------------------------- def go(solver, L):     global duration     N=len(L)     k=1     solution=[]     while k
Open this post in threaded view
|

## Re: Conversion to graph adjacency list

 > Hmm. I'm not an expert here, but did you try using the "networkx" package? Of course I did. NetworkX has a very clean interface, clean code, is nicelly documented, well maintained, provides many features. But so slow :( Here are the results against the data file from my previous post: kruskal_networkX     : 29.411 kruskal_python       : 3.021 kruskal_gt           : 1.384 kruskal_networkit    : 2.501 kruskal_igraph       : 7.530 kruskal_scipy        : 10.744 kruskal_scipy_FAST   : 1.509 NO error detected [gt=Graph-tool] _______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
Open this post in threaded view
|

## Re: Conversion to graph adjacency list

 Have you looked at graph-tool, which is fast?Le 8 décembre 2018 17:49:14 GMT+01:00, [hidden email] a écrit : ` Hmm. I'm not an expert here, but did you try using the "networkx" package?Of course I did. NetworkX has a very clean interface, clean code, is nicelly documented, well maintained, provides many features. But so slow :(Here are the results against the data file from my previous post:kruskal_networkX : 29.411kruskal_python : 3.021kruskal_gt : 1.384kruskal_networkit : 2.501kruskal_igraph : 7.530kruskal_scipy : 10.744kruskal_scipy_FAST : 1.509NO error detected[gt=Graph-tool]SciPy-User mailing list[hidden email]https://mail.python.org/mailman/listinfo/scipy-user`-- Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma brièveté._______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user
 In reply to this post by elastica Sorry, I saw your benchmark too late!Le 8 décembre 2018 17:49:14 GMT+01:00, [hidden email] a écrit : ` Hmm. I'm not an expert here, but did you try using the "networkx" package?Of course I did. NetworkX has a very clean interface, clean code, is nicelly documented, well maintained, provides many features. But so slow :(Here are the results against the data file from my previous post:kruskal_networkX : 29.411kruskal_python : 3.021kruskal_gt : 1.384kruskal_networkit : 2.501kruskal_igraph : 7.530kruskal_scipy : 10.744kruskal_scipy_FAST : 1.509NO error detected[gt=Graph-tool]SciPy-User mailing list[hidden email]https://mail.python.org/mailman/listinfo/scipy-user`-- Envoyé de mon appareil Android avec Courriel K-9 Mail. Veuillez excuser ma brièveté._______________________________________________ SciPy-User mailing list [hidden email] https://mail.python.org/mailman/listinfo/scipy-user