0
votes

I have a problem involving graph theory. To solve it, I would like to create a weighted graph using networkx. At the moment, I have a dictionnary where each key is a node, and each value is the associated weight (between 10 and 200 000 or so).

weights = {node: weight}

I believe I do not need to normalize the weights with networks. At the moment, I create a non-weighted graph by adding the edges:

def create_graph(data):
    edges = create_edges(data)

    # Create the graph
    G = nx.Graph()

    # Add edges
    G.add_edges_from(edges)

    return G

From what I read, I can add a weight to the edge. However, I would prefer the weight to be applied to a specific node instead of an edge. How can I do that?

Idea: I create the graph by adding the nodes weighted, and then I add the edges between the nodes.

def create_graph(data, weights):
    nodes = create_nodes(data)
    edges = create_edges(data) # list of tuples

    # Create the graph
    G = nx.Graph()

    # Add edges
    for node in nodes:
        G.add_node(node, weight=weights[node])

    # Add edges
    G.add_edges_from(edges)

    return G

Is this approach correct?

Next step is to find the path between 2 nodes with the smallest weight. I found this function: networkx.algorithms.shortest_paths.generic.shortest_path which I think is doing the right thing. However, it uses weights on the edge instead of weights on the nodes. Could someone explain me what this function does, what the difference between wieghts on the nodes and weights on the edges is for networkx, and how I could achieve what I am looking for? Thanks :)

1

1 Answers

2
votes

This generally looks right.

You might use bidirectional_dijkstra. It can be significantly faster if you know the source and target nodes of your path (see my comments at the bottom).

To handle the edge vs node weight issue, there are two options. First note that you are after the sum of the nodes along the path. If I give each edge a weight w(u,v) = w(u) + w(v) then the sum of weights along this is w(source) + w(target) + 2 sum(w(v)) where the nodes v are all nodes found along the way. Whatever has the minimum weight with these edge weights will have the minimum weight with the node weights.

So you could go and assign each edge the weight to be the sum of the two nodes.

for edge in G.edges():
    G.edges[edge]['weight'] = G.nodes[edge[0]]['weight'] + G.nodes[edge[1]]['weight']

But an alternative is to note that the weight input into bidirectional_dijkstra can be a function that takes the edge as input. Define your own function to give the sum of the two node weights:

def f(edge):
    u,v = edge
    return G.nodes[u]['weight'] + G.nodes[v]['weight']

and then in your call do bidirectional_dijkstra(G, source, target, weight=f)

So the choices I'm suggesting are to either assign each edge a weight equal to the sum of the node weights or define a function that will give those weights just for the edges the algorithm encounters. Efficiency-wise I expect it will take more time to figure out which is better than it takes to code either algorithm. The only performance issue is that assigning all the weights will use more memory. Assuming memory isn't an issue, use whichever one you think is easiest to implement and maintain.


Some comments on bidirectional dijkstra: Imagine you have two points in space a distance R apart and you want to find the shortest distance between them. The dijkstra algorithm (which is the default of shortest_path) will explore every point within distance D of the source point. Basically it's like expanding a balloon centered at the first point until it reaches the other. This has a volume (4/3) pi R^3. With bidirectional_dijkstra we inflate balloons centered at each until they touch. They will each have radius R/2. So the volume is (4/3)pi (R/2)^3 + (4/3) pi (R/2)^3, which is a quarter the volume of the original balloon, so the algorithm has explored a quarter of the space. Since networks can have very high effective dimension, the savings is often much bigger.