15
votes

I am new using Python for working with graphs: NetworkX. Until now I have used Gephi. There the standard steps (but not the only possible) are:

  1. Load the nodes informations from a table/spreadsheet; one of the columns should be ID and the rest are metadata about the nodes (nodes are people, so gender, groups... normally to be used for coloring). Like:

    id;NormalizedName;Gender
    per1;Jesús;male
    per2;Abraham;male
    per3;Isaac;male
    per4;Jacob;male
    per5;Judá;male
    per6;Tamar;female
    ...
    
  2. Then load the edges also from a table/spreadsheet, using the same names for the nodes as it was in the column ID of the nodes spreadsheet with normally four columns (Target, Source, Weight and Type):

    Target;Source;Weight;Type
    per1;per2;3;Undirected
    per3;per4;2;Undirected
    ...
    

This are the two dataframes that I have and that I want to load in Python. Reading about NetworkX, it seems that it's not quite possible to load two tables (one for nodes, one for edges) into the same graph and I am not sure what would be the best way:

  1. Should I create a graph only with the nodes informations from the DataFrame, and then add (append) the edges from the other DataFrame? If so and since nx.from_pandas_dataframe() expects information about the edges, I guess I shouldn't use it to create the nodes... Should I just pass the information as lists?

  2. Should I create a graph only with the edges information from the DataFrame and then add to each node the information from the other DataFrame as attributes? Is there a better way for doing that than iterating over the DataFrame and the nodes?

3

3 Answers

31
votes

Create the weighted graph from the edge table using nx.from_pandas_dataframe:

import networkx as nx
import pandas as pd

edges = pd.DataFrame({'source' : [0, 1],
                      'target' : [1, 2],
                      'weight' : [100, 50]})

nodes = pd.DataFrame({'node' : [0, 1, 2],
                      'name' : ['Foo', 'Bar', 'Baz'],
                      'gender' : ['M', 'F', 'M']})

G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')

Then add the node attributes from dictionaries using set_node_attributes:

nx.set_node_attributes(G, 'name', pd.Series(nodes.name, index=nodes.node).to_dict())
nx.set_node_attributes(G, 'gender', pd.Series(nodes.gender, index=nodes.node).to_dict())

Or iterate over the graph to add the node attributes:

for i in sorted(G.nodes()):
    G.node[i]['name'] = nodes.name[i]
    G.node[i]['gender'] = nodes.gender[i]

Update:

As of nx 2.0 the argument order of nx.set_node_attributes has changed: (G, values, name=None)

Using the example from above:

nx.set_node_attributes(G, pd.Series(nodes.gender, index=nodes.node).to_dict(), 'gender')

And as of nx 2.4, G.node[] is replaced by G.nodes[].

7
votes

Here's basically the same answer, but updated with some details filled in. We'll start with basically the same setup, but here there won't be indices for the nodes, just names to address @LancelotHolmes comment and make it more general:

import networkx as nx
import pandas as pd

linkData = pd.DataFrame({'source' : ['Amy', 'Bob'],
                  'target' : ['Bob', 'Cindy'],
                  'weight' : [100, 50]})

nodeData = pd.DataFrame({'name' : ['Amy', 'Bob', 'Cindy'],
                  'type' : ['Foo', 'Bar', 'Baz'],
                  'gender' : ['M', 'F', 'M']})

G = nx.from_pandas_edgelist(linkData, 'source', 'target', True, nx.DiGraph())

Here the True parameter tells NetworkX to keep all the properties in the linkData as link properties. In this case I've made it a DiGraph type, but if you don't need that, then you can make it another type in the obvious way.

Now, since you need to match the nodeData by the name of the nodes generated from the linkData, you need to set the index of the nodeData dataframe to be the name property, before making it a dictionary so that NetworkX 2.x can load it as the node attributes.

nx.set_node_attributes(G, nodeData.set_index('name').to_dict('index'))

This loads the whole nodeData dataframe into a dictionary in which the key is the name, and the other properties are key:value pairs within that key (i.e., normal node properties where the node index is its name).

6
votes

A small remark:

from_pandas_dataframe doesn't work in nx 2, referring to this one

G = nx.from_pandas_dataframe(edges, 'source', 'target', 'weight')

I think that in nx 2.0 it goes like that:

G = nx.from_pandas_edgelist(edges, source = "Source", target = "Target")