Serializing a directed, weighted graph

Question

I have a directed, weighted graph. Each node of the graph is represented as a 2-tuple, whose first element is the name of the node and whose second element is a tuple containing all vertices originating from this node, ordered by their weight. Basically the weight of each vertex is its index inside this tuple.

Exempli gratia:

a = ('A', () )

a is a node with name A from where no vertices originate.

b = ('B', () )
a = ('A', (b,) )

a is a node with name A with one vertex to the node named B, with a weight of 0.

b = ('B', () )
c = ('C', () )
a = ('A', (b, c) )

a is a node with name A with two vertices to the nodes named B and C, the first of weight 0 and the second of weight 1.

It is apparent that ('A', (b, c) ) is not equal to ('A', (c, b) ).

Now I need to serialize this graph according to these rules:

The first element of the result is the starting node.
Then follow all nodes directly accessible from the starting node in order of ascending weight. If a node is already in the result, do not append it again.
Now apply rules one and two recursively to all elements just added.

Basically, low-to-high (weight) first, depth second.

Here an example input and output:

f = ('F', () )
e = ('E', () )
d = ('D', (e,) )
c = ('C', (f, d, e) )
b = ('B', (d,) )
a = ('A', (b, c) )

results in:

['A', 'B', 'C', 'D', 'F', 'E']

Now my first naive approach was this:

def serialize (node):
    acc = []

    def serializeRec (node, level):
        tbd = [] #acc items to be deleted
        tbi = False #insertion index
        for idx, item in enumerate (acc):
            if item [1] > level and tbi == False:
                tbi = idx
            if item [0] == node [0]:
                if item [1] > level: tbd.append (item)
                else: break
        else:
            if tbi == False: acc.append ( (node [0], level) )
            else: acc.insert (tbi, (node [0], level) )
        for item in tbd:
            acc.remove (item)
        for vertex in node [1]:
            serializeRec (vertex, level + 1)

    serializeRec (node, 0)
    #remove levels
    return [node for node, level in acc]

Which apparently is quite a bad idea, because in each recursion I iterate over various lists. This is why I switched to dictionaries:

def serializeDict (node):
    levels = defaultdict (list) #nodes on each level
    nodes = {} #on which level is which node

    def serializeRec (node, level):
        try:
            curLevel = nodes [node [0] ]
            if curLevel > level:
                nodes [node [0] ] = level
                levels [curLevel].remove (node [0] )
                levels [level].append (node [0] )
        except:
            nodes [node [0] ] = level
            levels [level].append (node [0] )
        for vertex in node [1]:
            serializeRec (vertex, level + 1)

    serializeRec (node, 0)
    #flatten dict items
    return [node for level in (v for _, v in sorted (levels.items (), key = lambda x: x [0] ) ) for node in level]

Which runs a lot faster except for very small graphs.

My question is now:

How can I optimize this serialization with the goal of minimizing run-time?

Memory usage is of no concern (yeah, baby), KLOC is of no concern, only run-time counts. Everything can be changed, except the format of the input data. But if at the end it saves time, I will gladly reorganize this data inside the serialization function.

I thank you very much for reading this TL;DR wall of words.

Sample graph for fooling around:

z = ('Z', () ); y = ('Y', (z,) ); x = ('X', (z, y) ); w = ('W', (x, y, z) ); v = ('V', (w, x) ); u = ('U', (w, v) ); t = ('T', (u, w) ); s = ('S', (z, v, u) ); r = ('R', (t, u, z) ); q = ('Q', (r, z) ); p = ('P', (w, u) ); o = ('O', (v, r, q) ); n = ('N', (r, z) ); m = ('M', (t,) ); l = ('L', (r,) ); k = ('K', (x, v) ); j = ('J', (u,) ); i = ('I', (n, k) ); h = ('H', (k, x) ); g = ('G', (l,) ); f = ('F', (t, m) ); e = ('E', (u,) ); d = ('D', (t, e, v) ); c = ('C', (m,) ); b = ('B', (n,) ); a = ('A', (g, m, v) )

Just an idea. Try to do topological order first (which is O(V + E)) and then run non-recursive algorithm to serialize them — evhen14

Stephan Wenger Stephan Wenger · Accepted Answer · 2013-12-17T16:19:37

This works without recursion and uses a double ended queue for efficiency:

from collections import deque

def serialize_plain(n):
    name, children = n
    output = [name]

    candidates = deque(children)
    while candidates:
        cname, cchildren = candidates.popleft()
        if cname not in output:
            output.append(cname)
            candidates.extend(cchildren)

    return output

Depending on how big your graph is, it might make sense to keep a set of nodes you have already processed to avoid expensive list queries:

from collections import deque

def serialize_with_set(n):
    name, children = n
    output = [name]
    done = {name}

    candidates = deque(children)
    while candidates:
        cname, cchildren = candidates.popleft()
        if cname not in done:
            output.append(cname)
            done.add(cname)
            candidates.extend(cchildren)

    return output

Serializing a directed, weighted graph

2 Answers