4
votes

I have a directed, weighted graph. Each node of the graph is represented as a 2-tuple, whose first element is the name of the node and whose second element is a tuple containing all vertices originating from this node, ordered by their weight. Basically the weight of each vertex is its index inside this tuple.

Exempli gratia:

a = ('A', () )

a is a node with name A from where no vertices originate.

b = ('B', () )
a = ('A', (b,) )

a is a node with name A with one vertex to the node named B, with a weight of 0.

b = ('B', () )
c = ('C', () )
a = ('A', (b, c) )

a is a node with name A with two vertices to the nodes named B and C, the first of weight 0 and the second of weight 1.

It is apparent that ('A', (b, c) ) is not equal to ('A', (c, b) ).

Now I need to serialize this graph according to these rules:

  1. The first element of the result is the starting node.
  2. Then follow all nodes directly accessible from the starting node in order of ascending weight. If a node is already in the result, do not append it again.
  3. Now apply rules one and two recursively to all elements just added.

Basically, low-to-high (weight) first, depth second.

Here an example input and output:

f = ('F', () )
e = ('E', () )
d = ('D', (e,) )
c = ('C', (f, d, e) )
b = ('B', (d,) )
a = ('A', (b, c) )

results in:

['A', 'B', 'C', 'D', 'F', 'E']

Now my first naive approach was this:

def serialize (node):
    acc = []

    def serializeRec (node, level):
        tbd = [] #acc items to be deleted
        tbi = False #insertion index
        for idx, item in enumerate (acc):
            if item [1] > level and tbi == False:
                tbi = idx
            if item [0] == node [0]:
                if item [1] > level: tbd.append (item)
                else: break
        else:
            if tbi == False: acc.append ( (node [0], level) )
            else: acc.insert (tbi, (node [0], level) )
        for item in tbd:
            acc.remove (item)
        for vertex in node [1]:
            serializeRec (vertex, level + 1)

    serializeRec (node, 0)
    #remove levels
    return [node for node, level in acc]

Which apparently is quite a bad idea, because in each recursion I iterate over various lists. This is why I switched to dictionaries:

def serializeDict (node):
    levels = defaultdict (list) #nodes on each level
    nodes = {} #on which level is which node

    def serializeRec (node, level):
        try:
            curLevel = nodes [node [0] ]
            if curLevel > level:
                nodes [node [0] ] = level
                levels [curLevel].remove (node [0] )
                levels [level].append (node [0] )
        except:
            nodes [node [0] ] = level
            levels [level].append (node [0] )
        for vertex in node [1]:
            serializeRec (vertex, level + 1)

    serializeRec (node, 0)
    #flatten dict items
    return [node for level in (v for _, v in sorted (levels.items (), key = lambda x: x [0] ) ) for node in level]

Which runs a lot faster except for very small graphs.

My question is now:

How can I optimize this serialization with the goal of minimizing run-time?

Memory usage is of no concern (yeah, baby), KLOC is of no concern, only run-time counts. Everything can be changed, except the format of the input data. But if at the end it saves time, I will gladly reorganize this data inside the serialization function.

I thank you very much for reading this TL;DR wall of words.


Sample graph for fooling around:

z = ('Z', () ); y = ('Y', (z,) ); x = ('X', (z, y) ); w = ('W', (x, y, z) ); v = ('V', (w, x) ); u = ('U', (w, v) ); t = ('T', (u, w) ); s = ('S', (z, v, u) ); r = ('R', (t, u, z) ); q = ('Q', (r, z) ); p = ('P', (w, u) ); o = ('O', (v, r, q) ); n = ('N', (r, z) ); m = ('M', (t,) ); l = ('L', (r,) ); k = ('K', (x, v) ); j = ('J', (u,) ); i = ('I', (n, k) ); h = ('H', (k, x) ); g = ('G', (l,) ); f = ('F', (t, m) ); e = ('E', (u,) ); d = ('D', (t, e, v) ); c = ('C', (m,) ); b = ('B', (n,) ); a = ('A', (g, m, v) )
2
Just an idea. Try to do topological order first (which is O(V + E)) and then run non-recursive algorithm to serialize themevhen14
@evhen14 Thank you, I will look into it.Hyperboreus

2 Answers

1
votes

This works without recursion and uses a double ended queue for efficiency:

from collections import deque

def serialize_plain(n):
    name, children = n
    output = [name]

    candidates = deque(children)
    while candidates:
        cname, cchildren = candidates.popleft()
        if cname not in output:
            output.append(cname)
            candidates.extend(cchildren)

    return output

Depending on how big your graph is, it might make sense to keep a set of nodes you have already processed to avoid expensive list queries:

from collections import deque

def serialize_with_set(n):
    name, children = n
    output = [name]
    done = {name}

    candidates = deque(children)
    while candidates:
        cname, cchildren = candidates.popleft()
        if cname not in done:
            output.append(cname)
            done.add(cname)
            candidates.extend(cchildren)

    return output
0
votes

Now I need to serialize this graph according to these rules:

The first element of the result is the starting node. Then follow all nodes directly accessible from the starting node in order of ascending weight. If a node is already in the result, do not append it again. Now apply rules one and two recursively to all elements just added.

I'd like to add that from a theory point of view, this is a very common way of traversing a graph, known as Breadth First Traversal, with the added requirement of sorting the neighbour lists. As mentioned in the first answer, it is typically approached with queues to avoid recursion.

You should find Breadth First Traversal in any self-respecting graph library, if your project allows to use one. This one, for example, should be fast, as it is based on the excellent C++ boost::graph, which is the de-facto standard graph library in the C++ world.