2
votes

I am trying to implement Prim's algorithm for a graph consisting of cities as vertices, but I am stuck. Any advice would be greatly appreciated.

I am reading in the data from a txt file and trying to get output of the score (total distance) and a list of the edges in tuples. For example, by starting with Houston, the first edge would be ('HOUSTON', 'SAN ANTONIO').

I implemented the graph/tree using a dictionary with adjacent vertices and their distance, like so:

{'HOUSTON': [('LUBBOCK', '535'), ('MIDLAND/ODESSA', '494'), ('MISSION/MCALLEN/EDINBURG', '346'), ('SAN ANTONIO', '197')], 
'HARLINGEN/SAN BENITO': [('HOUSTON', '329')], 
'SAN ANTONIO': [], 
'WACO': [], 
'ABILENE': [('DALLAS/FORT WORTH', '181')], 
'LUBBOCK': [('MIDLAND/ODESSA', '137')], 
'COLLEGE STATION/BRYAN': [('DALLAS/FORT WORTH', '181'), ('HOUSTON', '97')], 
'MISSION/MCALLEN/EDINBURG': [], 
'AMARILLO': [('DALLAS/FORT WORTH', '361'), ('HOUSTON', '596')], 
'EL PASO': [('HOUSTON', '730'), ('SAN ANTONIO', '548')], 
'DALLAS/FORT WORTH': [('EL PASO', '617'), ('HOUSTON', '238'), ('KILLEEN', '154'), ('LAREDO', '429'), ('LONGVIEW', '128'), ('LUBBOCK', '322'), ('MIDLAND/ODESSA', '347'), ('MISSION/MCALLEN/EDINBURG', '506'), ('SAN ANGELO', '252'), ('SAN ANTONIO', '271'), ('WACO', '91'), ('WICHITA FALLS', '141')], 
'KILLEEN': [], 
'SAN ANGELO': [], 
'MIDLAND/ODESSA': [], 
'WICHITA FALLS': [], 
'CORPUS CHRISTI': [('DALLAS/FORT WORTH', '377'), ('HOUSTON', '207')], 
'AUSTIN': [('DALLAS/FORT WORTH', '192'), ('EL PASO', '573'), ('HOUSTON', '162')], 
'LONGVIEW': [], 
'BROWNSVILLE': [('DALLAS/FORT WORTH', '550'), ('HOUSTON', '355')], 
'LAREDO': [('SAN ANTONIO', '157')]} 

here is what i have so far:

import csv
import operator

def prim(file_path):
    with open(file_path) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter = "\t")
        dict = {}
        for row in csv_reader:
            if row[0] == 'City':
                continue
            if row[0] in dict:
                dict[row[0]].append((row[1],row[3]))
                if row[1] not in dict:
                    dict[row[1]] = []
            else:
                dict[row[0]] = [(row[1], row[3])]

    V = dict.keys()
    A = ['HOUSTON']
    score = 0 # score result
    E = [] # tuple result

    while A != V:
        for x in A:
            dict[x].sort(key=lambda x: x[1])
            for y in dict[x]:
                if y[0] in V and y[0] not in A:
                    A.append(y[0])
                    E.append((x, y[0]))
                    score += int(y[1])
                    break
            break
        break

    print("Edges:")
    print(E)
    print("Score:")
    print(score)

prim("Texas.txt")

This gives the correct first edge because of that last break statement, but when I remove the break statement, it infinitely loops and I can't exactly figure out why or how to fix it. I realize I may be implementing this algorithm totally wrong and inefficiently, so I would really appreciate any tips or advice on where to go from here/what to do differently and why. Thank you in advance!!

1

1 Answers

0
votes

There are three main problems with your implementation:

  1. Your graph is stored as a directed graph. Prim's algorithm cannot be applied to directed graphs (reference). If you really want to process a directed graph, the comments in the linked question provide information for how to compute the MST-equivalent of a directed graph (reference).

    This is also related to why you algorithm gets stuck: It explores all vertices that are reachable via a directed edge from the starting vertex. There are more vertices in the graph, but they cannot be reached from the starting vertex when traversing the edges in the fixed direction. Therefore, the algorithm cannot grow the intermediate tree to include further vertices and gets stuck.

    However, I assume you actually want the graph to be undirected. In that case, you could for example compute the symmetric closure of your graph, by duplicating every edge in the reverse direction. Or alternatively, you could adapt your algorithm to check both outgoing and incoming edges.

  2. In Prim's algorithm, every iteration adds an edge to the intermediate tree. This is done by selecting the minimum-weight edge that connects the intermediate tree with a vertex from the remainder of the graph. However, in your code, you instead sort the outgoing edges of each vertex in the intermediate tree by their weight and add all edges pointing to a vertex that is not yet included in the intermediate tree. This will give you incorrect results. Consider the following example:

    enter image description here

    Assume we start at a. Your approach sorts the edges of a by their weight and adds them in that order. It therefore adds the edges a-b and a-c to the intermediate tree. All vertices of the graph are now contained in the intermediate tree, so the algorithm terminates. However, the tree b-a-c is not a minimum spanning tree.

    So instead, from a you should select the minimum edge, which is a-b. You should then search for the minimum-weight edge from any of the vertices in the intermediate tree. This would be the edge b-c. You would then add this edge to the intermediate tree, finally resulting in the MST a-b-c.

  3. The loop condition A != V will never be satisfied, even if A contains every vertex of the graph. This is because A is a list and V is the result of dict.keys(), which is not a list, but a view object. The view object is iterable, and keys will be given in the same order as they were inserted, but it will never evaluate as equal to a list, even if the list contains the same items.

    Even if you would turn V into a list (e.g. with V = list(dict.keys()), this would not be enough, since lists are only equal if they contain the same items in the same order. However, you have no guarantee that the algorithm will insert the vertices into A in the same order as the keys were inserted into the dictionary.

    Therefore, you should use some other approach for the loop condition. One possible solution would be to initialize V and A as sets, i.e. with V = set(dict.keys()) and A = set(['HOUSTON']). This would allow you to keep A != V as a loop condition. Or you could keep A a list, but only check if the size of A is equal to the size of V in each iteration. Since the algorithm only inserts into A if the vertex is not already contained in A, when len(A) == len(dict), it follows that A contains all vertices.

Here is an example for the fixed code. It first takes the symmetric closure of the input graph. As the loop condition, it checks if the size of A is unequal to the total number of vertices. In the loop, it computes the minimum-weight edge that connects a vertex in A with a vertex not in A:

    # Compute symmetric closure
    for v, edges in dict.items():
        for neighbor, weight in edges:
            edges_neighbor = dict[neighbor]
            if (v, weight) not in edges_neighbor:
              dict[neighbor].append((v, weight))

    A = ['HOUSTON']
    score = 0
    E = []

    while len(A) != len(dict):

        # Prepend vertex to each (target,weight) pair in dict[x]
        prepend_source = lambda x: map(lambda y: (x, *y), dict[x])
        edges = itertools.chain.from_iterable(map(prepend_source, A))

        # Keep only edges where the target is not in A
        edges_filtered = filter(lambda edge: edge[1] not in A, edges)

        # Select the minimum edge
        edge_min = min(edges_filtered, key=lambda x: int(x[2]))

        A.append(edge_min[1])
        E.append((edge_min[0], edge_min[1]))
        score += int(edge_min[2])

Note this implementation still assumes that the graph is connected. If you would want to handle disconnected graphs, you would have to apply this procedure for each connected component of the graph (resulting in a MST forest).