5
votes

I have implemented a simple graph data structure in Python with the following structure below. The code is here just to clarify what the functions/variables mean, but they are pretty self-explanatory so you can skip reading it.

# Node data structure
class Node: 

    def __init__(self, label):        
        self.out_edges = []
        self.label = label
        self.is_goal = False


    def add_edge(self, node, weight = 0):          
        self.out_edges.append(Edge(node, weight))


# Edge data structure
class Edge:

    def __init__(self, node, weight = 0):          
        self.node = node
        self.weight = weight

    def to(self):                                  
        return self.node


# Graph data structure, utilises classes Node and Edge
class Graph:    

    def __init__(self):                             
        self.nodes = []

    # some other functions here populate the graph, and randomly select three goal nodes.

Now I am trying to implement a uniform-cost search (i.e. a BFS with a priority queue, guaranteeing a shortest path) which starts from a given node v, and returns a shortest path (in list form) to one of three goal node. By a goal node, I mean a node with the attribute is_goal set to true.

This is my implementation:

def ucs(G, v):
    visited = set()                  # set of visited nodes
    visited.add(v)                   # mark the starting vertex as visited
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples with cumulative cost
    q.put((0, v))                    # add the starting node, this has zero *cumulative* cost   
    goal_node = None                 # this will be set as the goal node if one is found
    parents = {v:None}               # this dictionary contains the parent of each node, necessary for path construction

    while not q.empty():             # while the queue is nonempty
        dequeued_item = q.get()        
        current_node = dequeued_item[1]             # get node at top of queue
        current_node_priority = dequeued_item[0]    # get the cumulative priority for later

        if current_node.is_goal:                    # if the current node is the goal
            path_to_goal = [current_node]           # the path to the goal ends with the current node (obviously)
            prev_node = current_node                # set the previous node to be the current node (this will changed with each iteration)

            while prev_node != v:                   # go back up the path using parents, and add to path
                parent = parents[prev_node]
                path_to_goal.append(parent)   
                prev_node = parent

            path_to_goal.reverse()                  # reverse the path
            return path_to_goal                     # return it

        else:
            for edge in current_node.out_edges:     # otherwise, for each adjacent node
                child = edge.to()                   # (avoid calling .to() in future)

                if child not in visited:            # if it is not visited
                    visited.add(child)              # mark it as visited
                    parents[child] = current_node   # set the current node as the parent of child
                    q.put((current_node_priority + edge.weight, child)) # and enqueue it with *cumulative* priority

Now, after lots of testing and comparing with other alogrithms, this implementation seemed to work pretty well - up until I tried it with this graph:

Graph one

For whatever reason, ucs(G,v) returned the path H -> I which costs 0.87, as opposed to the path H -> F -> I, costing 0.71 (this path was obtained by running a DFS). The following graph also gave an incorrect path:

Graph two

The algorithm gave G -> F instead of G -> E -> F, obtained again by the DFS. The only pattern I can observe among these rare cases is the fact that the chosen goal node always has a loop. I can't figure out what is going wrong though. Any tips will be much appreciated.

2
You consider a node "visited" before you actually visit it and before you can be sure you've found the cheapest path there. - user2357112 supports Monica
... to expand that: If there are two paths to a node, you only consider one of them, because you mark a node visited when you find the first path without checking if there isn't another (cheaper) path. This also clashes with "parent", where there's only ever one parent for each node, which is only fine, when it is the parent on the cheapest path - dhke
I see what you mean... but in the first example I gave, why did the algorithm choose the path H -> I, if it is meant to choose the cheapest path? Shouldn't the priority queue rank take care of that? How would I go about fixing the visited/parent thing? - Luke Collins
"why did the algorithm choose the path H -> I, if it is meant to choose the cheapest path" - because it's buggy in the ways that dhke and I just described. If things always did what they were meant to do instead of what you actually wrote, programming would be a lot easier. - user2357112 supports Monica
It's not related to your issues, but the classes from the queue module do a bunch of thread-synchronization stuff that you don't need. For a basic priority queue that's only being used in one thread, use heapq instead (it's what queue.PriorityQueue uses internally for its implementation). - Blckknght

2 Answers

2
votes

Usually for searches, I tend to keep the path to a node part of the queue. This is not really memory efficient, but cheaper to implement.

If you want the parent map, remember that it is only safe to update the parent map when the child is on top of the queue. Only then has the algorithm determined the shortest path to the current node.

def ucs(G, v):
    visited = set()                  # set of visited nodes
    q = queue.PriorityQueue()        # we store vertices in the (priority) queue as tuples 
                                     # (f, n, path), with
                                     # f: the cumulative cost,
                                     # n: the current node,
                                     # path: the path that led to the expansion of the current node
    q.put((0, v, [v]))               # add the starting node, this has zero *cumulative* cost 
                                     # and it's path contains only itself.

    while not q.empty():             # while the queue is nonempty
        f, current_node, path = q.get()
        visited.add(current_node)    # mark node visited on expansion,
                                     # only now we know we are on the cheapest path to
                                     # the current node.

        if current_node.is_goal:     # if the current node is a goal
            return path              # return its path
        else:
            for edge in in current_node.out_edges:
                child = edge.to()
                if child not in visited:
                    q.put((current_node_priority + edge.weight, child, path + [child]))

Note: I haven't really tested this, so feel free to comment, if it doesn't work right away.

0
votes

A simple check before expanding the node can save you duplicate visits.

while not q.empty():             # while the queue is nonempty
    f, current_node, path = q.get()
    if current_node not in visited: # check to avoid duplicate expansions
       visited.add(current_node)    # mark node visited on expansion,
                                    # only now we know we are on the cheapest path to
                                    # the current node.
       if current_node.is_goal:     # if the current node is a goal
          return path               # return its path
       ...