May 2009: Edits from Reddit C/C.
September 2010: Additional example

Original How to effectively walk graphs and trees in Python. Edits from Reddit C/C. Additional example Editable
version 2 & 3 of 3

Recently Zed called out Python on some of its warts. His arguments are sound, but the ensuing reddit thread has a few rather confused attempts where folk gripe about non-issues.


This is specifically about the claim that Python's 1000 deep recursion limit makes it impossible to walk unbalanced trees. There are simple and pythonic ways to iterate over trees, and I will illustrate one. I greatly prefer this to the typical tree recursion process. It is actually similar to BFS in Lisp. With one slight modification you can process a tree either breadth-first or depth-first, switching dynamically between the two at run time based on the contents of a node. It can also be adapted to walk cyclic graphs.

Kiwi (2010-05-02-08-25-15-801)

An example of what they would look like if was done with recursion would help.


Typically child links are treated as points where the code should recurse, and this recursion should happen in the language's stack. An alternative is to build your own stack and iterate over that. Each node contains references to child objects that need to be examined. Think of it as a "todo list" of nodes to look at next. Instead of recursing through the nodes, add the nodes you want to look at next (i.e., the child links) onto a "master todo list". Then iterate over the list.


Which is a better data structure for the master list, a stack or a FIFO? A stack would let you walk the tree depth first. Each child is pushed onto the stack. The top child is popped off, and its children are pushed on. A FIFO performs a breadth first search, as children are pushed to the start of the FIFO and older/higher nodes are popped from the end.

Kyle (2009-05-30-22-55-01-280)

For all purposes, a Python list is a stack. There are a few convenience methods to iterate over the stack or pick slots by index. Manipulating the bottom element of a stack is O(n) while manipulating the top is O(1). Python's lists have the same time characteristics, though you append()/pop() instead of push/pop because the last element of the list is treated as the top of the stack.


A deque provides the best of both. It is a nifty structure (based on a double linked list) that is both stack and FIFO. For each node of the tree, add the children to either the front (FIFO like) or the back (stack like) of the deque. Meanwhile iterate over the deque by popping off the back.

Let's have some code. This is lifted more or less directly from pacgraph. The map is stored as a dictionary of string tokens, instead of a tree of references. The technique works the same for both. The 'master todo deque` is called to_crawl. This function starts at a specified node and returns all of it's children. It operates depth first.


from collections import deque

tree = {'a': ['b', 'c'],
        'b': ['d', 'e'],
        'c': [],
        'd': ['f'],
        'e': [],
        'f': []}

def children(token, tree):
    "returns a list of every child"
    child_list = []
    to_crawl = deque([token])
    while to_crawl:
        current = to_crawl.popleft()
        node_children = tree[current]
    return child_list

> children('b', tree)
... ['d', 'e', 'f']
rh0dium (2010-06-08-21-00-53-730)

How do you use this? What does token get set to??


By keeping track of visited nodes in a set it becomes a blazing fast cyclical graph walker:

def children(token, tree):
    "returns a list of every child"
    visited = set()
    to_crawl = deque([token])
    while to_crawl:
        current = to_crawl.popleft()
        if current in visited:
        node_children = set(tree[current])
        to_crawl.extend(node_children - visited)
    return list(visited)

If you want breadth first search, replace to_crawl.extend() with to_crawl.extendleft() To dynamically switch between BFS and DFS, add a test to decide either extend() or extendleft(). I use this to walk the tree of nonogram solutions, where promising solutions are scanned depth first and improbable solutions are eventually processed breadth first.