11AM May 30, 2009: How to effectively walk graphs and trees in Python.
10PM May 30, 2009: Edits from Reddit C/C.

Original How to effectively walk graphs and trees in Python. Edits from Reddit C/C. Additional example Editable
version 1 & 2 of 3

Recently Zed called out Python on some of its warts. His arguments are sound, The but the ensuing reddit thread has a few rather confused attempts where folk gripe about non-issues.


This is specifically about the claim that Python's 1000 deep recursion limit makes it impossible to walk unbalanced trees. Here I'll show a technique that I use all the time to walk trees. In fact, even if I were programming in Lisp, I'd still use a variant of this algorithm. Why? Because this algorithm will let you process a tree either breadth-first or depth-first. There are simple and pythonic ways to iterate over trees, and I will illustrate one. I greatly prefer this to the typical tree recursion process. It is actually similar to BFS in Lisp. With one slight modification you can process a tree either breadth-first or depth-first, You can switching dynamically switch between the two at run time based on the contents of a node. It can also be adapted to walk cyclic graphs.

telesphore4 (2009-05-30-22-37-29-161)

Python does some things well (& I actually like it) but tree walking is much clearer with recursion then without. No, recursion is not the solution to all of programming's ills but it is a powerful tool.



The secret is to not think of the tree (or graph) as a single global structure, but rather as a collection of nodes. Typically child links are treated as points where the code should recurse, and this recursion should happen in the language's stack. An alternative is to build your own stack and iterate over that. Each node contains references to child objects that need to be examined. Each node contains Think of it as a "todo list" of nodes to look at next. Instead of recursive recursing through the nodes, jot down the contents of the todo list (i.e., add the nodes you want to look at next (i.e., the child links) onto a master "master todo list list". Then iterate over the master list.

pseudosinusoid (2009-05-30-22-35-20-473)

This is just textbook BFS/DFS done imperatively.


Which is a better data structure for the master list, a stack or a FIFO? A stack would let you walk the tree depth first. Each child is pushed onto the stack. The top child is popped off, and its children are pushed on. A FIFO performs a breadth first search, as children are pushed to the start of the FIFO and older/higher nodes are popped from the end.

Why not have both? Use a deque. A deque provides the best of both. It is a nifty structure (based on a double linked list) that is both stack and FIFO. For each node of the tree, add the children to either the front (FIFO like) or the back (stack like) of the deque. Meanwhile iterate over the deque by popping off the back.

Let's have some code. This is lifted more or less directly from pacgraph. The map is stored as a dictionary of string tokens, instead of a tree of pointers references, but the technique works the same for both. There is room to debate that such a dictionary is easier for iterative languages, but that is a different post. The technique works the same for both. The 'master todo list' deque` is called to_crawl. This function starts at a specified node and returns all of it's children. It operates depth first.

from collections import deque

tree = {'a': ['b', 'c'],
        'b': ['d', 'e'],
        'c': [],
        'd': ['f'],
        'e': [],
        'f': []}

def children(token, tree):
    "returns a list of every child"
    child_list = []
    to_crawl = deque([token])
    while to_crawl:
        current = to_crawl.popleft()
        node_children = tree[current]
    return child_list

For an added bonus, keep By keeping track of visited nodes in a set to make it becomes a blazing fast cyclical graph walker:

def children(token, tree):
    "returns a list of every child"
    visited = set()
    to_crawl = deque([token])
    while to_crawl:
        current = to_crawl.popleft()
        if current in visited:
        node_children = set(tree[current])
        to_crawl.extend(node_children - visited)
    return list(visited)

If you want breadth first search, replace to_crawl.extend() with to_crawl.extendleft() To dynamically switch between BFS and DFS, add a test to decide either deque.extend() extend() or deque.extendleft() extendleft(). I use this to walk the tree of nonogram solutions, where promising solutions are scanned depth first and improbable solutions are eventually processed breadth first.