Original A high level language-agnostic puzzle solver. A high level language-agnostic puzzle solver. Editable
version 3 of 3

tl;dr sources at github


There are some puzzles that just beckon to be solved by computer. Stuff with lots of tedious logic, or six grids of the exact same puzzle. First you figure out a representation and type it all in. Then you try to write a constraint propagation solver, though tricky edge cases make you fall back onto brute force in places. Then you start profiling the solver so it actually finishes within an hour. Finally it finishes, only to claim that there are no solutions. You pull half your hair out trying to find the bug, and the other half trying to find someone who knows the language you work with and is willing to review your code. Twelve hours later you've got a solver that accounts for 95% of the puzzle but still needs a human to verify the solution because one of the rules was a bit hairy. During a Mystery Hunt, it was a good day if I could solve two whole puzzles without assistance.

The first way to make life easier: don't use imperative languages and instead use something declarative.

The main difference is that declarative languages don't require instructions. A pure declarative language cannot even be given sequential instructions. Instead they take goals and fill in the blanks to achieve the goal. To grossly simplify, solving a polynomial equation in an imperative language involves one line to describe the equation and then 10 lines to implement Newton's method. Whereas in a declarative language, it takes one line to describe the equation, and that is all. Run the 'program' and valid solutions are generated. No need to waste time writing and debugging a solver, finding solutions is baked in the way garbage collection is baked into high level languages.

The canonical declarative language is Prolog. It is probably worth learning, and it has been on my todo list for four years now. I've put it off because it has not been practical. If I can't even find anyone to review a simple and common language like Python, how am I going to find someone who has has even heard of a complicated niche language like Prolog? A practical domain-specific tool should be language agnostic, so that anyone who can code can work with it.

It would also be nice if the system was associative, commutative and distributive. (I'm stretching the usual definitions of the terms, forgive me.) Most languages are none of these. Change the order of a program and it breaks. Chop out half of the program and instead of giving you half an answer it refuses to run at all. Run a line twice by accident and it breaks. All of these are major impediments to having two people work on the same puzzle. As it is, the degree of complexity and frailty in a program requires that only a single person, trying to fit the entire mess into their head, can work on a programming puzzle at a time.

Boolean satisfaction solvers (SAT) ace every one of these objectives. They have made me a much more capable solver. When faced with a logic puzzle, it is not a question of if I can solve it but how long it will take. Based on my timings for a dozen puzzles or so, it has made me at least 60 times faster. Maybe more, because some of those were puzzles I had written off as unsolvable. Sadly I have not found a thorough introduction to boolean satisfaction. There are lots of research papers along the lines of "We solved the Rubik's Cube with only 100k clauses," but nothing about how they came up with the clauses. There is no analogous text along the lines of "Balancing your Checkbook and Other Everyday Applications of Arithmetic". I've been working on filling this void. You are reading it now, and are unlucky enough be one of the first. Expect rough edges and don't hesitate to ask for clarification.


  • BSP - boolean satisfaction problem
  • SAT - satisfiability
  • CNF - conjugative normal form
  • term - a single bit of state with a unique address
  • clause - a logical boolean statement of several terms
  • expression - a set of clauses
  • minisat - my favorite SAT solver, avaible at minisat.se

And a few I've made up for convenience

  • grid - the mapping of labels to a set of terms
  • physics - the clauses that define the rules of the puzzle
  • seed - the initial conditions of the puzzle

The meat and potatoes of the whole system is the CNF file. The CNF file represents an interlocking, looping, self-referential wiring of state bits (terms) and logic gates (clauses). CNF itself is extremely simple and extremely limited. First, you only get AND, OR, NOT to work with. Second, CNF is fairly strict with how the logic can be ordered. Every expression takes the form of

a OR b OR c == true
!d OR !c OR e == true

So that looks like the options are only OR, NOT. The AND is implicit because everything must be true. Each variable is called a term. Each line of ORs is called a clause. The entire set of clauses is an expression.

When it actually comes to expressing the CNF in a text file, the format is a little simpler. It is easy to burn through 26 variables, so they are given numbers instead of letters. The AND is implicit across newlines, a single space represents OR, a - represents NOT. (And for some reason every line ends in a zero as well.) So the above expression would look like

1 2 3 0
-4 -3 5 0

But these are about the least friendly variable names possible, so in practice we maintain a dictionary of mappings between the term number and something more descriptive. Of course you never write these by hand. Instead you use the language of your choice to generate the CNF file. While this sounds like we are right back to where we started (using fragile imperative languages), generating a CNF is simple, compact and forgiving.

"Nothing clears up a case so much as stating it to another person."

As long as everyone agrees on the same grid naming scheme, they can each generate chunks of the CNF individually in any language they want. Afterwards, all the sub-sections can be cat-ed together and the whole puzzle can be solved. If two people accidentally double up on a section, the solver eliminates the redundancy. If someone forgets to write a section, the solver will generate many solutions that meet the criteria that have been included.

Simple Example

Imagine a simpler version of the n-queens problem, called n-rooks. Of course n-rooks is trivial to solve (put them all on a diagonal), but it serves as a nice lead in to doing n-queens yourself. And we'll simplify it further by doing 4-rooks instead of generic n-rooks. This just to keep the pseudo-code samples short.

The most important step comes first. You need to decide on a naming scheme to describe all the state in the puzzle. In this case it is pretty simple:

(x, y)

The board contains sixteen bits of information, each with a unique cartesian coordinate. Now, there is a problem. The SAT solver requires this be mapped onto a one dimensional array. We used to have to do this mapping manually, and it was the most complicated and error prone step of solving. Now there is an automatic term generator that transparently allocated slots on the 1D array and caches the information for re-use. It is called auto_term. But that is a little long (and it'll be called in nearly every line of code) so it is common to use a single-letter function name for it.

f = auto_term

Now we can use f(x, y) where x and y are between 0 and 3 inclusive. 1-indexing is okay too, but you'll have to be consistent throughout the puzzle. True/false represents whether a rook is in the square. Here is a page just about auto_term if you'd like to read more.

Now you want to say "each row/col can only have a single rook in it". To start with, identify all the positions inside a single row:

define getrow(y) {
    for (x=0, x<4, x++) {
    return row }

and a similar getcol(x) with the x and y swapped. They just return a list of index positions.

Now comes creating the CNF. We are going to cheat here a bit, because this is the really really hard part. You can derive it on your own, but I've made a library for you. One of the most useful functions is called window(). It takes three arguments: a list of index positions, a minimum of positions to be true and a maximum of positions to be true. So by saying window(positions, 1, 1) you generate a dozen logical clauses which force exactly one of the four terms to be true. Another function is called write() and handles properly formatting everything.

for (y=0, y<4, y++)
    {write(window(get_row(y), 1, 1))}

for (x=0, x<4, x++)
    {write(window(get_col(x), 1, 1))}

Feed this file into the SAT solver and you'd get back an answer instantly. Now if we were actually doing n-queens, maybe you'd write the clauses for the rows and columns, while I'd write the clauses for the diagonals. (The code for the diagonals is very similar, just messier.) Finally a little bit of code to turn the solved CNF back into a picture of the puzzle.

Consider how you would write a solver for n-queens (not just 4-queens) and sudoku. Sudoku is almost a perfect fit for SAT, being little more than 81-rooks on a 3.09D 9x9x9 grid. A basic sudoku example is in the github repo.


Some of the limitations should be immediately obvious. For example, all variables are boolean and in most puzzles integers would be useful. But this is not too bad; it can be compensated for by using extra variable terms. A bigger problem is the number of clauses required. For example, sudoku needs 11600 clauses just to establish the physics of the grid. (While plugging in the initial board state only takes one clause per cell.) With some clever logic the required clauses can be cut down to 3 thousand, but there is no benefit to do so besides mathematical pedantry. In fact, highly minimal expressions can often take longer to solve because more backtracking is needed.

All the rules are generated by computer (sudoku has but 20 lines of code to make those 11K), so the high number does not really matter. Right? Well not always. In the code for window(), it produces roughly combination(length(positions), max+1) clauses. It is very easy to have a combinatorial explosion.

Imagine a logic puzzle, involving a dinner party. 26 people are seated around a table. 7 were served soup. How do you express this constraint in CNF?

My first attempt looked like this: Make 26 terms for whether or not each person had soup, call the set ate_soup. Constrain the terms with window(ate_soup, 7, 7). That means 26 C 8 clauses will be generated, or a whopping 1.5 million clauses for just that one little aspect of a puzzle.

Doing better requires thinking differently. Traditionally, solvers want to have as little state as possible. Less state means fewer errors and less backtracking, and everything runs faster. But not here. With SAT, it makes more sense to add in many extra variables. The goal is to express everything as sparse bit arrays, because sparse is an easy constraint. A better solution to the soup problem uses a 26x7 bit array and only needs 450 clauses to establish window([26 terms], 1, 1) across the seven soups and window([7 soups], 0, 1) across the 26 seats.

Designing a good representation of the puzzle is crucial. Note how you could either use 26 terms in the grid and 1.5 million clauses in the physics or 182 terms in the grid and 450 clauses in the physics.

Looking through the window()

This sort of mind bender is the only downside to working with SAT. Hopefully it will be an easier adjustment for you, since the worst is abstracted away by a library. Here though is a bit of the fundamentals.

Pretend for a moment that the window() function did not exist. For n-rooks, expressing "exactly 1 of 4 is true" is the core of the whole puzzle. So how is this said in CNF? Falling back to letters for readability:

a b c d
-a -b
-a -c
-a -d
-b -c
-b -d
-c -d

The first line states that at least one term must be true. The remaining 6 lines are the two-combinations of the four terms. Each combination forbids a pair of terms from being true simultaneously. Taken together the four terms are constrained such that one and only one must be true. Trying to think like this is absolutely brutal, and it is much easier to fall back on convenience functions operating over sparse arrays. (Designing these convenience functions from scratch is an immensely fun challenge, but thankfully only one person needs to be any good at it.)

The Solver

As an aside, what is going on with the solver? As far as I am concerned, SAT solvers are the fastest, most reliable piece of magical code I've ever used. Minisat is absolutely great for command line junkies and wrapping with your own libraries. I'd say it is fascinating to watch it work, but usually it is done in the blink of an eye. On my aging weak little laptop, Minisat can easily crunch through and solve 1 million clauses per second. Unless you are going out of your way to trick the solver, this rate seems to hold true up to a 100 million clauses. Maybe further, though at that point you are talking about CNF files bigger than a gigabyte and memory use starts hurting.

But every time I fire up Minisat it feels like I am sitting at the controls of a quantum computer. I've got all my qubits (terms) all wired together in some complicated expression and when power is applied every qubit will instantly collapse out of superposition and crystallize into a perfect answer. Of course SAT is actually an NP-complete problem and a real quantum computer could solve it in O(1) time.

'Brute Force' logic

"When you have excluded the impossible, whatever remains, however improbable, must be the truth."

My sat.py library has a lot of useful helper functions, but what do you do when you need something oddball and new that is not covered? There is a trivial way to translate anything into CNF that we'll go through now.

Imagine that you have two variables, A and B. You always want there to be an XOR relationship between them.

A  B  (possible)
0  0  impossible
0  1  valid
1  0  valid
1  1  impossible

There are four states and two of these are valid. To solve it by brute force exhaustion, forbid every impossible combination. Anything can be forbidden by simply taking its bit-wise negation.

illegal   invert
states    rule      CNF

A  B
0  0      A   B     1   2   0
1  1     !A  !B    -1  -2   0

And finally translate the A/B into proper CNF. The result seems very counterintuitive. At first glance it would appear to be contradictory and unworkable. At first glance it seems to suggest that both A and B need to be true and false at the same time. But it does work, and is actually saying "one or both must be true" and "one or both must be false," and equates to an XOR relationship. This is just one example of the little things you have to get used to when working with low-level SAT expressions.

Though this example is essential enough to already be in the sat.py library, and can be generated with window([A, B], 1, 1). Let's try something a little larger: express A xor B = C. Now there are three variables and 8 total states.

A  B  C
0  0  0
0  0  1
1  1  0
1  1  1

But we are only interested in the illegal states

illegal      invert
states       rule        CNF

A  B  C
0  0  1      A  B !C     1  2 -3  0
0  1  0      A !B  C     1 -2  3  0
1  0  0     !A  B  C    -1  2  3  0
1  1  1     !A !B !C    -1 -2 -3  0

Often at this stage you can look at the resulting CNF and notice a pattern. It should be pretty obvious what an N-wide XOR generator should do. Though often the pattern will be obscured by redundancies. More about removing redundancies on the SAT simplification page.

Extreme Brute Force

The explicit forbidding of the impossible works remarkably well. A substantial example is gol-braille.py. This uses SAT to simulate Conway's Game of Life, with the further criteria that the final sim step must produce a braille combination of the letter AGCT. The grid itself is a 3D stack, the Life board in the X-Y axis and time in the Z. A box of dead cells surrounds the grid to give it an edge, though it could be wired to wrap around instead.

Both Life and braille sections use brute force logic. Across the top of the grid, the board is divided up into 2x3 chunks. There are 64 possible 2x3 chunks, but we only want the four that correspond to AGCT. So for each chunk, 60 possibilities are forbidden.

The GoL simulator uses even more brute force. Each cell in the grid is part of a network of 10 cells: its current self, its future self and its current 8 neighbors. To generate the GoL physics, all 1024 possible combinations of these 10 terms are enumerated and the ones that are impossible within of the rules of Conway's Game of Life are forbidden. This generates hundreds of clauses for each location in the 3D grid, but only took minutes to write.

Multiple Solutions

A SAT solver will stop at the first solution. But it is easy to make it generate all the solutions. A solution takes the form of a list of true and false terms. To force the generation of a new solution, forbid the old one. Take the solution, expressed as a clause, negate it and append the new clause to the end of the CNF. Rerun the CNF and a new solution will appear. Repeat as necessary. This whole processes is automated in the sat.py library.

kk (2014-04-12-06-25-07-092)

specifically, sat.py's CNF.solutions() method.


"It is impossible as I state it, and therefore I must in some respect have stated it wrong."

Yes, there is even a debugger! It is meant to be used when the SAT solver informs you that there are no solutions to the problem. It will selectively block out chunks of the CNF file until a solution exists and then reports which section was responsible. This is done by providing reference points in the form of comments to delineate where sections begin and end. Typically there will be two sections that conflict with each other - one will be correct and the other will have a typo. It is also in the library, and is called debugger.py.

kk (2014-04-12-06-26-15-504)

Contrast this with Prolog's monsterous traces or terse 'false'.

General Solving Process

  1. Designs a grid nomenclature that will fit the puzzle.
  2. Each person tackles a different part of the grid physics using any language.
  3. Other people type up the initial conditions for the puzzle.
  4. A person codes up a display routine, converting the binary terms of the solution into a pretty picture.
  5. Everyone from steps 2 and 3 concatenates their CNFs together.
  6. Run the solver/debugger as appropriate.
  7. Render the resulting solutions.

Note that steps 2 through 4 can happen in parallel and don't even require active collaboration.

What about Prolog?

Prolog is an amazing language but has a few problems. One is lack of widespread adoption. You work best with tools you know well and switching to Prolog for all their day-to-day programming is just too much for most people. Vaguely knowing Prolog is even worse because Prolog is not a pure declarative language. The internet is littered with folks who have tried to write a Sudoku solver in Prolog and end up with quagmire built from the imperative features. These barely operate, producing a single solution after several hours. (Not linking examples to spare them embarrassment.) It is easy to write bad, slow code in any language, and Prolog is no exception. The simplest way out is to not use a Turing-complete language and instead use a language-agnostic declarative domain specific tool like SAT.

What about SAT+ ?

My only criticism about SAT is the lack of expressiveness. Instead there is simplicity from being limited to only bits, NOT and OR. It takes a certain amount of thought to translate a real problem into SAT. (And the bulk of these 3500 words has been about how to think in SAT.) There is not yet an expressive domain specific language to make SAT easier. (For example, regex is an expressive DSL that translates into a finite state machine. But you'd never want to write that FSM by hand.) SAT+ is a great effort to make SAT more expressive, but to me it feels like it misses the mark.

To make a very weak analogy, think about the relationship between C, C++ and Chicken Scheme. C is fast and simple(ish). C++ is almost as fast, moderately expressive and very complicated. Scheme is simple and expressive. But Chicken Scheme is not a slow interpreted language, instead it translates Scheme into C. By taking advantage of C, Chicken can be fast, expressive and simple.

By analogy, SAT is similar to C and Sat.py is similar to Chicken.

So where does C++ fit in? It doesn't. Chicken is better than C++ at high level stuff and C is better than C++ at low level stuff. So while I really like the work people are doing with SAT+, it still feels too young, complicated and academic to be of use now, given the tools we have at the present.


  • more annotated examples
  • complete API reference