Supplanting the CNF grid function.

Original Supplanting the CNF grid function. Editable
version 1 of 1



Summary

source: automerge.py

use: automerge.py output in1 in2 ...
assumes the existence of in1.cnf and in1.lut
generates output.cnf and output.lut
for combining expressions created by sat.CNF.auto_term()

Too Short; Didn't Understand

The most fundamental component of applying SAT is what I called the grid function. Underneath SAT, all data is represented as a (1-indexed) array of booleans. The grid function provided a convenient mapping to this array. For example, a grid function for sudoku might look like

lambda x,y,n: (x-1) + (y-1)*9 + (n-1)*81 + 1

where x/y/n are all between 1 and 9 inclusive. The brilliant aspect about the grid function is portability. Two people can work on different aspects of the same puzzle, each generating a CNF that describes a portion of the puzzle. Their work can be seamlessly combined into a valid solution by merely cat-ing the two CNFs together.

And as long as the grid function is properly designed, collaboration really is that simple. The trick is putting enough thought into the grid function.

  • If two different inputs collide and produce the same output number, it breaks.
  • If there is a gap in the output, the solver will product numerous spurious identical solutions.
  • If one person 1-indexes and another 0-indexes it will be horribly broken and impossible to tell why.
  • When a puzzle needs sub-tables or a table with ragged edges, the grid function becomes a convoluted monstrosity.

There is a alternative: memoization. Instead of carefully planning out every possible input to the grid, simply dole out new CNF addresses on a first-come, first-serve basis. Any whimsical labeling scheme may be used as long as every unique name is actually unique.

This was simple to code (see sat.CNF.auto_term()) but for more than a year I resisted using memoization. Sure, in practice memoization is so much easier. But it lacked elegance. Most importantly memoization was not portable. You could not concatenate two CNFs because the terms were dependent on the order they were created in. Two people working on the same puzzle are now speaking in wholly unique dialects instead of a common tongue. The convenience of memoization was not worth the crippling side effects.

But it turns out there are a lot of puzzles that are nearly impossible to write a grid function for. While solving, memoization lets you slip in any extra bits of state that were overlooked in the initial planning. A traditional grid function would need to be completely replaced. The cognitive load for creating grid functions is very high and seemed to be the most failure-prone stage of SAT solving.

Bringing collaboration back ended up being simpler than expected, only a few dozen lines of code. The memoized look-up is logged to a file. When it comes time to merge two several CNFs, the CNFs are cross referenced with the look-ups. A new CNF (and new look-up) is generated. This also has the benefit of logging every label used. A brief inspection of the look-up can reveal if someone used the wrong indexing by mistake.

Of course people still have to agree on the labeling system. In Python I can use any hashable data type for the grid labels. Nested tuples of complex numbers are legitimate, but probably a bad idea to force other people in other programming languages to use. For ease of collaboration, limiting yourself to tuples of ints and strings might be a good idea. Or to be really safe, only simple strings. To return to the sudoku example:

'%i, %i, %i'  % (x, y, n)

Multiple CNFs can be merged as long as the memoized look-ups all conform to the same pattern. (Hopefully people won't bikeshed too much on commas and spacing.) If someone breaks the pattern, it is fairly easy for another person to discover the error by looking at the look-up logs.

If someone needs a few extra terms for a tricky logical expression, they don't need to revise the grid function and force everyone to use the new grid. Instead they can prefix the new label with their name (to avoid namespace collisions) and plow onward.

That is the rational behind the auto_term() function. The traditional grid function in any CNF generator can be updated by replacing the grid function:

f = cnf.auto_term

Memoization is now a standard tool for anything that I work on and has completely replaced the error-prone grid functions for all but the simplest puzzles.