SAT debugging and typo detection
summary: Run the command without arguments or options to get help.
Pure declarative programs are generally very weird to debug. They don't have instructions, so there is nothing to step through. They don't have state, so there is no memory to examine. The solving engine for a declarative program (in our case, Minisat) will have instructions and state, but a highly optimized backend works with an intermediate form that is many steps removed from the input program.
The most frustrating case to debug occurs when there are no solutions. As a negative can't be proven, neither can a program that produces no output be debugged. The opposite case (too many solutions) is simple - compare any output to the rules at hand, find the rule it violates and tighten up the code for that section.
The case of no solution is usually caused by one overly restrictive rule. A straightforward way of finding which rule is blocking the solution would be to disable each rule one at a time. Disabling some rules will have no effect, and these can be eliminated from the search. Disabling the buggy rule will of course produce solutions. But it takes two to have a disagreement - the buggy rule will usually disagree with one (or more) other rules, and disabling these good rules will also produce solutions.
This is a lot of work all around. Disabling sets of rules is tedious and error prone, as is re-running the solver and checking for solutions. Thankfully, all this is automated by
The debugger takes a faulty CNF as input. Then it goes through the CNF line by line, disabling chunks at a time, reporting the sections that block solutions.
How does the debugger know what is a chunk of rules? And how does is report sections in a useful manner? This part requires effort from the programmer. But it is a minimal amount of effort and a best practice regardless. All the debugger requires is commented code. It will use the comments as both logical boundaries and human readable labels.
There is also a special case, where multiple rules might be subtly wrong but don't appear faulty when examined individually. While this is a more time-consuming case to solve, the debugger can check any combination of rules. (It will skip the multitude of combinations that are trivial and provide no useful information.) I had written out a lengthy collection of examples to elaborate on this, but in practice it happens so rarely to not be worth the space.
Get the stardust2.py sample program. Notice how every section of code is preceded with
cnf.comment("text")? These are the debugging comments. Scroll down to rule 10 and change
false('who', 'walter') to
Run the stardust program, and it'll report
No possible solutions. Now run the debugger across the generated CNF.
It will generate the following report:
links in triples 10. Christine met either Brian or ...
As expected, there are two labels reported to be flawed. One of these, "links in triples", does not appear as a comment in
stardust2.py. It is actually part of the automatically generated physics of a Zebra puzzle. (If you open the CNF, you'll find these physics make up 99% of the file.) The second correctly identifies the sabotaged section.
The debugger is meant to find high-level logic errors. There is another tool that is designed to isolate simpler typos. This tool is really just a mere field of the CNF object, called
auto_mode and can take one of three values:
rw wo ro
A requirement of automatic term generation is a database. This database is for mapping human-friendly term label strings to CNF-friendly term integers. But being a database, it has potential for more. One possibility is to test for the existence of a label. Brand new labels should only appear at certain times. If a new label is created at the wrong time, it is likely a typo.
"rw" stands for read-write. This is the default mode and provides no safeties.
"wo" stands for write-only. It should be used when creating new labels, but requires some care.
"ro" stands for read-only. 95% of the time, this is the mode to use.
Here I'll be illustrating a section of simple puzzle. The puzzle takes place on a 6x6 grid of true/false cells. There are several bugs which will be caught with the use of
# set up the library from sat import * cnf = CNF() f = cnf.auto_term # generate all the labels cnf.auto_mode = 'wo' for x,y in product([1,2,2,4,5,6],[1,2,3,4,6]): f(x,y) # define rules cnf.auto_mode = 'ro' cnf.comment('column 2 is false') for y in range(6): cnf.write_one(-f(2,y))
For now ignore the three obvious bugs and let's examine the intent.
In the "wo" section, all of the labels are generated. In this case an XY grid. The unusual part is that there is no
cnf.write() call in this section. There are only calls to the grid function,
f(). Thus, this section of code does not actually generate any CNF information. However it does populate the
auto_term database. The write-only setting means that every call to
f() should be something new and never before seen. If the label is found to already exist in the database, then something is wrong. And in fact, it catches the duplicated X coordinate and complains about
Error: existing label (2,1). Change the extra 2 to a 3 and it'll get past that section. One bug found.
Of course there is still a pretty obvious bug on that line, the missing 5 in the Y coordinates. That will be uncovered shortly, in the read-only section.
The next complaint is
Error: creating label (2,0). Here we've made an indexing error. The coordinates are all 1-indexed while
range() is zero indexed. Since there are no zeros in the database, a new entry would have to be created. Because it is in read-only mode, this is an error. Change it to
range(1,7) (or use an explicit list) and the problem goes away. Two bugs found.
The final error is
Error: creating label (2,5). Looking at the Y coordinates during the generation step, a 5 is missing and so those points are never created. Add a 5 to the Y list and the code is fully debugged.
Read-only mode will make most typos stand out like a sore thumb. Normally a typo would silently generate a fresh label with an extra bit of state in the solution space. This will lead to multiple superfluous solutions, so the
debugger.py script will be of no help. With read/write-only modes, you get an exact line number and label.
Despite being designed around declarative programming, Prolog has almost no facilities for debugging declarative programs. This is probably the largest perceived weakness of Prolog and the butt of the only joke most programmers know about Prolog.
I can't do the explanation proper justice, so please read The Prolog Lightbulb Joke to illustrate the more common issues.
While Prolog has more features and is capable of solving any problem that SAT could, debugging is one place where SAT seems more straightforward.