source: debugger.py
Pure declarative programs are generally very weird to debug. They don't have instructions, so there is nothing to step through. They don't have state, so there is no memory to examine. The solving engine for a declarative program (in our case, Minisat) will have instructions and state, but a highly optimized backend works with an intermediate form that is many steps removed from the input program.
The most frustrating case to debug occurs when there are no solutions. As a negative can't be proven, neither can a program that produces no output be debugged. The opposite case (too many solutions) is simple - compare any output to the rules at hand, find the rule it violates and tighten up the code for that section.
The case of no solution is usually caused by one overly restrictive rule. A straightforward way of finding which rule is blocking the solution would be to disable each rule one at a time. Disabling some rules will have no effect, and these can be eliminated from the search. Disabling the buggy rule will of course produce solutions. But it takes two to have a disagreement - the buggy rule will usually disagree with one (or more) other rules, and disabling these good rules will also produce solutions.
This is a lot of work all around. Disabling sets of rules is tedious and error prone, as is re-running the solver and checking for solutions. Thankfully, all this is automated by debugger.py
.
The debugger takes a faulty CNF as input. Then it goes through the CNF line by line, disabling chunks at a time, reporting the sections that block solutions.
How does the debugger know what is a chunk of rules? And how does is report sections in a useful manner? This part requires effort from the programmer. But it is a minimal amount of effort and a best practice regardless. All the debugger requires is commented code. It will use the comments as both logical boundaries and human readable labels.
There is also a special case, where multiple rules might be subtly wrong but don't appear faulty when examined individually. While this is a more time-consuming case to solve, the debugger can check any combination of rules. (It will skip the multitude of combinations that are trivial and provide no useful information.) I had written out a lengthy collection of examples to elaborate on this, but in practice it happens so rarely to not be worth the space.
Get the stardust2.py sample program. Notice how every section of code is preceded with cnf.comment("text")
? These are the debugging comments. Scroll down to rule 10 and change false('who', 'walter')
to true('who', 'walter')
.
Run the stardust program, and it'll report No possible solutions
. Now run the debugger across the generated CNF.
As expected, there are two labels reported to be flawed. One of these, "links in triples", does not appear as a comment in stardust2.py
. It is actually part of the automatically generated physics of a Zebra puzzle. (If you open the CNF, you'll find these physics make up 99% of the file.) The second correctly identifies the sabotaged section.
The debugger is meant to find high-level logic errors. There is another tool that is designed to isolate simpler typos. This tool is really just a mere field of the CNF object, called auto_mode
and can take one of three values: rw wo ro
A requirement of automatic term generation is a database. This database is for mapping human-friendly term label strings to CNF-friendly term integers. But being a database, it has potential for more. One possibility is to test for the existence of a label. Brand new labels should only appear at certain times. If a new label is created at the wrong time, it is likely a typo.
Here I'll be illustrating a section of simple puzzle. The puzzle takes place on a 6x6 grid of true/false cells. There are several bugs which will be caught with the use of auto_mode
.
# set up the library
from sat import *
cnf = CNF()
f = cnf.auto_term
# generate all the labels
cnf.auto_mode = 'wo'
for x,y in product([1,2,2,4,5,6],[1,2,3,4,6]):
f(x,y)
# define rules
cnf.auto_mode = 'ro'
cnf.comment('column 2 is false')
for y in range(6):
cnf.write_one(-f(2,y))
In the "wo" section, all of the labels are generated. In this case an XY grid. The unusual part is that there is no cnf.write()
call in this section. There are only calls to the grid function, f()
. Thus, this section of code does not actually generate any CNF information. However it does populate the auto_term
database. The write-only setting means that every call to f()
should be something new and never before seen. If the label is found to already exist in the database, then something is wrong. And in fact, it catches the duplicated X coordinate and complains about Error: existing label (2,1)
. Change the extra 2 to a 3 and it'll get past that section. One bug found.
Of course there is still a pretty obvious bug on that line, the missing 5 in the Y coordinates. That will be uncovered shortly, in the read-only section.
The next complaint is Error: creating label (2,0)
. Here we've made an indexing error. The coordinates are all 1-indexed while range()
is zero indexed. Since there are no zeros in the database, a new entry would have to be created. Because it is in read-only mode, this is an error. Change it to range(1,7)
(or use an explicit list) and the problem goes away. Two bugs found.
The final error is Error: creating label (2,5)
. Looking at the Y coordinates during the generation step, a 5 is missing and so those points are never created. Add a 5 to the Y list and the code is fully debugged.
Read-only mode will make most typos stand out like a sore thumb. Normally a typo would silently generate a fresh label with an extra bit of state in the solution space. This will lead to multiple superfluous solutions, so the debugger.py
script will be of no help. With read/write-only modes, you get an exact line number and label.
Despite being designed around declarative programming, Prolog has almost no facilities for debugging declarative programs. This is probably the largest perceived weakness of Prolog and the butt of the only joke most programmers know about Prolog.
I can't do the explanation proper justice, so please read The Prolog Lightbulb Joke to illustrate the more common issues.
While Prolog has more features and is capable of solving any problem that SAT could, debugging is one place where SAT seems more straightforward.