Pipes, Clojure and Choco (part 3: Smart Search)
UPDATED: I found a bug in my code, which made the smart search less efficient, so I have rewritten this post (more details below; at last I can delete the part that said “I do not understand”).
This is the third in a series of posts that explore constraint programming. It’s always more fun to learn with a good example, so we’re writing a solution for this problem.
In the previous article I described a “direct search”, but even there we were using constraints:

we only tried to add tiles that were “unused”;

we didn’t add tiles that would point pipes over an edge;

we didn’t add tiles that were mismatched with their neighbours.
It’s true I didn’t explain this in detail (if you’re interested, valid-move? in isti.tubes.puzzle is the main test), but these constraints are so obvious it would have been “stupid” to ignore them.
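To make the neighbour constraint concrete, here is a simplified sketch (the real test is valid-move? in isti.tubes.puzzle; the mask names and the direction-set model below are illustrative, not the actual representation). A candidate tile is consistent with an already-placed neighbour exactly when it points at the neighbour if and only if the neighbour points back:

```clojure
;; Illustrative sketch of the neighbour-matching constraint.
;; Each mask is modelled as the set of directions its pipe ends point in.
(def masks
  {:horizontal   #{:west :east}
   :vertical     #{:north :south}
   :top-to-right #{:north :east}
   :cross        #{:north :south :east :west}})

(def opposite {:north :south, :south :north, :east :west, :west :east})

(defn consistent?
  "True when candidate's pipe end towards dir matches the neighbour's
  pipe end pointing back (both present, or both absent)."
  [candidate neighbour dir]
  (= (contains? (masks candidate) dir)
     (contains? (masks neighbour) (opposite dir))))
```

For example, two horizontal pipes side by side are consistent, but a horizontal pipe next to a vertical one is not.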
How can we make this smarter? Norvig wrote an excellent article on constraints within Sudoku. Can we do something similar here?
Perhaps we can exploit something from knot theory? That would be interesting, but would require a complete rethink of how we describe the problem. A simpler alternative is to more carefully choose where we move.
Currently we add new tiles from the top left, raster scanning horizontally (next-empty in isti.tubes.direct). But it would make sense to place any tiles we are certain of first, so that we don’t waste time using them elsewhere. More generally, it seems it would be best to place tiles on the most constrained cells first.
We can do this by tracking which tiles are available for each empty cell. Doing this efficiently is quite challenging. Originally I used an implementation with three separate maps, which avoided any scans of the data, but had very complex logic for updates. Here I will show a simpler approach with a single map from location to a set of masks.
There is one additional wrinkle: because we place moves in a “random” order we may generate the same pattern more than once (in one branch we could move T1 to C1 then T2 to C2; in another, T2 to C2 then T1 to C1). This is because the moves no longer have a fixed, exclusive order¹. We can avoid repeated moves by storing a hash of each tried position and testing against that; this must also go in the state.
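To make these two data structures concrete, here is a minimal sketch (the names and mask keywords are illustrative, not the actual source): options maps each empty location to the set of masks that could still go there, and visited is an atom holding hashes of positions already expanded:

```clojure
;; Illustrative sketch of the per-node state (not the actual source).
;; options: map from [row col] location to the set of masks still possible there
(def options
  {[0 0] #{:horizontal :top-to-right}
   [0 1] #{:horizontal}
   [1 0] #{:vertical :cross :top-to-right}})

;; visited: mutable set of hashes of positions already expanded, so the
;; same pattern reached via two move orders is only explored once
(def visited (atom #{}))

(defn seen?
  "Return true if this puzzle position was expanded before;
  otherwise record it and return false."
  [puzzle]
  (let [h (hash puzzle)]
    (if (contains? @visited h)
      true
      (do (swap! visited conj h)
          false))))
```

Using an atom for visited means every branch of the search shares the same, most complete set of hashes regardless of evaluation order.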
OK, so on to the code. The state at each node will consist of the current puzzle, the map from location to possible masks (options), and the mutable set of hashes (visited, mutable so that we can always get the most complete set, even when lazy sequences make the evaluation order unclear):

(defn extract [[puzzle options visited]]
  puzzle)

(defn solve [puzzle trace]
  (dfs [puzzle (initial-options puzzle) (atom #{})]
       (comp complete? extract)
       expand (atom 0) trace))
Expansion to child nodes (expand) uses the options map described earlier, sorted by the number of masks, then reduced to a sequence of locations:

(defn sort-by-constrained [options]
  (sort-by (comp count second) options))

(defn expand [[puzzle options visited]]
  (expand-masks puzzle options visited
                (first (sort-by-constrained options))))
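To see the ordering in action, here is sort-by-constrained applied to a small hypothetical options map (the locations and mask keywords are made up for illustration); the cell with only one remaining possibility comes first, so expand always works on the most constrained cell:

```clojure
(defn sort-by-constrained [options]
  (sort-by (comp count second) options))

;; hypothetical options map: [2 1] has only one candidate mask left
(def options
  {[0 0] #{:a :b :c}
   [2 1] #{:a}
   [1 1] #{:a :b}})

(first (sort-by-constrained options))
;; => [[2 1] #{:a}]
```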
where expand-masks (see the full source) does the mechanical work of generating all the possible moves (wrapped in lazy sequences so that we only pay for the children we use). Embedded in that function is the logic for updating the state. We must do three things:

remove the entire entry for wherever we move (since that cell is now occupied, no masks can be placed there);

if we used the last of any particular mask (e.g. the last horizontal pipe, or the last top-to-right corner) then we must remove that mask from all other options (since none remain to be used elsewhere);

we must add the extra constraints imposed on (empty) neighbours by our move.
(defn update-options [puzzle options ij mask]
  (let [tiles   (:tiles puzzle)
        cells   (:cells puzzle)
        options (remove-location options ij)
        options (remove-last-mask options tiles mask)
        options (update-neighbours options cells ij puzzle)]
    options))
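As an illustration of the second step, here is a sketch of how remove-last-mask might work, assuming tiles maps each mask to a count of unused copies (the actual implementation is in the full source; this also drops any cell whose option set becomes empty):

```clojure
(defn remove-last-mask
  "Sketch only: if no copies of mask remain in tiles, strip it from
  every cell's option set, dropping cells left with no options."
  [options tiles mask]
  (if (zero? (get tiles mask 0))
    (into {}
          (keep (fn [[ij masks]]
                  (let [masks (disj masks mask)]
                    (when (seq masks)
                      [ij masks])))
                options))
    options))
```

For example, once the last :a tile is used, a cell whose only option was :a disappears from the map entirely, which is what the pruning check below detects.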
At this point I should explain the reason for the “update” at the top of this article: originally the expansion into possible moves included all possible cells. If the most-constrained cell failed to work, others were tried in turn. But that was wrong! If the most-constrained cell has no consistent solution then our current partial solution is wrong and we can backtrack immediately. Fixing this makes the code much faster.
One more interesting snippet is an additional check: if we have a cell with no options (after the second of the three actions described just above) then we will be unable to find a solution, so we should prune the search tree at this point:
(defn check-options [puzzle options]
  (= (expt (:n puzzle) 2)
     (+ (count (:cells puzzle)) (count options))))
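To see why this count works, here is a dependency-free sketch of the same arithmetic (the version above uses expt from clojure.math.numeric-tower; here n-occupied stands in for (count (:cells puzzle))): on an n x n board every cell must be either occupied or have at least one candidate mask, so a missing entry in options means some empty cell can no longer be filled:

```clojure
;; Sketch of the pruning arithmetic:
;; occupied cells + cells with options must equal n^2.
(defn check-options [n n-occupied options]
  (= (* n n) (+ n-occupied (count options))))

;; 3x3 board, 4 tiles placed: all 5 empty cells still have options
(check-options 3 4 {[0 1] #{:a} [0 2] #{:b} [1 2] #{:a}
                    [2 0] #{:c} [2 1] #{:a}})
;; => true

;; one empty cell has lost its entry, so this branch can be pruned
(check-options 3 4 {[0 1] #{:a} [0 2] #{:b} [1 2] #{:a} [2 0] #{:c}})
;; => false
```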
And, as I said above, the full source and the entire repo are available for you to study if you’re curious.
How does it perform? Interestingly! Here are some typical solutions:
1 2 3 4 5 6 7 8 9 10
▒═▒▒▒ ▒═▒▒▒ ╔═▒▒▒ ╔═▒▒▒ ╔═▒▒▒ ╔═▒▒▒ ╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗
▒═▒▒▒ ▒═▒▒▒ ▒═▒▒▒ ▒═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒
▒╔▒╬▒ ▒╔▒╬▒ ▒╔▒╬▒ ║╔▒╬▒ ║╔▒╬▒ ║╔▒╬▒ ║╔▒╬▒ ║╔▒╬╝ ║╔═╬╝ ║╔╬╬╝
▒▒▒▒╗ ▒▒▒▒╗ ▒▒▒▒╗ ▒▒▒▒╗ ▒▒▒▒╗ ║▒▒▒╗ ║▒▒▒╗ ║▒▒▒╗ ║▒▒▒╗ ║▒▒▒╗
╚▒▒▒▒ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝ ╚▒▒▒╝
11 12 13 14 15 16 17 18 19 20
╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗ ╔═▒▒╗ ╔═▒╔╗ ╔═╗╔╗ ╔═╗╔╗ ╔═╗╔╗
╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═▒▒▒ ╠═╣▒▒ ╠═╣▒║
║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝ ║╔╬╬╝
║▒▒╚╗ ║▒▒╚╗ ║▒▒╚╗ ║║▒╚╗ ║║║╚╗ ║║║╚╗ ║║║╚╗ ║║║╚╗ ║║║╚╗ ║║║╚╗
╚▒▒▒╝ ╚▒▒═╝ ╚▒╚═╝ ╚▒╚═╝ ╚▒╚═╝ ╚╝╚═╝ ╚╝╚═╝ ╚╝╚═╝ ╚╝╚═╝ ╚╝╚═╝
21
╔═╗╔╗
╠═╣║║
║╔╬╬╝
║║║╚╗
╚╝╚═╝
1 2 ... 16 17 ... 34 35
▒▒▒▒▒▒ ▒▒▒▒▒▒ ╔╗▒╗╔╗ ╔╗▒╗╔╗ ╔╗▒╗╔╗ ╔╗╔╗╔╗
▒╠▒▒▒▒ ▒╠▒▒▒▒ ║╠╣▒▒║ ║╠╣▒║║ ║╠╣║║║ ║╠╣║║║
▒▒▒▒▒▒ ▒▒▒▒▒▒ ▒▒▒▒▒▒ ▒▒▒▒▒▒ ║║║║║║ ║║║║║║
▒▒▒▒▒▒ ▒▒▒▒▒▒ ▒▒▒▒▒▒ ▒▒▒▒▒▒ ║║║║║║ ║║║║║║
▒▒▒▒╣▒ ▒▒▒▒╣║ ▒▒▒▒╣║ ▒▒▒▒╣║ ║║║╠╣║ ║║║╠╣║
▒▒▒▒▒▒ ▒▒▒▒▒▒ ╚╝╚╝╚╝ ╚╝╚╝╚╝ ╚╝╚╝╚╝ ╚╝╚╝╚╝
The direct search in the previous article required around 40 moves to solve the first problem. The new, smarter approach uses half that, and only missteps slightly on step 9 (if you read the earlier version of this post you may remember that this was the point where the search became “lost”; that no longer happens now that we discard alternative cells).
For the second example the new strategy is even better. The number of iterations is reduced by 90% (to the minimum) because the clue tiles provide strong constraints: every move is fixed.
Unfortunately, the extra calculations require more time per step. Here are the timing results (each entry is total time / number of iterations):

                 examples              concentric
          5x5        6x6       8x8            10x10           per iteration (average)
smart     2ms/21     3ms/35    900ms/9,000    40s/300,000     100us
direct    0.2ms/40   2ms/300   200ms/30,000   10s/1,000,000   5-10us
This is a little disappointing. Even in the “perfect” second example, we are no faster than the simpler search. And the overhead continues to outweigh the advantage from the reduced number of steps on the concentric problems.
Again: you can see all the code in the repo.
I mentioned earlier that I had a more complex version of this code. That took 7ms for the 5x5 example and 16ms for 6x6, so the simpler approach used here is also faster. In the next article I’ll show how to use Choco: will it do the impossible and combine smarts with speed?
Andrew
¹ Sorry, but I don’t have a good, clear theoretical explanation for this (please comment if you do). ↩