Pipes, Clojure and Choco (part 4: Benchmarking)
This is the fourth in a series of posts that explore constraint programming. I described a “direct search” in part 2, while part 3 extended that to included more “smarts”. But  as Scott Hunter here at ISTI pointed out yesterday  it has been difficult to draw accurate conclusions from the limited number of examples in the original problem.
(By the way, I just fixed a bug in the “smart” code, so if you read that article earlier you might want to check the updated version.)
So, provoked by Scott’s comments (which were by email  so I’ve now added commenting directly to the pages), I have had a look at automating the generation of test data.
My first idea was to generate puzzles like mazes. But puzzles don’t really look like mazes  they have more loops (mazes tends to be treelike).
So then I tried deleting pipes from a “fully connected” puzzle (or “grid”). This worked much better: it’s easy to select edges at random and it’s easy to check whether deleting the pipe will leave a gap (by “pipe” here I mean a single connection between two cells; a central cell can have up to four pipes  one to each neigbour  a cell in a valid puzzle will always have at least two).
And it turns out (after playing around) that puzzles with many deleted pipes tend to be less ambiguous. Which I suspect is better  the original problem was probably intended for puzzles with unique solutions.
So my final approach is:

start with a full “grid” of pipes;

choose a pipe at random, and delete it if it doesn’t leave a pipeless cell;

repeat until you’re not finding any more pipes to remove;

remove tiles at random from the puzzle until you are left with a suitable number of “clues”.
Here are some examples (the number of solutions is median/maximum, taken from a sample of 9):
╔╗╔╗ ╔▒▒▒ ╔══╗ ╔▒▒╗ ╔══╗ ▒▒═╗ ╔══╗ ▒▒▒▒
║║║║ ▒▒▒▒ ║╔╗║ ▒▒╗▒ ║╔╗║ ║▒▒▒ ╠═╗║ ╠▒╗▒
╠╝╚╣ ▒╝▒▒ ║╚╝║ ▒╚▒▒ ╠╝╚╣ ▒▒▒╣ ║╔╣║ ║▒╣║
╚══╝ ▒═▒▒ ╚══╝ ▒▒▒▒ ╚══╝ ▒══▒ ╚╝╚╝ ╚╝▒╝
4x4, 20% clues 4x4, 30% clues 4x4, 40% clues 4x4, 50% clues
1/2 solutions 1 solution 1 solution 1 solution
╔══╦╗ ▒▒▒▒╗ ╔╗╔═╗ ▒▒▒═▒ ╔═══╗ ╔══▒▒ ╔╗╔═╗ ╔╗▒═╗
╠══╝║ ▒▒▒▒▒ ║║╚═╣ ▒║▒▒▒ ║╔══╝ ▒▒▒═▒ ╚╝║╔╣ ╚▒▒▒╣
╚╗╔╗║ ╚╗╔▒▒ ╚╩═╗║ ▒▒▒▒║ ║╚═╦╗ ║▒═▒▒ ╔╦╝║║ ╔╦╝║▒
╔╣╠╝║ ▒▒▒▒▒ ╔══╝║ ╔═▒╝▒ ╠═╗║║ ▒═▒▒▒ ║╚═╝║ ║▒▒▒▒
╚╝╚═╝ ▒╝▒▒▒ ╚═══╝ ▒▒▒▒╝ ╚═╝╚╝ ▒═╝╚▒ ╚═══╝ ▒▒▒═▒
5x5, 20% clues 5x5, 30% clues 5x5, 40% clues 5x5, 50% clues
1/10 solutions 1/4 solutions 1/2 solutions 1 solution
╔╗╔══╗ ▒▒▒▒▒▒ ╔╗╔══╗ ▒▒╔▒═▒ ╔╗╔═╦╗ ╔▒▒▒▒▒ ╔══╗╔╗ ╔══╗╔╗
║║╚╗╔╝ ▒▒╚▒▒▒ ╚╝║╔═╝ ▒╝▒▒▒▒ ║╚╩╗╚╝ ║╚╩▒▒▒ ║╔═╝╚╝ ▒▒▒▒▒╝
║╠╗║╠╗ ▒╠╗║▒▒ ╔╗╠╝╔╗ ▒▒▒▒▒▒ ╚╗╔╝╔╗ ╚╗▒╝╔▒ ║╠╗╔╦╗ ▒╠▒╔╦╗
╚╝║║╚╝ ▒▒▒▒▒▒ ╚╩╝╔╣║ ╚▒▒╔▒▒ ╔╝╚═╣║ ▒▒╚═▒▒ ║║║║╚╝ ▒║▒▒╚▒
╔╗║║╔╗ ▒▒▒▒▒▒ ╔═╗╠╝║ ▒▒╗▒▒║ ╠═╦╗║║ ╠▒▒▒▒▒ ║╚╩╝╔╗ ▒▒▒╝╔╗
╚╝╚╝╚╝ ▒╝▒▒╚╝ ╚═╝╚═╝ ╚═▒▒═▒ ╚═╝╚╩╝ ▒═▒╚╩▒ ╚═══╩╝ ╚▒▒═▒▒
6x6, 20% clues 6x6, 30% clues 6x6, 40% clues 6x6, 50% clues
4/9 solutions 2/4 solutions 1/4 solutions 1/2 solutions
╔═════╗ ▒▒═▒▒▒▒ ╔═╗╔══╗ ╔═▒╔▒═▒ ╔═══╗╔╗ ╔▒▒▒╗╔
║╔═╗╔═╣ ▒╔▒▒▒▒▒ ╠═╝╚╗╔╝ ▒═▒▒╗▒▒ ║╔╗╔╝║║ ║▒▒╔╝║
║╚═╝╚╗║ ▒╚▒╝▒╗▒ ║╔══╣╠╗ ║╔▒═╣▒╗ ╚╝║║╔╣║ ▒▒▒▒▒╣
╚╗╔╗╔╩╝ ▒╗╔▒▒╩▒ ╚╝╔═╝╚╝ ╚▒▒═▒▒▒ ╔╗╚╝╚╝║ ▒╗╚╝╚▒
╔╣╚╝╚═╗ ╔▒▒▒▒▒▒ ╔╗║╔══╗ ▒▒▒╔▒═╗ ╚╩══╗╔╝ ╚╩▒═▒▒
╠╝╔╦╗╔╣ ▒▒▒╦╗╔▒ ║╚╝║╔═╣ ║▒▒▒╔═▒ ╔═══╝╚╗ ▒═══╝▒
╚═╩╝╚╩╝ ▒▒▒╝╚▒▒ ╚══╝╚═╝ ▒▒▒▒▒▒▒ ╚═════╝ ▒══▒═▒
7x7, 30% clues 7x7, 40% clues 7x7, 50% clues
6/104 solutions 1/15 solutions 1/2 solutions
╔═══╗╔═╗ ▒═▒▒╗▒═▒ ╔═╗╔═╗╔╗ ▒▒▒╔▒▒╔▒
╚╗╔╗║║╔╝ ╚╗▒▒║▒▒╝ ╠═╩╝╔╣╠╝ ╠═▒╝▒╣▒▒
╔╝╚╣║║╚╗ ▒╝▒▒▒▒▒╗ ╚═╦═╝║╚╗ ▒═╦▒╝▒╚▒
╚══╝╚╩═╝ ▒▒▒▒▒╩▒▒ ╔═╝╔╗╚═╣ ╔═▒▒╗╚═▒
╔══════╗ ▒══▒═▒═▒ ╠╗╔╝╚══╝ ▒▒▒╝▒▒═▒
║╔══╦══╝ ▒▒▒▒╦══╝ ╚╣║╔╦══╗ ╚▒║▒▒═▒╗
║║╔╗║╔╦╗ ║║╔▒▒▒▒╗ ╔╝╚╝║╔╗║ ╔╝╚╝║╔╗▒
╚╝╚╝╚╝╚╝ ▒▒╚▒╚╝▒▒ ╚═══╩╝╚╝ ╚▒══▒╝▒▒
8x8, 40% clues 8x8, 50% clues
4/70 solutions 1/3 solutions
╔╗╔═══╗╔╗ ▒╗▒═▒▒╗╔╗
║╚╩═╗╔╣║║ ▒▒╩═╗▒╣▒▒
╚══╦╝║║║║ ╚▒═▒▒║▒▒▒
╔═╗╚╗╚╣║║ ╔═▒╚╗╚╣║▒
╠╗╚═╣╔╝╠╝ ▒╗▒▒▒▒╝╠▒
╚╝╔═╝╚╗╚╗ ╚╝╔▒▒╚▒▒╗
╔═╝╔╦═╩═╝ ▒▒▒╔╦═▒▒╝
║╔╦╝╚═══╗ ║▒╦▒╚═══╗
╚╝╚═════╝ ╚▒▒▒═▒▒▒▒
9x9, 50% clues
2/11 solutions
And here are the average times for the two algorithms so far (direct time in ms, smart time in ms, the ratio smart/direct):
Clues  4x4  5x5  6x6  7x7  8x8  9x9 

20%  0.1, 0.5, 4x  0.7, 1, 2x  4, 4, 1x       
30%  0.1, 0.5, 4x  0.3, 1, 3x  2, 2, 1x  10, 6, 0.5x     
40%  0.1, 0.4, 5x  0.2, 0.4, 3x  0.6, 2, 3x  2, 3, 2x  7, 9, 1x   
50%  0.1, 0.3, 5x  0.1, 0.6, 4x  0.3, 1, 4x  0.7, 2, 3x  2, 4, 2x  17, 15, 1x 
Which, in the end, doesn’t change our conclusion: smart hardly ever outperforms the direct search.
As ever, latest code is in the repo.
Andrew
Related Posts
 Pipes, Clojure and Choco (part 5: Choco)
 Pipes, Clojure and Choco (part 3: Smart Search)
 Pipes, Clojure and Choco  Optimisation
 Pipes, Clojure and Choco (part 2: Direct Search)
 An ORM for C?
 How to Average Complex Responses
 Pipes, Clojure and Choco (part 1: Limits of Laziness)
 CMake with Check Unit Tests
 Javascript Graphics w/ Raphael
 Jekyll on Bitbucket