Sudoku in Differential Dataflow

Sudoku is a class of puzzles in which there is a 9x9 grid that needs to be populated with the numbers 1 through 9, each appearing exactly 9 times. There are rules about the placements, specifically that each row, column, and the most obvious nine 3x3 boxes need to contain exactly one of each number. Starting from some given numbers at certain locations, the puzzle is then to determine the locations of all of the other numbers.

In a series of mediocre videos , I talked through coding up the Sudoku problem in differential dataflow. We encoded the problem, described some inferences rules to make progress, and then saw what happened as we incrementally removed given input numbers. This post is a distillation of that series, if you'd rather just read about things than spend a substantially longer amount of time watching me repeatedly try to figure out what is currently wrong.

Much of this work was revitalized for me due to the interest around "Miracle Sudoku" . I actually went and worked with that a bit before coming around to the exposition below, but explaining things that way seemed needlessly confusing. Instead, I recommend checking out the link and trying out the puzzle (what we do here will work for it, mutatis mutandis ).

A Sudoku board

This is the sort of thing we have to deal with.

bmq6zeR.png!web

The idea is that the empty spots need to have numbers put in, and there is just one way to do it that respects the constraints on rows, columns, and boxes.

Encoding the Sudoku puzzle

Initially any of the nine values can be assigned to any of the nine times nine board locations. As soon as we get some constraints that number starts to go down, but before that happens it leads us to our representation of the Sudoku board state: a collection of (val, row, col) triples.

For example, the (1,1) position on the board (upper left, for this post) is initially empty, and may have nine possible values assigned to it. Written as (val, row, col) triples, they would be:

(1, 1, 1)
(2, 1, 1)
(3, 1, 1)
(4, 1, 1)
(5, 1, 1)
(6, 1, 1)
(7, 1, 1)
(8, 1, 1)
(9, 1, 1)

There are some 80 other locations each with nine possibilities; we aren't going to list them out, but you could imagine a collection of nine times nine times nine triples (it's 729).

The "given" inputs reduce down the possibilities. While many locations may have nine possibilites, some locations will be immediately reduced down to just one possible value. For example, to specify that the number five should appear in row three and column one, as it does in the example above, we could list only that possibility, (5, 3, 1) , instead of all nine.

In any case, our board state will be a collection of triples:

// Contains possible (val, row, col) triples.
let board: Collection<G, (i8, i8, i8)>;

We will be able to add triples to and remove triples from this collection, as is common for differential dataflow, and any computation that derives from board will be correctly maintained. All we need to do now is describe some computation!

Implementing inference rules

We know of the three main constraints on a Sudoku board, but we want to try and translate these in to inference rules. Differential dataflow is not Prolog, which is a pretty great fit for solving things like Sudoku; instead we'll need to specify some pro-active computation that implement the inference rules we have in our head.

In particular, we'll write some logic that goes from board_prev to board_next , ideally reducing the set of candidate triples, somehow. If we can come up with some logic like that, we can repeatedly apply the logic until it reaches a fixed point, which will be the limits of our inferencing ability for that particular input. Ideally there will be relatively few possible values for each location, maybe only one!

That is, we are planning on writing a fragment that looks like

let board_final =
board
    .iterate(|board_prev| {
        // transformations that result in `board_next`
        unimplemented!("not written yet")
    });

We still have to fill in the unimplemented part, naturally. But at least we have a sense for the shape of the logic we need to write: a function from a collection of prior possibilities for (val, row, col) to a new and ideally reduced collection of possible triples.

We'll also finish off the computation with some reporting! What I used was a fragment that counts the number of value at each location, and then reports how many of each count it saw (there should be 81 total across the board):

// Report the number of locations with each number of candidate values
board_final
    .map(|(_val, row, col)| (row, col))
    .count()
    .map(|(_row_col, count)| count)
    .consolidate()
    .inspect(|x| println!("Final counts: {:?}", x));

For example, if we just start this now with an empty iteration rule, we reach fixed point right away and report:

mcsherry@ameisenigel sudoku % cargo run --release
    Finished release [optimized] target(s) in 0.02s
     Running `target/release/sudoku`
Final counts: (1, 0, 24)
Final counts: (9, 0, 57)
1.140586ms      Round 0 complete
mcsherry@ameisenigel sudoku %

What does this mean? Well, we are reporting triples (data, time, diff) for the numbers of values at locations. Here the data are possible numbers of values (either 1 or 9 it seems), and diff is the number of times we see the value in the collection. Apparently there is one value in 24 locations, and nine values in 57 locations. There are 24 givens for this problem, and nothing is known about any other locations, so this checks out.

What we need now are some good inference rules to apply!

What are some good inference rules?

Well, one way to think about this is "how do you normally figure things out in Sudoku?". We could certainly start there, but we'll end up going well beyond what people normally think about. Let's start by shouting out rules.

RULE: Determined locations rule out other possibilities!

If we have a determined location, say (5, 3, 1) as in the example, we can be certain that the value 5 does not occur anywhere else in row 3 , nor anywhere else in column 1 . In addition, the entire upper right three x three box cannot have a 5 because you only get one of each number in each 3x3 box. So, determined numbers rule out possibilities in other locations.

Let's start by writing some code that starts from board and identifies triples that correspond to determined locations: those triples that are the only instance of their row and col in the collection.

// Key each triple by its row and column.
let keyed = board.map(move |(v,r,c)| ((r,c),v));
// Retain those `(r,c)` pairs with exactly one representative.
let unique_keys = keyed
    .map(|(key, _)| key)
    .threshold(|_key, count| if count == &1 { 1 } else { 0 });

// Retain triples with unique representative row and columns.
let unique_triples =
keyed
    .semijoin(&unique_keys)
    .map(|((r,c), v)| (v,r,c))

The unique_triples collection should contain exactly those (val, row, col) triples whose row and col occur in no other records. That mean that this value does occur at this location, as there are no other choices. If we were playing the pen and paper version of the game, we would write in the number at this point! In pen, because we are that certain!

Instead, we are going to write some rules that cross out other locations. In particular, we can cross out any locations in the same row, column, or 3x3 box. To do that, I figure we'll write a little helper function that takes a (val, row, col) triple and enumerates the candidate triples that can no longer be.

// Enumerates unacceptable candidates if the input is a determined location.
fn excluded((val, row, col): (i8, i8, i8)) -> impl Iterator<Item=(i8, i8, i8)> {
    let rows = (1 .. 10).map(|r| (val, r, col));
    let cols = (1 .. 10).map(|c| (val, row, c));
    // Awkwardly, starting at 1 means some funny logic.
    let boxs = (1 .. 4).flat_map(|r| (1 .. 4).map(|c| (
        val,
        r + (row - 1) - (row - 1) % 3,
        c + (col - 1) - (col - 1) % 3,
    )));
    // Return all exclusions, except not the entry itself.
    rows.chain(cols).chain(boxs).filter(|vrc| vrc != &(val,row,col))
}

At this point, we can delete any candidates that are excluded by a determined location!

Let's write that down. This is easiest to do by describing the thing to negate, negating it, then adding in the positive elements.

let board_next =
unique_triples
    .flat_map(|vrc| excluded(vrc))
    .distinct()
    .negate()
    .concat(board);

We've actually missed several excluded (val, row, col) triples in the excluded logic above! Can you spot which ones? You have until we get to the part of the post where we actually need to use the improved logic!

This rule does a pretty decent job of resolving many Sudoku boards. For example, the sqlite manual pages for WITH RECURSIVE use recursion to solve this Sudoku board:

53..7....
    6..195...
    .98....6.
    8...6...3
    4..8.3..1
    7...2...6
    .6....28.
    ...419..5
    ....8..79

If we feed this input in (with all nine possibilities for the . locations), we get the complete answer.

...
DETERMINED: ((7, 2, 2), (0, 9), 1)
DETERMINED: ((4, 9, 2), (0, 10), 1)
DETERMINED: ((9, 4, 3), (0, 10), 1)
Final counts: (1, 0, 81)
5.90633ms       Round 0 complete

This is the tail end of a read-out, but it is telling us that over the course of ten rounds (starting at round zero, with the given inputs) we eventually populate all 81 locations. The last two happen to be (4, 9, 2) and (9, 4, 3) . The Final counts report tells us that 81 locations have contain exactly one candidate value.

So that's nice. The sqlite implementation uses this example to demonstrate guessing at locations ("bifurcation", in Sudoku-speak), the problem isn't actually complicated enough to require it.

So let's try out a "hard" Sudoku from Cracking the Cryptic , whose board we saw up above:

YJBNFfz.png!web

When we spin up our sophisticated computation, we see

Final counts: (1, 0, 26)
Final counts: (2, 0, 11)
Final counts: (3, 0, 13)
Final counts: (4, 0, 16)
Final counts: (5, 0, 13)
Final counts: (6, 0, 1)
Final counts: (9, 0, 1)
19.295849ms     Round 0 complete

It's not immediately obvious, but the conclusion is that 24 values were supplied as inputs, and we determined two additional (val, row, col) entries. They happen to be:

DETERMINED: ((9, 6, 7), (0, 1), 1)
DETERMINED: ((9, 9, 9), (0, 1), 1)

These locations don't really pop out at you, but check them out. All values except nines are ruled out.

For 26 locations we nailed the right value; for 11 locations we got it down to two choices, .. and one location still had all nine possibilities. We will need to improve our inferencing if we want to solve hard Sudoku problems!

For comparison, sqlite took 19.2 seconds to solve this problem with its recursive query (100x longer than the 170ms for the easier one). In about 1/1000 the time, we managed to not solve it, which sure is fast.

RULE: Rows, columns, and boxes must contain values!

Imagine in a row, column, or box there is only one possible location for a value. That location may still have multiple candidate values, but you can cross them off and pick that one value you are certain to need in that one location.

For example, in our picture we can see that there is only one possible location for one in the left-most box: right in the middle, as the left and right columns are ruled out by other ones. Because we found a value and a box with only one location, we can lock in that value at that location.

This is actually similar to the logic we used above for determined locations, where we looked for row and col pairs with only one occurrence of (???, row, col) in the collection. We are just using different fields this time. Specifically, "value and row" or "value and column" or "value and box".

Each of these constraints define a "key" associated with each triple, and we are interested in keys that have exactly one representative. In the first two cases, those keys are (val, row) and (val, col) respectively, and for boxes it is (val, box) where box identifies the box (row, col) live in.

Rather than hand-write each of these, let's generalize the logic up above to allow the arbitrary specification of a key associated with a (val, row, col) triple:

use differential_dataflow::{ExchangeData, hashable::Hashable};
fn keep_unique_by<G: Scope, F, D>(
    board: &Collection<G, (i8, i8, i8)>,
    logic: F,
) -> Collection<G, (i8, i8, i8)>
where
    G::Timestamp: Lattice+Ord,
    F: Fn(&(i8, i8, i8)) -> D + 'static,
    D: ExchangeData+Hashable,
{
    let keyed = board.map(move |trip| (logic(&trip), trip));
    let unique_keys = keyed
        .map(|(key, _)| key)
        .threshold(|_key, count| if count == &1 { 1 } else { 0 });

    keyed
        .semijoin(&unique_keys)
        .map(|(_key, trip)| trip)
}

This is certainly more of a mouthful, but the structure is identical to the logic above. We take in a collection board of possible candidates, and for each prefix it with some key through the application of logic . We track down which keys are uniquely represented, and retain the (val, row, col) representative for those keys.

This lets us re-write the unique_triples logic up above as

let unique_triples = keep_unique_by(&board, |&(_v,r,c)| (r,c));

We can also quickly jot off the other three rules, about binding unique representatives for rows, columns, and boxes, as

let row_needed = keep_unique_by(&board, |&(v,r,_c)| (v,r));
let col_needed = keep_unique_by(&board, |&(v,_r,c)| (v,c));
let box_needed = keep_unique_by(&board, |&(v,r,c)| {
    let row_off = 1 + 3 * ((r - 1)/3);
    let col_off = 1 + 3 * ((c - 1)/3);
    (v, (row_off, col_off))
});

These locations are also determined, and so we can cross off anything they exclude!

Now is the point by which you needed to figure out the limitation of excluded .

It turns out this doesn't work great, because we forgot something really important in excluded . A triple (val, row, col) excludes val from all sorts of rows, columns, and boxes, but it also excludes all other values from that (row, col) position. This is what we need to add in for the rules above to have any effect: they are primarily about ruling out other values at one location, rather than one value from other locations.

On the harder puzzle, the one were we figured out only two additional values, we make a bit more progress than before. But just a bit.

...
DETERMINED: ((8, 8, 4), (0, 0), 1)
DETERMINED: ((1, 5, 2), (0, 1), 1)
DETERMINED: ((9, 6, 7), (0, 1), 1)
DETERMINED: ((9, 9, 9), (0, 1), 1)
DETERMINED: ((5, 7, 2), (0, 2), 1)
Final counts: (1, 0, 28)
Final counts: (2, 0, 11)
Final counts: (3, 0, 13)
Final counts: (4, 0, 17)
Final counts: (5, 0, 11)
Final counts: (8, 0, 1)
21.02047ms      Round 0 complete

We discovered two more values,

DETERMINED: ((1, 5, 2), (0, 1), 1)
DETERMINED: ((5, 7, 2), (0, 2), 1)

which represents some progress, just nothing amazing yet.

Before racing ahead, double check these inferences. We've already talked through why 1 needs to appear in the center of the left box (it is ruled out of the box's left and right columns), but why does 5 need to appear in row seven, column one? It's one of these three new rules...

RULE: Exclusions happen even with uncertainty!

If we check out our hard puzzle, I've filled in our known locations, but I've also highlighted two locations that an experienced Sudoku artist would tell you each must contain a four, as no other locations can be four in that box.

Ivq2me3.png!web

Now, we don't know which is the four, but even without knowing which is which we can learn a thing. We don't know which location is which, but both configurations rule out fours from the rest of the row.

The more general form of the rule is that in any row, column, or box, if all of the possible locations for a value agree on excluding some triple, then that triple should be excluded, even without determining the location of the value. There is actually a fourth instance of this rule, that if all possible values in one location exclude some triple, we can exclude the triple without actually knowing the value at the location. This is what gives us our standard exclusion rules for determined locations.

Let's write down the code:

use differential_dataflow::{ExchangeData, hashable::Hashable};
fn exclude_by<G: Scope, F, D>(
    board: &Collection<G, (i8, i8, i8)>,
    logic: F,
) -> Collection<G, (i8, i8, i8)>
where
    G::Timestamp: Lattice+Ord,
    F: Fn(&(i8, i8, i8)) -> D + 'static,
    D: ExchangeData+Hashable,
{
    board
        .map(move |vrc| (logic(&vrc), vrc))
        .reduce(|_key, values, output| {
            let mut exclusions: HashSet<(i8, i8, i8)> = excluded(*values[0].0);
            for i in 1 .. values.len() {
                exclusions = exclusions.intersection(&excluded(*values[i].0)).cloned().collect();
            }
            for vrc in exclusions {
                output.push((vrc, 1));
            }
        })
        .map(|(_key, vrc)| vrc)
        .distinct()
}

As described, we collect all the candidates by some key function logic , and then intersect their constraints. Anything common to all of the candidates gets produced, and then excluded from board_next .

We apply this rule with four classes,

let row_col_exclude = exclude_by(&board, |&(_v,r,c)| (r,c));
let val_row_exclude = exclude_by(&board, |&(v,r,_c)| (v,r));
let val_col_exclude = exclude_by(&board, |&(v,_r,c)| (v,c));
let val_box_exclude = exclude_by(&board, |&(v,r,c)| {
    let row_off = 1 + 3 * ((r - 1)/3);
    let col_off = 1 + 3 * ((c - 1)/3);
    (v, (row_off, col_off))
});

and what happens?

...
DETERMINED: ((7, 9, 8), (0, 21), 1)
DETERMINED: ((8, 9, 1), (0, 21), 1)
DETERMINED: ((3, 5, 4), (0, 22), 1)
DETERMINED: ((7, 5, 5), (0, 22), 1)
Final counts: (1, 0, 81)
44.715396ms     Round 0 complete

This has solved us the puzzle! Nice work us!

The first conclusions other that what we had seen before are

...
DETERMINED: ((4, 7, 8), (0, 3), 1)
DETERMINED: ((4, 5, 9), (0, 4), 1)
DETERMINED: ((5, 5, 8), (0, 5), 1)
...

and specifically the first thing we determine is that four must be in row seven, column eight.

Take a moment and try and figure out why that must be the case. It's not obvious, at least it was not obvious to me.

Ok, so the four-five pair we discussed before? It rules out all the fours in that row (row five), which means that in the left-most box the fours can only be in the third column. We don't know where in that column, but all choices rule out the four from the seventh row, third column. That means that the only remaining position in the seventh row for a four is column eight.

That's what you need to unblock the puzzle.

You could totally have figured that out, but it is neat that the computer can do it too!

Now that we've done this, take a swing at the puzzle for yourself. These rules are sufficient to finish out the puzzle, and you could apply them too. Or, run the code and have a swing at figuring out which rule fired in each case. Looking at the output, it takes 22 sequential rounds of inference to solve the problem, each relying only on information from prior rounds.

Miracle Sudoku

As a parting though, I wanted to show off the interesting world of "Miracle Sudoku" . This is a variant on Sudoku where there are some additional constraints, and far fewer givens. The additional constraints are that any number, in addition to rows, columns, and boxes, also excludes itself from any location that can be reached in a king or knights move, as in chess, and it excludes the values above and below it from any orthogonally adjacent location.

Let's do a diagram (made using the web app associated with the link above ; check out the video, and the puzzle!).

MR3memj.png!web

In this example we have two givens, the one and the two. Around the two I've drawn in the excluded locations for the value two, using blue for the standard Sudoku rules and green for the extended "king and knight's moves", and the excluded locations for the values one and three in red. Although there are very few givens, if you also fill in where two is excluded around the given one, you start to see that you might get somewhere.

I strongly recommend that you give the puzzle a go! We'll show how to automatically unwind it in just a moment, but it is quite pleasant to try it out for yourself at first.

Updating excluded

This is easy enough for this puzzle: we'll just update the excluded rules to also exclude (val, row, col) triples that are a king or knight's move away, and those values that are within one from the locations to the left, right, above, and below.

This ends up being sufficient to solve the puzzle, using our set of rules. However, what makes the puzzle pretty interesting to solve is how things get locked in.

You might imagine that we lock in a bunch of ones and twos early on, and that is entirely correct. We don't get them all, but we do spend four rounds getting most of them locked in (and the remaining pair of each isolated to just two potential locations).

2umUBji.png!web

This is where the un-enterprising human might get stuck. If you haven't, this is a great time to try and make some progress and see what the next determined location is. It turns out it is pretty neat.

...
DETERMINED: ((1, 8, 2), (0, 4), 1)
DETERMINED: ((2, 2, 2), (0, 4), 1)
DETERMINED: ((3, 4, 2), (0, 8), 1)
DETERMINED: ((3, 5, 8), (0, 8), 1)
DETERMINED: ((3, 7, 1), (0, 8), 1)
...

We go four rounds from when we determine our last 1 or 2 before we determine another location.

I've colored purple the locations that a 3 can occupy. You might see some things immediately, like that the left-most box must have a 3 in its central column, which removes an option from the lower-left box. That fizzles out pretty quick. More interestingly, we can conclude that, for example, the upper-left location in the upper box cannot be a 3 , as it clashes with all locations in the upper-left box.

IZJfAzy.png!web

Rounds (0, 5) , (0, 6) , and (0, 7) appear to result in no new derivations (and indeed no values are locked in). Instead, what is going on is that we are thinning out the possible places in which threes can exist, and as they thin out the intersections of their excluded sets increase, slightly. This cascades, and after a few rounds enough have been ruled out that we can nail down the location of the first threes (three of them, in fact).

This continues, somewhat more briskly as the board starts to fill in, and we eventually complete the puzzle:

...
DETERMINED: ((8, 2, 8), (0, 18), 1)
DETERMINED: ((8, 5, 7), (0, 18), 1)
DETERMINED: ((8, 7, 9), (0, 18), 1)
DETERMINED: ((9, 7, 7), (0, 18), 1)
Final counts: (1, 0, 81)
74.886933ms     Round 0 complete

Doing the solve by hand is very pleasant, and if you now want to try it out there is another Miracle Sudoku puzzle that you can try from scratch. It turns out the program nails that one too, but I'm not going to spoil it for you.

Incremental updates to the input

If you've been confused this whole post as to why things like (0, x) get called "round x ", it is because the test harness is actually incrementally updating our input. We start from an empty input board, and in round 0 we introduce a big pile of options for values at each location.

Of course, we don't do any further updates to the problem, although we could.

We could take that easy Sudoku problem and remove any redundant inputs (ones whose removal does not prevent us from solving the problem). I did that and ended up with this problem, which still has a unique solution we find quickly.

.3.......
...1.5...
.98....6.
....6...3
4..8.3..1
7...2....
.6....28.
....19..5
....8..79

This takes the sqlite running time up 40x to seven seconds.

Perhaps more interestingly, there are puzzles like this one which allow you to put any number from 1 to 7 in the grey square, and you get a puzzle who's difficulty increases with the number. Pretty neat!

JfiEfaq.png!web

We can incrementally solve this puzzle (or attempt to) by making as much progress as we can with no clue, and then incrementally update the allowed values for the central location.

If we do that, we can determine three additional values,

qqMVBnb.png!web

and we rule out a great many other potential values at various locations.

Final counts: (1, 0, 31)
Final counts: (2, 0, 8)
Final counts: (3, 0, 16)
Final counts: (4, 0, 14)
Final counts: (5, 0, 9)
Final counts: (6, 0, 2)
Final counts: (9, 0, 1)
15.794631ms     Round 0 complete

Although only 31 locations are locked in, most of the others are greatly reduced in their possible values.

Actually, we can do a bit better. We are going to add in to the grey location a number between one and seven, which means not eight and not nine. We can remove those two values, (8, 5, 5) and (9, 5, 5) , and see what happens.

Final counts: (1, 1, 2)
Final counts: (2, 1, 1)
Final counts: (3, 1, -1)
Final counts: (5, 1, -2)
Final counts: (7, 1, 1)
Final counts: (9, 1, -1)
18.845005ms     Round 1 complete

Notice that the time coordinate (the second number in each tuple) is now a 1 . Also notice that we discovered and few more locked-down locations! Two other 8 values, as it turns out.

Let's add in the (1, 5, 5) constraint, by removing all other possible values from the (5,5) location. This increments the outer timestamp, and our times will now have a 2 in place of a 1 .

Final counts: (1, 2, 48)
Final counts: (2, 2, -9)
Final counts: (3, 2, -15)
Final counts: (4, 2, -14)
Final counts: (5, 2, -7)
Final counts: (6, 2, -2)
Final counts: (7, 2, -1)
31.363509ms     Round 2 complete

You can double check the math, but the differences here indicate that we've converted all ambiguous locations in to uniquely determined locations. Good news for us!

How does it continue? Well, when we swap (1, 5, 5) for a (2, 5, 5) we get this exciting output:

51.704378ms     Round 3 complete

You should read this as "nothing in the final counts changed". We still solve the puzzle, although the way in which we solved the puzzle has almost surely changed (that's where the 20ms went).

Swapping (2, 5, 5) for (3, 5, 5) works similarly:

77.130567ms     Round 4 complete

Things get a bit more interesting at 4 , where we start to break down.

Final counts: (1, 4, -44)
Final counts: (2, 4, 11)
Final counts: (3, 4, 14)
Final counts: (4, 4, 15)
Final counts: (5, 4, 3)
Final counts: (6, 4, 1)
99.170161ms     Round 4 complete

The results are no longer uniquely determined, which means we didn't solve the puzzle! That's embarassing!

It turns out we need new inference rules to continue, specifically that if in a region there are k values that can exist in only k distinct locations, then those values do not appear elsewhere in the region and other values do not appear in those locations.

Specifically, if we check out the 4 case (numbers in the corner indicate a "need to be in one of these locations" constraints):

ZBZrEn.png!web

We can see that in the bottom box, 5 and 6 can only appear in the top and middle positions. That locks them in to those locations, and the puzzle could, in principle, continue. In fact, if you go and change the input to put only 5 and 6 at these locations, the puzzle then solves.

But things break with each of 5 , 6 , and 7 . Actually, 5 is a bit better off than 4 , but things then devolve.

With the center a 5 :

Final counts: (1, 6, 6)
Final counts: (2, 6, 3)
Final counts: (3, 6, -3)
Final counts: (4, 6, -4)
Final counts: (5, 6, -1)
Final counts: (6, 6, -1)
103.175125ms    Round 6 complete

With the center a 6 :

Final counts: (1, 7, -3)
Final counts: (2, 7, -4)
Final counts: (3, 7, 6)
Final counts: (4, 7, -1)
Final counts: (6, 7, 2)
111.94169ms     Round 7 complete

With the center a 7 :

Final counts: (1, 8, -4)
Final counts: (2, 8, 4)
Final counts: (3, 8, -6)
Final counts: (4, 8, 5)
Final counts: (5, 8, 2)
Final counts: (6, 8, -1)
117.945062ms    Round 8 complete

The solve for 7 appears to involve a bit of speculation, and we could certainly head in that direction. The sqlite example query uses recursion to speculate in each location in turn, guessing each of the nine values. We've slimmed down the candidates quite a bit, and could plausibly be more efficient by having done so.

We could also imagine building out the more advanced local rules, like for the k values in k locations.

And most ambitiously, we could start to think through the inference rules that systems like prolog use to explore their constraint space. That seems like a great thing to think about and understand, more than to actually go and do.

Conclusions

Sudoku is a fairly relaxing combinatorial puzzle, at least for me. I certainly enjoyed the act of encoding inference rules in code, and trying to understand what I had failed to express when a version of the program didn't see what I saw. But also, I was very surprised at how much more the program saw than I did. It wouldn't be hard to imagine helpful incremental programs that work with thoughtful humans, and which maintain the hard-to-internalize combinatorial constraints.

That's a bit of a reach from Sudoku puzzles, though. I hope, if nothing else, that they've brought you some pleasure. And if so, go check out Cracking the Cryptic again and throw some love their way.

Sudoku in Differential Dataflow