Generating Mini-Crosswords

Introduction

The New York Times publishes “mini-crosswords,” which are crossword puzzles on a relatively small grid (usually 5×5), without any black squares, so that every row and column of the grid must spell a word. The figure below shows an example of a solution to such a puzzle.

How hard is it to create these puzzles? This post is motivated by a recent College Mathematics Journal article (see Reference (1) below) that considers this question, and describes an approach using the Metropolis-Hastings algorithm to randomly sample instances of puzzles.

But instead of just randomly sampling one puzzle at a time, can we actually enumerate all possible puzzles? In particular, my idea was to reduce the problem of finding a crossword puzzle solution to that of finding a (generalized) exact cover with appropriately crafted constraints. This would be handy, because we already have code for solving exact cover problems, using Knuth’s Dancing Links (DLX) algorithm (see here and here for similar past exercises).

Crossword as an exact cover

To state the problem more precisely: given a positive integer n indicating the size of the grid (n=5 in the above example), and a dictionary W of m words each of length n over alphabet \Sigma, we must construct an exact cover problem whose solution corresponds to a placement of letters in all n^2 grid positions such that each row and column spells a word in W.

To do this, we construct a zero-one matrix with 2mn rows, each corresponding to placing one of the m dictionary words either “across” (in one of the n rows of the puzzle) or “down” (in one of the n columns). A solution will consist of a subset of 2n rows of the matrix: n “across” words, one in each row of the puzzle, and n “down” words, one in each column, each pair of which intersect in the appropriate common letter.

To represent the constraints, we initially need 2n^2\cdot|\Sigma| columns in our matrix (where |\Sigma|=26), each indexed by a tuple (puzzle row i, puzzle column j, alphabet letter c, across or down). For a given matrix row– representing placement of a word with letters (w_1, w_2, ..., w_n) in a particular location and orientation in the puzzle grid– we set to one those n\cdot|\Sigma| columns corresponding to the n different (i,j) grid locations where the word will be placed in the puzzle… where for each alphabet letter c, we set the “across” column to one if and only if either

  • the word is “across” and c matches the letter of the word in this location (i.e., c=w_j), or
  • the word is “down” and c does not match the letter of the word in this location (i.e., c \neq w_i).

If neither of these conditions is satisfied, we instead set the “down” column to one. The following Python code produces the number and list of (row, column) pairs of the resulting sparse matrix.

letters = 'abcdefghijklmnopqrstuvwxyz'
m = len(words)
n = len(words[0])
print(m * n * 2 * (n * 26), file=file)
b = [True, False]
for row, ((w, word), k, across) in enumerate(
            product(enumerate(words), range(n), b)):
    for col, (i, j, letter, horiz) in enumerate(
            product(range(n), range(n), letters, b)):
        if ((i if across else j) == k and
            (word[j if across else i] == letter) == (across == horiz)):
            print(row, col, file=file)

However, we’re not quite finished. Although the desired end result is a crossword with distinct words in each row and column, such as the 6×6 solution shown in (a) below, as Howard observes in the CMJ article, there are many valid solutions that use the same word more than once, including the extreme cases of symmetricword squares” such as the one shown in (b) below.

(a) 6×6 mini-crossword, where the word in each row and column is unique. (b) 6×6 “word square,” a symmetric grid with each row and corresonding column containing the same word.

We can eliminate this duplication by adding m “optional” columns to the zero-one matrix, one for each word in the dictionary, and solve the resulting generalized exact cover problem, so that each word may be used at most once in a solution.

Results

All of the source code is available here, as well as on GitHub. My initial test used the 4×4 case discussed in Howard’s paper, with his dictionary of 1826 words. He describes a process for estimating the total number of possible puzzles by repeated sampling using the Markov chain Monte Carlo approach: “We estimate that there are approximately
73,000–74,000 distinct puzzles each with no repeated words.” This is pretty accurate; it turns out that there are exactly 74,339 (each contributing two symmetric pairs of solutions to the generalized exact cover problem, for a total of 148,678 solutions).

References:

  1. Howard, C. Douglas, It’s Puzzling, College Mathematics Journal, 49(4) September 2018, p. 242-249 [DOI]
  2. Knuth, D., Dancing Links, Millenial Perspectives in Computer Science, 2000, p. 187-214 (arXiv)

 

Disabling subnormals in MATLAB

Suppose that we want to compute the following expression, somewhat contrived in complexity for the purpose of example:

y = \frac{1}{s}\sum\limits_{i=1}^{10^7} \frac{s}{2^{50}}

The following MATLAB code implements this formula, and measures the time required to evaluate it:

tic;
scale = 1;
y = 0;
for i = 1:10000000
    x = scale;
    for j = 1:50
        x = x * 0.5;
    end
    y = y + x;
end
y = y / scale;
toc

Now suppose that we execute this same code again, but this time changing the “scale” factor s to a much smaller value: scale = realmin, corresponding to s=2^{-1022}, the smallest positive normalized floating-point number. Inspection of the formula above suggests that the value of y should not depend on the changed value of s (as long as it is non-zero); we may spend most of our time working with much smaller numbers, but the end result should be the same.

And indeed, execution of the modified code confirms that we get exactly the same result… but it takes nearly 20 times longer to do so on my laptop than the original version with s=1. And there are more complex– and less contrived– calculations where the difference in performance is even greater.

The problem is that these smaller numbers are subnormal, small enough in magnitude to require a slightly different encoding than “normal” numbers which make up most of the range of representable floating-point numbers. Arithmetic operations can be significantly more expensive when required to recognize and manipulate this “special case” encoding.

Depending on your application, there may be several approaches to handling this problem:

  1. Rewrite your code to prevent encountering subnormals in the first place. In the above contrived example, this is easy to do: just shift the “scale,” or magnitude, of all values involved in the computation away from the subnormal range (and possibly shifting back only at the end if necessary). This can not only result in faster code, but more accurate results, since subnormal numbers have fewer “significant” mantissa bits in their representation.
  2. Disable subnormal numbers altogether, so that for any floating-point operation, input arguments or output results that would otherwise be treated as subnormal are instead “flushed” to zero.

We have seen above how to manage (1). For (2), the following MEX function does nothing but set the appropriate processor flags to disable subnormals. I have only tested this on Windows 7 with an Intel laptop, compiling in MATLAB R2017b with both Microsoft Visual Studio 2015 as well as the MinGW-w64 MATLAB Add-On (edit: a reader has also tried this on Linux Mint 19 with MATLAB R2018b and GCC 7.3.0):

#include <xmmintrin.h>
#include <pmmintrin.h>
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    //_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _mm_setcsr((_mm_getcsr() & ~0x8000) | (0x8000));
    //_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    _mm_setcsr((_mm_getcsr() & ~0x0040) | (0x0040));
}

After running this MEX function in a MATLAB session, re-running the modified calculation above gets all of the speed back… but now at a different cost: instead of the correct value y=10^7/2^{50}, every term in the sum has been flushed to zero, resulting in a final incorrect value of y=0.

If there is any moral to this story, it’s that you’re a test pilot. First, this was a very simple test setup; it’s an interesting question whether any MATLAB built-in functions might reset these flags back to the slower subnormal support, and whether it is feasible in your application to reset them back again, possibly repeatedly as needed. And second, even after any algorithm refactoring to minimize the introduction of subnormals, can your application afford the loss of accuracy resulting from flushing them to zero? MathWorks’ Cleve Moler seems to think the answer is always yes. I think the right answer is, it depends.