Ruzzle Solver Algorithm

With the success of Ruzzle Solver, a lot of users have asked me to publish more informations about it. Some of them were interested to know more about it for an university exam. In this post, I will talk about some methods to realize this type of algorithms. Surely, in the future, I will talk more about its implementation.

The following tests have been executed with algorithms written in C, and with a MacBook Pro with an Intel Core i5 at 2.3 Ghz; with other implementation the results may obviously change, but I expect that the ratios between them will be the same.


In a dictionary (for example the Italian one) there are a fixed number of 600.000 words. Then, we have a fixed matrix of 4×4 cells. Surely, we are talking about an O(1) algorithm, but not all the ways to solve this problem are the same, the theory and the practice aren’t the same thing.

Full Enumeration

Assume that we load the set of words into an hashmap. We know that this type of structure has a lookup that, in the average case, requires a time of O(1). However, we have a static dictionary, we don’t need to change it once loaded. We can assume that our hashmap is an array that contains all the words, and that the hash function is perfect (i.e. injective); in this way we can have always a time complexity of O(1). The latter data structure is known in literature with the name: Direct Access Table.

The first algorithm that we can think can search all combination in the matrix, and for each of them we can look if it is in the dictionary or not.

We want to estimate the number of lookups at the hashmap. We assume that we have a matrix with distinct letters: we don’t count the words that can be drawn in different ways (this is not much important).

The algorithm starts choosing a cell, one of the 16 in the matrix; this one is the first letter of the word that we are drawing. Now, the algorithm needs to choose one of the adjacent cell to continue. Each central cell has 8 neighbors, the cells in the corners have 3 neighbors, and the ones in the edges have 5 of them. The average is (8*4+5*8+3*4)/16 = 5,25. The possible words of two letters are 15 * 5,25 = 84. From the second step, each neighbor has one other neighbor that we have visited, leading our average to 4,25. All the possible words of three letters that we can draw are 16*5,25*4,25 = 357. Obviously, the higher is the number of the steps and the smaller is the average of the number of neighbors (due to the visited ones that decrease). With this reasoning, the number of possible words that have at most seven letters exceeds one million.

We have seen that do an analytic estimation is complex, because it hard to count the real number of visited neighbor. However, with a simple recursive algorithm, we can count the exact number of combinations of the matrix: 12.029.624.

With the picture we can see the trend of the number of possible words in a matrix (with all distinct letters). Each bar shows the number of possible words that have at most N letters. In other words, it shows the search tree size (up to the level N) of the recursive algorithm that searches all the possible words in the matrix. We can see that the grow is exponential, and that this grow decreases his speed, because the more N is bigger and the more is the number of visited neighbor. Con il grafico possiamo osservare l’andamento esatto del numero di parole possibili in una griglia di lettere distinte tra loro. Ogni barra del grafico illustra il numero di parole con al più N lettere, ovvero la grandezza dell’albero di ricerca (fino al livello N) dell’algoritmo ricorsivo che calcola tutte le possibili parole presenti nella griglia. Possiamo osservare come questo cresce in maniera esponenziale e poi come tale crescita rallenta a causa del numero di vicini che diventano sempre più già visitati.

Hence, this algorithm needs 12 millions of lookups to the dictionary. The simpler algorithm that only counts the number of combinations, without calls any lookup, takes 2.5 seconds.

Active Dictionary

Looking at the higher number of combination that we calculated in the previous algorithm, we can change the role of the dictionary: from passive to active.

Assume that we have our dictionary, and that we read the words one by one. For each of them we only verify if it is in the matrix or not. This reasoning is correct, because the number of existing words is smaller that the number of words (even they not exist) in the matrix.

This verification is fast, the word helps to exclude the wrong path. Furthermore, the operations are simpler, we only check the letters of the word with the ones in the matrix. We don’t need to have an hashmap, we can read our dictionary file sequentially.

In fact, in this case, the average time to find all the words that are in the matrix is 150 milliseconds.

Branch and Bound

Looking at the first algorithm we can observe something. The lookup for a word “clev” fails, but we must continue the research from this word, for example “clever” is instead in the dictionary. To speed up the research, would be useful to know from the first letters that the word is not a prefix of any word in the dictionary. For example, having “xhct” we don’t need to continue to search longer words, we will not find anything.

Hence, to do this, we need a clever “hashmap”, where we can do lookups of prefixs. This type of data structures exists and is widely used, for example in information retrieval, in the spelling checking and in the words completion. Examples of these are: Trie, and Radix Tree

With this algorithm, we attend that the time is very small, because the enumeration of all possible combination is rigorously guided, excluding all wrong path, and using very fast lookup. To test this algorithm, I implemented an optimized version of Trie, that is known that they are expensive in terms of memory (remember that exist data structures that do the same in less memory).

This algorithm needs, in the average case, less than 1ms to calculate all words in the matrix.


We have seen three way to solve this problem, each of them has our pros and cons.

Algorithm Time Time
(with data structure population)
Requested Primary Memory
Full Enumeration 2,5sec+ 2,5sec+ ~6MB
Active Dictionary 150ms 170ms ~0MB
Branch and Bound 1ms 300ms ~140MB

The second one, despite the time of 150ms, is a practical algorithm, where you don’t need to memorize the dictionary in a data structure in the primary memory, letting you to read the dictionary sequentially. The speed of reading this file can be fast because the operating system can caches the file. This is recommended when you need that the process born and dead only solving a matrix configuration.

The third is surely faster, but you might consider that the algorithm needs a phase where it populates the data structure, that can be slow, and that the needed primary memory can be very large (this depends on the data structure that you’re using). This is recommended when you can launch he process and you can leave it to solve a lot of matrix configuration.

Leave a Reply

Your email address will not be published. Required fields are marked *