> Home > Technology > Beer > Personal > The author . > On Twitter FollowRSSMy bookRough guide to Lithuanian beer Technology blogs
Robert Barta 
Posted in Technology on 20080910 16:40
My girlfriend likes to solve the sudoku puzzles in the newspaper, but I never bothered with it myself, thinking that I shouldn't spend time on something a computer can do for me. Writing a sudoku solver, however, sounded like it might be fun. And, so, since I had nothing better to do I decided to give it a shot. One thing I wanted to see was how sophisticated you have to make the solver. The search space in Sudoku is vast in theory, but there are tight internal constraints on it, and so I figured it would be interesting to see how far one could get with a brute force solver. Sudoku is known to be an NPcomplete problem, so obviously even the cleverest solver I could write would eventually run into problems. In other words, a worthy challenge. A reason for writing this is that I think the work on this solver nicely illustrates the difference between optimizing the code and optimizing the algorithm. In the first version I do nothing to make the code particularly fast. Later, well, you'll see what happens later. Brute forceI decided to write the thing in Python, and to store the puzzles in simple text files, using this format: 23.54. .6.... 1..... .....3 ....1. .12.35 That means we can write the top level of the program like this: board = [line.strip() for line in open(sys.argv[1]).readlines()] size = len(board) board = [list(row) for row in board if len(row) == size] assert len(board) == size if size == 6: in_square = in_square_regular(n, r, c, b, 2, 3) candidates = "123456" elif size == 9: in_square = lambda n, r, c, b: \ in_square_regular(n, r, c, b, 3, 3) candidates = "123456789" elif size == 16: in_square = lambda n, r, c, b: \ in_square_regular(n, r, c, b, 4, 4) candidates = "0123456789ABCDEF" else: assert 0 display(board) start = time.clock() search(0, 0, board) print "Time taken:", time.clock()  start The first line reads in the board, storing it as a list of strings. size tells us whether this is a 6, 9, or 16size sudoku board. We then turn the strings that represent each row into lists (so that we can change them when we search for solutions), throwing away any that are not the right length. Then we can check if we still have all the rows (with the assert) and stop if we don't, since that means the board was malformed. The next section sets up candidates which is all the characters that can appear in a cell on the board. The in_square function is used to check if a number appears in a square of cells on the board, and since this is irregular for 6size sudokus we need a slight tweak there. (You'll see why when I explain the in_square_regular function below.) Then we display the board, and time the call to the search function, which does the actual work. We call it with the coordinates of the topmost lefthand cell, which is where we start the search. So let's look at the search function: def search(rix, cix, board): if board[rix][cix] == ".": for num in candidates: if not num in board[rix] and \ not in_column(num, cix, board) and \ not in_square(num, rix, cix, board): board[rix][cix] = num callnext(rix, cix, board) board[rix][cix] = "." else: callnext(rix, cix, board) This is a classic recursive search function, basically. It first checks if this spot on the board is free, and if not moves on to the next. I use the callnext function for this, because it requires a couple of lines of code, and happens more than once. If the spot is free, we loop over all the possible values that could go into a cell, which is stored in candidates. The if then checks:
If the answer to all these is "no" it means that as far as we know up to this point the value could be correct in this cell. Of course, we may find later that it doesn't work out, but for now we put it into the board in the current cell, then continue the search. Now here is the beauty of recursive search: we don't need to store the original board, or do anything fancy to keep track of what we've tried so far. The stack does this for us, by storing the state of each search call that's gone before, so that we know what remains to be tried in (0, 0), (0, 1), (0, 2), and so on. So all we need to do is when we've tried all the possible values and have to give up is to put back the period to mark this cell as free. As simple as it is, it really does work, although you may have to spend a few minutes thinking it through to see why. Simulating it on paper with a 2x2 board may help. So let's look at the helper functions I used: def callnext(rix, cix, board): if cix + 1 != size: search(rix, cix + 1, board) elif rix + 1 != size: search(rix + 1, 0, board) else: print "===== A SOLUTION =====" display(board) First we check if we're at the end of the row, and if not we move to the next column. If we are at the end, we check if this is the last row, and if not we move to the next row. The last possibility is that we're at the end of the last row, and if so that means we've filled out the board, so we must have found a solution. We print it, then return. Returning works because we come back to the search call for the last cell in the last row which will run through the rest of the possibilities, then returning. So with this approach we continue the search once we've found a solution, to see if there might be more solutions. It only remains to look at the two functions used to check whether a value occurs in a column or a square. Let's start with the column one: def in_column(num, cix, board): for rix in range(0, size): if num == board[rix][cix]: return 1 I guess this needs no explanation. The squares, however, are a different proposition altogether. def in_square_regular(num, rix, cix, board, rwidth, cwidth): roff = rix % rwidth rlow, rhigh = rix  roff, rix + (rwidth  roff) coff = cix % cwidth clow, chigh = cix  coff, cix + (cwidth  coff) for rix in range(rlow, rhigh): for cix in range(clow, chigh): if board[rix][cix] == num: return 1 Here, width is the size of the squares, which is set as a fixed parameter in the setup code in the top level. For 9x9 it's 3, and for 16x16 it's 4. For 6x6 there are two squares horizontally, and three squares vertically, so there are two different widths, which is why there are two width parameters. (If you look back to the toplevel, you'll see that this is why those ugly lambdas are there. I could look it up from size each time, I guess. Let's consider this a premature optimization and move on.) The way the code works is to first compute how far into the square we are on this row (roff = row offset). At (0, 0) we'd get 0, as we're on the first column in the square. At (0, 1) we get 1, as we're on the second, and so on. In a 9x9 we'd get 0 for (0, 3) as that's the first column in the second square. And so it goes. Armed with this, we can compute the coordinate of the first column in the square, rlow, and then of the last, rhigh. We then repeat for rows, and after that it's a simple thing to run through the square looking for the value we're testing for. (Yes, I know this could be faster. Let's leave that aside for now, and come back to it later.) PerformanceSo that's the code for the brute force search. The interesting question, of course, is how well it performs. When I run it on the 6x6 sample above I get a time of either 0.0 or 0.01 seconds. So basically the time is negligible, and clearly we've done the job for 6x6s. But what about 9x9? I have a few of those. The first one takes 0.1 seconds. The second 0.62 seconds. Then there's a tricky one that takes about 20 seconds. A fourth takes all of 80 seconds. So we appear to have 9x9 under control, right? Well, Wikipedia has a 9x9 that's designed to foil brute force solvers. That one took 31 minutes. So clearly we're not done yet. Looking at 16x16 is also interesting for comparison. Running a few of those I get 1 hour 18 minutes (solution found much earlier) for the first one, the second I stopped after roughly 84 hours, and the third I decided not to try. So clearly we don't have those cracked, either. (Note that all of these times are not times to find the solution, but to go through the entire search space to find all solutions. On average that takes twice as long.) OptimizingWhat's needed here is obviously being a bit smarter and optimizing the code a bit. Let's say a 1 appears in the first row of a 9x9 problem. The current code will try putting a 1 into all open cells on that row, only to find it doesn't work. Not only that, but if the last column is open it will try a 1 there every time we come back after having backtracked from that cell. An obvious fix to this is to make a list of the values that could fit in each cell, where we take out those excluded by the filledin values given in the problem. So in search, instead of looping over candidates for each cell, we loop over possibles[rix][cix]. If we set up that list before searching, this should work just fine. So in the toplevel, before the search call, we insert the following: possibles = [] for rix in range(size): row = [candidates[:] for cix in range(size)] possibles.append(row) find_possibles(board) The function looks like this: def find_possibles(board): for rix in range(size): for cix in range(size): if board[rix][cix] == ".": possible = [] for x in possibles[rix][cix]: if not x in board[rix] and \ not in_column(x, cix, board) and \ not in_square(x, rix, cix, board): possible.append(x) possibles[rix][cix] = possible else: possibles[rix][cix] = [] It's pretty straightforward. For each open cell, look through the candidates and keep the ones that could fit on the board as given. For closed cells, just put in an empty list. This could have been made simpler, by not first building a totally open list of possibles, and then looping over it to cut it down, but doing that has benefits later. So what is the performance benefit of this? Below is a table comparing the performance. The third column has the previous time in seconds, the fourth the new time.
As you can see, this helps, but not that much. In fact, it seems to help by a roughly constant factor of about 30%. The reason is that for each cell where search goes through the list we've removed on average roughly 30% of the numbers, reducing the workload by about 30%. Unfortunately, when the starting point is more than an hour, that doesn't help much. We could of course continue to optimize the code, because as written there's a number of things it does in the critical parts of the code that could be faster. However, this is just going to be more constant factors of improvement, on about the order of the one we just made. We need an awful lot of those to reduce the longest search times to the sort of times people will be willing to wait for an answer. What we need is something that goes further than a simple constant factor. We need something that cuts out big swathes of the search space by reducing the number of possibilities. That may sound like a tall order, but there's an easy next step to take. Read on. An obvious next stepThe first trick people doing Sudoku try is usually looking for a cell where only one value could possibly fit. We've already done that above, we just don't exploit it. What we should do is to see if possibles has a list with only one value somewhere. If it does, we can stick that value onto the board right away, and so save the search function having to verify it over and over again. And, once we've stuck it onto the board, this reduces the possibilties in other cells, which means we can run find_possibles again. And, of course, now we might find new singletons. So we should keep repeating this until no new singletons are found. The actual code is surprisingly simple. We replace the toplevel call to find_possibles with a call to prepare, which looks like this: def prepare(board): found = 1 while found: find_possibles(board) found = find_singletons(board) print "SINGLETONS:", found What it does should be clear. The code to find singletons isn't much more complex: def find_singletons(board): found = 0 for rix in range(size): for cix in range(size): if (board[rix][cix] == "." and len(possibles[rix][cix]) == 1): board[rix][cix] = possibles[rix][cix][0] found += 1 print "Singleton at %s, %s: %s" % \ (rix, cix, board[rix][cix]) possibles[rix][cix] = [] return found That's it. So how much does this actually help? The table below adds two columns to the previous one, where the first shows how many singletons were found in each pass, followed by the sum of singletons found, and last the number of seconds taken.
As you can see, where no singletons were found we achieved nothing, except a slight delay through the extra work. Where we did find singletons the effect was usually big. Certainly bigger than the effect of the optimization we did. So this looks like the right track, but clearly we need to go further along it to make this fast enough. So can we come up with another trick? Of course. Speed through sophisticationThe second trick people doing Sudoku tend to employ is to write down in small script the possible values in a set of cells (which we already do). They then search through a row and see whether a certain value appears only once in the row. If it does, that means it has to go there, since it can't possibly go anywhere else. So all open cells might have more than one possible option, but if we have a row with only three open cells, and the possibles for these are (3, 6, 9), (4, 6), and (3, 9) then it's clear that 4 will have to go into the second cell. The best way to approach this is to first find the possibles lists, then fill in singletons, then redo the possibles, then do this cross check. If either the singleton check or the cross check manages to fill something in we can repeat to see if this creates a new possibility somewhere else. So prepare gets expanded to: def prepare(board): found = 1 while found: find_possibles(board) found = find_singletons(board) print "SINGLETONS:", found find_possibles(board) found2 = cross_check(board) print "CROSS CHECKED:", found2 found = found + found2 The cross_check function gets a bit longwinded. It could probably be simplified with generators that produce the sequence of cell coordinates, but I can't really be bothered. Here's the current version: def cross_check(board): found = 0 # check rows for rix in range(size): pos = {} for cix in range(size): for c in possibles[rix][cix]: add(pos, cix, c) for c in pos.keys(): if len(pos[c]) == 1: cix = pos[c][0] print "ROW FOUND: (%s, %s) > %s" % \ (rix, cix, c) board[rix][cix] = c possibles[rix][cix] = [] found += 1 find_possibles(board) # check columns for cix in range(size): pos = {} for rix in range(size): for c in possibles[rix][cix]: add(pos, rix, c) for c in pos.keys(): if len(pos[c]) == 1: rix = pos[c][0] print "COL FOUND: (%s, %s) > %s" % \ (rix, cix, c) board[rix][cix] = c possibles[rix][cix] = [] found += 1 # we don't implement squares on 6es correctly, so skip it if size == 6: return found find_possibles(board) # check squares width = int(math.sqrt(size)) for rs in range(width): for cs in range(width): pos = {} for rix in range(rs * width, (rs+1) * width): for cix in range(cs * width, (cs+1) * width): for c in possibles[rix][cix]: add(pos, (rix, cix), c) for c in pos.keys(): if len(pos[c]) == 1: (rix, cix) = pos[c][0] print "SQUARE FOUND: (%s, %s) > %s" % \ (rix, cix, c) board[rix][cix] = c possibles[rix][cix] = [] found += 1 return found You'll see that the same pattern repeats for rows, columns, and squares. We go through all the cells in a region and build a map pos keyed from the value to the list of cells where it can occur. Once that's collected we go through all the values to see if any can occur in only one cell. If it can, we add it to the board and empty out the list of possibles for that cell. The add function is just a simple convenience: def add(map, pos, candidate): list = map.get(candidate, []) if not list: map[candidate] = list list.append(pos) Given this it's time to go back and look at how the performance has developed.
Now this is beginning to look like something. In two of the cases, our two strategies are now enough to solve the puzzle completely, so that we don't need to search at all. So the most difficult 9x9 now takes a fraction of a second instead of half an hour. In fact, we've speeded it up by a factor of more than 16,000. For the first 16x16 the speedup is by a mere 13,000, while for the second it's at least a factor of 1800. There's no way optimizing the code would have gotten us that kind of improvement; only improving the algorithm can make improvements this big. In fact, these are not the only improvements possible. I have implemented another two techniques which do not fill in squares on the board, but which can eliminate some of the possibilities from the possibles table. Including those gives us:
As you can see, we still have made absolutely no progress on puzzles 8 and 9. This should not be too surprising, given that these were taken from Wikipedia's list of the hardest known Sudokus. The solver still needs nearly two hours to solve the last 16x16, so there is still room for improvement, but I'm not sure I'm going to work any more on this solver. The complete source code is here in case anyone else wants to play with it. CommentsSqrha  20080910 13:40:38 Nice job  my father told me to get out the room when I showed him how fast computer can solve sudoku. Anyway thanks for sharing. xtian  20080910 13:56:21 Cool! Peter Norvig has a really nice Python sudoku solver here: http://norvig.com/sudoku.html He uses constraint propagation with searching, and it's quite short and reasonably readable. Joshua Haberman  20080910 14:46:10 I haven't read your entry in detail, but have you seen Peter Norvig's solution to the same? It's even in Python. schtief  20080911 10:09:54 cool, i progged around a sudoku solver at my holiday too. i had an idea about a sudoku solver challenge. just devel an api (maybe ws) which every solver has to implement and the server generates sudokus and sends them to the opponents. first is the winner. i also thought about a 2 player sudoku game with a chess clock, where the player with the less time left wins. This one can be played also over the net Jonathan  20080911 10:38:11 i thought about writing something like this myself. I knew it would be quick, but sub a second makes me feel silly, especially considering my first puzzle took me almost two hours! Marina Neuerscheinung  20080918 12:15:56 I can´t believe you wrote this... I wouldn´t even think of writing something that could do Suduko for me... I guess the idea is that it works your brain  we forget that these days.. computers tend to be our brains!!! Good work! sveino  20081228 04:51:22 Very educational. I think it could be used right away as an example of optimizing code and improving algorithms. Yve  20090218 12:42:52 I studied your solution and I think it is clean, I link to a pretty complex one here. http://vbaexcel.eu/vbamacrocode/sudokusolver PyvalDict  20090916 16:42:55 That guy in http://vbaexcel.eu/vbamacrocode/sudokusolver made it with VBA. VBA sucks .. That's why it is unreadable If a program produced from a language can't be fast such as C should be less obfuscating than C code . Is there any reason to write a program that it will never be fast in a non easy to read language like python ??? Anyway great approach :) Well done able  20110524 23:28:41 hi, the source code is no longer available, can you reupload it? Lars Marius  20110525 01:31:16 @able: The code was there, but because of a configuration error Apache tried to run it instead of letting you download it. Fixed now. Thank you for reporting! mehpic  20111030 16:42:13 This was a very useful post, thanks! After reading it, I went on to play with my own Python implementation, and decided to make it as short as I could  came down to 10 lines for the "solving" part of the code, a little more for input/output. Take a look: http://mehpic.blogspot.com/2011/10/solveanysudokuwith10linesof.html But it does it pretty much the same way, a depthfirst brute force. Steven  20140730 13:20:41 After you mentioned this in a reddit post on /r/python, i made a web interface for it with Flask, modifying the solver to work as a module. Rather than redirecting all the print statements to an output, I just redirected stdout to a stringIO. Magic. https://gist.github.com/blha303/834734fda4f3a875259b Add a comment 
Last comments
