Monday, October 24, 2011

Bearoff database created

I built a little program to construct a bearoff database for me and put together an initial cut. It covers bearoffs on all six points in both players' home boards, with each player having up to nine checkers remaining.

I wrote out the data to a text file, with a key representing the board layout and value representing the equity (ie probability of win in this case, since the probability of a gammon is always zero). There are 25M elements in the table and the file takes 557MB, which needs to be loaded into process memory to be used. I looked at a table for six points and ten checkers, but that ended up being too big (closing on 4GB). I'm sure there's a clever way to partially load elements of the table when needed to avoid this kind of process memory bloat, but it's not worth figuring that out now.

The key representing the board layout is very simple. First you start with the player who holds the dice: for each point, starting with #1, add a character representing the number of checkers on the point. "A" represents 0, "B" 1, and so on. I used letters because potentially the number of checkers could be > 9 in the generic case. So for a bearoff db for six points, the key has six elements for each of the player's six points in the home board, and each element is a character. Then the same thing is appended for the opponent's position. For six points that means a twelve-character key.

Now I can build a strategy that toggles between a normal network, a race network, and exact bearoff probabilities in the appropriate cases. I'm going to design it generally so it'll work with potentially many different networks (more than the two listed above).

Next steps

I've continued to be frustrated in getting my network to sensibly converge when I include the "whose turn is it" input. I tried a bunch of different approaches but none of them seem to make that much difference:

  • Include a new output node for backgammons. I noticed that in the training for networks that contained probability of win and gammon nodes, the realized behavior included a lot of backgammons, and wondered whether this was skewing the results. So I added a new output node which represents the conditional probability of a backgammon win given a gammon win. As with the gammon node, you can use symmetry to imply the probability of a backgammon loss given a gammon loss. This ended up making very little difference.
  • Weight the learning rate for the gammon node by the probability of a win. Since the gammon node represents the conditional probability of a gammon given a win, the idea is that very little information is contained in games where the prob of win is low. And a similar thing when we include backgammon nodes: weight its learning rate by the unconditional probability of a gammon win. In practice, neither of these made much difference.
  • Instead of always estimating the probability of a gammon from the network, use knowledge of the game to properly set that probability to zero when the opponent has taken off at least one checker. At the same time, stop training the gammon node when the probability of gammon is zero. This basically means that the "end of the game" training that happens for the probability of win node happens (sometimes) earlier in a game. This converged to something non-trivial, in the sense that the gammon (and backgammon) probabilities didn't converge to zero or one; but the converged probabilities were not sensible and the resulting player performed poorly against the benchmark.

I've gone over the code many times with fresh eyes and don't see anything wrong here, so I'm assuming my network is just confused somehow.

I'm going to try it using a few different values for alpha to see whether I've just got too large a learning rate, but I've played with that before and don't hold out much hope.

The next thing I'm going to try is a more sophisticated network structure: include a race network and a bearoff database. I was holding off on doing this before since I wanted to try to solve the convergence problems on a simpler setup, but that's not bearing much fruit. Hopefully there isn't some bug I've missed that moots the more complex setup.


Wednesday, October 19, 2011

Bearoff databases

I was thinking a bit recently about "bearoff databases". These can be used near the end of many games, where both players are bearing off. The number of possible game states is actually fairly manageable in these cases, unlike the case of a generic board where the number of states is truly huge.

So in the case of bearing off, you can just enumerate the board layouts and the expected number of points in each.

Here's how it works: for a simple board, eg when both players have just one piece left, it's pretty easy to work out the expected number of points. Just run through all 24 possible rolls and check what happens in each one. In most of those cases the player with the dice will bear off and win a point. If not, you check what the expected number of points is for the opposing player post your move, and your expected number of points is just -1 times his.

You can then recursively figure out all the more complex board layouts: for a given layout, run through all possible rolls; for each roll check all possible moves; for each possible move check the opponent's expected number of points post-move, and choose your move to minimize your opponent's benefit. "Check the opponent's expected number of points" is the recursive bit - that will do a similar calculation, which will also recurse, and so on.

To make things easier you can store each board layout->expected number of points in a hash table so you don't have to keep repeating calculations. That's the database part of "bearoff database". Then once you have the database, you can quickly look up the expected number of points given a board layout.

That's a very brute force approach, and it only works if there aren't too many possible board layouts. Remember, the board layout is a combination of the player's setup and the opponent's setup: this is called a "two sided" bearoff database.

With a bit of math you can work out the total number of board layouts if you restrict your database to N points nearest the end and up to M checkers on those points:

Number of board layouts = ( (N+M)!/N!/M! - 1 )^2

That is, the number of ways to lay out just one player's side of the board is (N+M) choose N minus 1; and for each of those there are the same number of ways for the opponent's side to be laid out.

There are also "one sided" bearoff database. Here you enumerate just one player's board layouts, and for each you record the distribution of number of moves before all pieces are born off. If you have that data for both players in can approximate the expected number of points for either. I'm not that clear on how that approximation works, and haven't tried to investigate it much: the two sided database seems manageable and easier to understand.

Some examples for (number of points,number of checkers): (3,3): 361; (3,10): 81,225; (3,15): 664,225; (4,3): 1,156; (4,10): 1,000,000; (6,3): 6,889; (6,10): 64.1M; (6,15): 2.9B. On my Mac Pro in a single process I can calculate around 100k/minute. Storing them as an uncompressed text file uses around 1.8MB/100k entries.

Just for interest I note that the number of layouts for 24 points and 15 checkers is 6.3e20. This is an upper limit to the total number of valid backgammon boards, since it includes layouts where there are white and black checkers on the same point. It ignores layouts where either player has a checker on the bar, but I suspect those number much less than the overlapping cases. So the total number of backgammon boards is order 1e20 anyways.