Monday, October 24, 2011

Bearoff database created

I built a little program to construct a bearoff database for me and put together an initial cut. It covers bearoffs on all six points in both players' home boards, with each player having up to nine checkers remaining.

I wrote out the data to a text file, with a key representing the board layout and value representing the equity (ie probability of win in this case, since the probability of a gammon is always zero). There are 25M elements in the table and the file takes 557MB, which needs to be loaded into process memory to be used. I looked at a table for six points and ten checkers, but that ended up being too big (closing in on 4GB). I'm sure there's a clever way to partially load elements of the table when needed to avoid this kind of process memory bloat, but it's not worth figuring that out now.

The key representing the board layout is very simple. First you start with the player who holds the dice: for each point, starting with #1, add a character representing the number of checkers on the point. "A" represents 0, "B" 1, and so on. I used letters because potentially the number of checkers could be > 9 in the generic case. So for a bearoff db for six points, the key has six elements for each of the player's six points in the home board, and each element is a character. Then the same thing is appended for the opponent's position. For six points that means a twelve-character key.
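As a sketch, that encoding might look something like this in Python (illustrative, not my actual code; it assumes each side's home board is given as a list of checker counts per point, point #1 first):

```python
def bearoff_key(player_counts, opponent_counts):
    """Encode a bearoff position as a string key.

    Each element is the number of checkers on a home-board point,
    starting with point #1 for the player holding the dice:
    0 -> "A", 1 -> "B", 2 -> "C", and so on. The opponent's points
    are appended after the player's, giving a 12-character key
    for a six-point database.
    """
    return "".join(chr(ord("A") + n) for n in player_counts + opponent_counts)

# Three checkers on the player's 1-point, nothing else on either side:
# bearoff_key([3, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]) -> "DAAAAAAAAAAA"
```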

Now I can build a strategy that toggles between a normal network, a race network, and exact bearoff probabilities in the appropriate cases. I'm going to design it generally so it'll work with potentially many different networks (more than the two listed above).

Next steps

I've continued to be frustrated in getting my network to sensibly converge when I include the "whose turn is it" input. I tried a bunch of different approaches but none of them seem to make that much difference:

  • Include a new output node for backgammons. I noticed that in the training for networks that contained probability of win and gammon nodes, the realized behavior included a lot of backgammons, and wondered whether this was skewing the results. So I added a new output node which represents the conditional probability of a backgammon win given a gammon win. As with the gammon node, you can use symmetry to imply the probability of a backgammon loss given a gammon loss. This ended up making very little difference.
  • Weight the learning rate for the gammon node by the probability of a win. Since the gammon node represents the conditional probability of a gammon given a win, the idea is that very little information is contained in games where the prob of win is low. And a similar thing when we include backgammon nodes: weight its learning rate by the unconditional probability of a gammon win. In practice, neither of these made much difference.
  • Instead of always estimating the probability of a gammon from the network, use knowledge of the game to properly set that probability to zero when the opponent has taken off at least one checker. At the same time, stop training the gammon node when the probability of gammon is zero. This basically means that the "end of the game" training that happens for the probability of win node happens (sometimes) earlier in a game. This converged to something non-trivial, in the sense that the gammon (and backgammon) probabilities didn't converge to zero or one; but the converged probabilities were not sensible and the resulting player performed poorly against the benchmark.
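For concreteness, the learning-rate weighting and the exact-knowledge override from the last two bullets might look something like this (all names here are illustrative, not my actual code):

```python
def gammon_prob(net_estimate, opponent_borne_off):
    """Exact-knowledge override for the conditional gammon-win node:
    once the opponent has taken off at least one checker, a gammon
    win is impossible, so the network's estimate is replaced by zero."""
    return 0.0 if opponent_borne_off > 0 else net_estimate

def gammon_alpha(base_alpha, p_win, opponent_borne_off):
    """Learning-rate weighting for the gammon node: damp the step size
    by the probability of winning (the node is conditional on a win,
    so low-win games carry little information about it), and stop
    training the node entirely once a gammon is impossible."""
    return 0.0 if opponent_borne_off > 0 else base_alpha * p_win
```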

I've gone over the code many times with fresh eyes and don't see anything wrong here, so I'm assuming my network is just confused somehow.

I'm going to try it using a few different values for alpha to see whether I've just got too large a learning rate, but I've played with that before and don't hold out much hope.

The next thing I'm going to try is a more sophisticated network structure: include a race network and a bearoff database. I was holding off on doing this before since I wanted to try to solve the convergence problems on a simpler setup, but that's not bearing much fruit. Hopefully there isn't some bug I've missed that moots the more complex setup.


Wednesday, October 19, 2011

Bearoff databases

I was thinking a bit recently about "bearoff databases". These can be used near the end of many games, where both players are bearing off. The number of possible game states is actually fairly manageable in these cases, unlike the case of a generic board where the number of states is truly huge.

So in the case of bearing off, you can just enumerate the board layouts and the expected number of points in each.

Here's how it works: for a simple board, eg when both players have just one piece left, it's pretty easy to work out the expected number of points. Just run through all 21 distinct rolls and check what happens in each one. In most of those cases the player with the dice will bear off and win a point. If not, you check what the expected number of points is for the opposing player post your move, and your expected number of points is just -1 times his.

You can then recursively figure out all the more complex board layouts: for a given layout, run through all possible rolls; for each roll check all possible moves; for each possible move check the opponent's expected number of points post-move, and choose your move to minimize your opponent's benefit. "Check the opponent's expected number of points" is the recursive bit - that will do a similar calculation, which will also recurse, and so on.

To make things easier you can store each board layout->expected number of points in a hash table so you don't have to keep repeating calculations. That's the database part of "bearoff database". Then once you have the database, you can quickly look up the expected number of points given a board layout.
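A stripped-down sketch of the whole recursion, with the hash table as the "database". This is my illustration, not the actual implementation: it uses a simplified bearoff move generator (move toward the edge, bear off with an exact die, or bear off with an overshooting die from the highest occupied point) and ignores some forced-play subtleties in the rules.

```python
# The 21 distinct rolls, each with its probability
# (doubles 1/36, non-doubles 2/36 since either die can come first).
ROLLS = [(d1, d2, (1 if d1 == d2 else 2) / 36.0)
         for d1 in range(1, 7) for d2 in range(d1, 7)]

def moves_with_die(counts, d):
    """Positions reachable by moving one checker with die d."""
    highest = max((i for i, c in enumerate(counts) if c > 0), default=-1)
    out = []
    for i, c in enumerate(counts):
        if c == 0:
            continue
        p = i + 1  # point number; point 1 is nearest the edge
        new = list(counts)
        new[i] -= 1
        if p > d:
            new[i - d] += 1          # ordinary move toward the edge
            out.append(tuple(new))
        elif p == d or i == highest:
            out.append(tuple(new))   # bear the checker off
    return out

def positions_after_roll(counts, d1, d2):
    """All positions reachable by playing the full roll
    (both dice, or four moves for doubles)."""
    orders = [[d1] * 4] if d1 == d2 else [[d1, d2], [d2, d1]]
    results = set()
    for order in orders:
        frontier = {tuple(counts)}
        for d in order:
            nxt = set()
            for c in frontier:
                ms = moves_with_die(c, d) if sum(c) > 0 else []
                nxt.update(ms if ms else [c])  # no legal move: skip the die
            frontier = nxt
        results |= frontier
    return results

def equity(player, opponent, table={}):
    """Expected points for the player on roll; `player` and `opponent`
    are tuples of checker counts per point. The shared `table` dict
    memoizes results - that's the database part."""
    if sum(opponent) == 0:
        return -1.0  # opponent already bore off everything: we lost
    key = (player, opponent)
    if key not in table:
        total = 0.0
        for d1, d2, prob in ROLLS:
            # choose our move to minimize the opponent's resulting equity
            best = min(equity(opponent, child, table)
                       for child in positions_after_roll(player, d1, d2))
            total += prob * (-best)
        table[key] = total
    return table[key]
```

For example, with one checker on each side's 1-point the player on roll always bears off and wins, so `equity((1,0,0,0,0,0), (1,0,0,0,0,0))` comes out to exactly 1.0.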

That's a very brute force approach, and it only works if there aren't too many possible board layouts. Remember, the board layout is a combination of the player's setup and the opponent's setup: this is called a "two sided" bearoff database.

With a bit of math you can work out the total number of board layouts if you restrict your database to N points nearest the end and up to M checkers on those points:

Number of board layouts = ( (N+M)!/N!/M! - 1 )^2

That is, the number of ways to lay out just one player's side of the board is (N+M) choose N minus 1; and for each of those there are the same number of ways for the opponent's side to be laid out.
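The formula is a one-liner to check with Python's `math.comb`:

```python
from math import comb

def num_layouts(n_points, m_checkers):
    """Two-sided bearoff layout count: ((N+M choose N) - 1) squared.
    The -1 drops the empty layout from each side's count."""
    return (comb(n_points + m_checkers, n_points) - 1) ** 2

# num_layouts(3, 3)  -> 361
# num_layouts(6, 10) -> 64,112,049 (about 64.1M)
```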

There are also "one sided" bearoff databases. Here you enumerate just one player's board layouts, and for each you record the distribution of the number of rolls before all pieces are borne off. If you have that data for both players you can approximate the expected number of points for either. I'm not that clear on how that approximation works, and haven't tried to investigate it much: the two sided database seems manageable and easier to understand.

Some examples for (number of points,number of checkers): (3,3): 361; (3,10): 81,225; (3,15): 664,225; (4,3): 1,156; (4,10): 1,000,000; (6,3): 6,889; (6,10): 64.1M; (6,15): 2.9B. On my Mac Pro in a single process I can calculate around 100k/minute. Storing them as an uncompressed text file uses around 1.8MB/100k entries.

Just for interest I note that the number of layouts for 24 points and 15 checkers is 6.3e20. This is an upper limit to the total number of valid backgammon boards, since it includes layouts where there are white and black checkers on the same point. It ignores layouts where either player has a checker on the bar, but I suspect those number much less than the overlapping cases. So the total number of backgammon boards is order 1e20 anyways.

Monday, September 12, 2011

More frustration

I haven't been able to spend much time on this recently, but what I'm trying now is a variation on the symmetric setup - this time adding a new input that represents whose turn it is (=1 for the player's turn and 0 for the other player's turn).

I'm running up against another case where the network wants to converge an output probability to 100% almost all the time. This time the probability of win from the starting board seems to be okay (it settles around 53-55%, so a bit high compared to the gnubg standard, but nothing crazy) but the conditional probability of gammon win converges to 100%.

The last time I saw this, the problem was that I was not properly handing the dice off to the other player at the end of the turn (when evaluating the post-move probabilities). I do that correctly now, but the gammon probability is still messing up.

Frustrating! I'm not even sure how to debug this sensibly. It's a huge complex nonlinear network, so looking at the weights doesn't give me any intuition about what's happening. I've gone over the code many times now and can't see any obvious bugs. The slide toward 100% probability happens gradually over thousands of iterations.

Friday, September 2, 2011

Reference estimate of probabilities from the starting position

One thing I'd like the network to estimate properly is the probability of a win (any kind of win) and the probability of a gammon win or loss from the starting position. To do that, of course, I need some independent estimate to compare against.

I checked in gnubg to see its estimates, and reproduce them here for reference (assumes the player has the dice):

Probability of any kind of win from the starting position: 51.9%
Probability of a gammon win (incl backgammon): 15.1%
Probability of a gammon loss (incl backgammon): 13.6%

Thursday, September 1, 2011

Ah ha!

I finally cracked the nut on the problem with the "whose turn is it" input.

The problem was how I trained the board for the t+1 state - ie updating the probabilities after the player moves.

I was evaluating the network without changing who owns the dice. So not surprisingly: if a player always holds the dice, they will definitely win!

Now that I made it flip the dice for the t+1 evaluation, it seems to be properly converging, at least in initial tests.

Phew - that one was bugging me for several days.

Frustrating results

I'm running up against a brick wall in trying to add "whose turn is it" to the inputs.

Basically whichever way I implement this, the probability of win output trains gradually to 1, as does the conditional probability of gammon win, and the conditional probability of gammon loss goes to zero.

I've tried adding this as an input to the symmetric network, to the non-symmetric networks, and to a completely rebuilt network that follows Tesauro's design exactly. In all cases, if I leave out the "whose turn it is" input (by setting the input->hidden weights for that input to zero and not training them), the network trains just fine, giving results like I posted before. But if I train those weights, the network trains to the ridiculous probabilities I mentioned above. It does not matter which position I put that input in - ie it's not an error in the code that has a problem with a particular input index.

I really don't understand what's going on here. I see no mention in any publications about a problem with this input.

Thinking about it a bit, this input does seem to have a bit of a preferred status: whenever it's your turn to roll, you expect your probability of winning to get more positive just because most of the time you're removing pips through your roll. I don't think any of the other inputs have that status.

In the training, the change in weight value is ( new output - old output ) * input value * output->hidden weight * ( stuff that is positive ). For the probability of win and probability of gammon outputs, when the input value is +1 (ie it's your turn), the expected value of the product of the first two terms is positive.

The output->hidden weight can be positive or negative. If it's positive, the trained input->hidden weight will tend to +infinity; if it's negative, the weight will tend to -infinity. I think that's what's happening here.
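As a sanity check of that drift argument, here's a toy simulation - not the real network, just one always-on input feeding a single weight through a fixed positive output->hidden weight. If the TD error (new output minus old output) has even a small positive bias on average, every update nudges the weight the same way and it drifts without bound:

```python
import random

random.seed(0)
w, w_out, alpha = 0.0, 0.5, 0.1
for _ in range(10_000):
    # TD error with a small positive bias, standing in for "your roll
    # usually improves your winning chances"
    td_error = random.gauss(0.02, 0.1)
    w += alpha * td_error * 1.0 * w_out  # input value is always 1
# w ends up far from zero even though the input carries no real signal
```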

But this seems a bit facile - no one else mentions this as a problem. So I suspect I've got some fundamental problem in my setup that I'm missing.

The hunt continues!