Friday, March 30, 2012

Optimal Janowski cube life index revisited

After fixing my implementation of Janowski's doubling strategy I tried again to find the optimal cube life index (assuming we just use a single constant index).

As before, I found the optimal cube life index is 0.7.

To test this I ran doubling strategies with different cube life indexes against one with a fixed index of 0.7 to see if any other index would outperform. For indexes near 0.7 I ran 1M self-play games, using Player 3.3 for checker play. For indexes a bit further out I used 100k self-play games.

Here are the results:

[Chart: average points per game and probability of win vs cube life index x]
Two interesting points to note. First is that the probability of win continues to increase to x=0.95 even though the average score decreases, due to the asymmetric market windows of the two players. Second is that there is a fairly dramatic falloff in performance (both in score and probability of win) for x>0.95.

Wednesday, March 28, 2012

Janowski model revisited

I was a bit incorrect in how I interpreted Janowski's paper.

The real way to think about his model is as a linear interpolation between the dead cube equity and the live cube equity.

The dead cube equity is always P*(W+L)-L, normalized to the current cube value. The live cube equity is piecewise linear, and has different forms for three cases:

1) Cube centered. 0<P<TP, equity runs from -L to -1; TP<P<CP, equity runs from -1 to 1; CP<P<1, equity runs from 1 to +W.
2) Cube owned by player. 0<P<CP, equity runs from -L to +1; CP<P<1, equity runs from +1 to +W.
3) Cube owned by opponent. 0<P<TP, equity runs from -L to -1; TP<P<1, equity runs from -1 to +W.

TP and CP here are the live cube take and cash points, so

TP = (L-1/2)/(W+L+1/2)
CP = (L+1)/(W+L+1/2)

Then the Janowski model equity is x * live cube equity + (1-x) * dead cube equity.

That's it. Everything else falls out of that, and in practice you don't really need to calculate initial double, redouble, too good points, or even take and cash points - you just compare cubeful equities.
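Here's a quick Python sketch of the interpolation (illustrative only, not production code; P is the cubeless probability of win, W and L the conditional win/loss sizes as above, all per unit of current cube value):

```python
def live_cube_equity(p, w, l, owner):
    """Piecewise-linear live-cube equity, normalized to the current cube value.

    p: cubeless probability of win; w, l: conditional win/loss sizes;
    owner: 'centered', 'player', or 'opponent'.
    """
    tp = (l - 0.5) / (w + l + 0.5)   # live-cube take point
    cp = (l + 1.0) / (w + l + 0.5)   # live-cube cash point

    def interp(x, x0, y0, x1, y1):
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    if owner == 'centered':
        if p < tp:
            return interp(p, 0.0, -l, tp, -1.0)
        if p < cp:
            return interp(p, tp, -1.0, cp, 1.0)
        return interp(p, cp, 1.0, 1.0, w)
    if owner == 'player':
        if p < cp:
            return interp(p, 0.0, -l, cp, 1.0)
        return interp(p, cp, 1.0, 1.0, w)
    # opponent owns the cube
    if p < tp:
        return interp(p, 0.0, -l, tp, -1.0)
    return interp(p, tp, -1.0, 1.0, w)

def janowski_equity(p, w, l, owner, x):
    """x * live cube equity + (1 - x) * dead cube equity."""
    dead = p * (w + l) - l
    return x * live_cube_equity(p, w, l, owner) + (1.0 - x) * dead
```

At x=0 this reduces to the dead cube equity, and at x=1 to the live cube limit.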

For example, the take decision calculates the equity when the player owns the doubled cube and compares that to -(cube value). If the equity on take is greater than the equity on pass, you take.

Similarly, the double decision calculates the equity on double (which is the minimum of the equity when the opponent owns the doubled cube and +(cube value) for the pass) and compares it to the equity at the current cube value when the player owns the cube. If the doubled equity is bigger, you double.

My error before was around too-good points: in my implementation I assumed that the too-good point was calculated by comparing the dead cube equity with pass value, thinking that you're giving up the value of the cube by not doubling. But really you're not; you still have the option of doubling if the game starts to go against you. You should be comparing the model cubeful equity at the current cube against the value you'd get on a double (the minimum of the take and pass equities there).
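In code the decision rules are just these comparisons. Here `cubeful_equity(cube, owner)` is a hypothetical stand-in for whatever model supplies cubeful equities (Janowski, jump model, or anything else):

```python
def should_take(cubeful_equity, cube):
    # Taking gives us ownership of the doubled cube; passing costs the cube value.
    return cubeful_equity(2 * cube, 'player') > -cube

def should_double(cubeful_equity, cube):
    # The opponent caps our doubled equity at +cube by passing.
    doubled = min(cubeful_equity(2 * cube, 'opponent'), cube)
    # Compare against holding the cube at its current value.
    return doubled > cubeful_equity(cube, 'player')
```

With a dead-cube stand-in equity of cube * (2P - 1), a player at P=0.8 doubles (min(1.2, 1) = 1 > 0.6) and a taker at P=0.2 passes (-1.2 < -1).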

This invalidates some of the statistical results I calculated for Janowski's model before, so I'll have to redo those.

Also it makes it easy to generalize to two cube life indexes: each player's interpolation of equity uses their appropriate index.

Jump model performance

I implemented a doubling strategy following my jump model and ran it against the optimal Janowski doubling strategy. All checker play was Player 3.3.

This first implementation assumes a constant jump volatility, so does not make any attempt to calculate a local jump volatility in the game state. It's meant to be a benchmark to compare against the Janowski strategy with a single cube life index, which should be pretty similar in performance.

I ran one million money games of the jump model with different jump volatilities, vs the Janowski strategy with the optimal single cube life index of 0.7. The goal was to see which jump volatility value would give the best performance.

In the paper I made an estimate of a constant jump volatility of 7.7% based on a calculation of the average absolute probability change from turn to turn when the cubeless win probability was near the take and cash points. That was only an approximate calculation, based on crude windows of win probability, but should be roughly what comes out of this more careful analysis.
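The calculation behind that estimate is easy to sketch; the probability series and window edges below are illustrative placeholders, not the actual data:

```python
def avg_abs_jump(prob_series, windows):
    """Average absolute turn-to-turn change in cubeless win probability,
    counting only moves that start inside one of the probability windows."""
    jumps = [abs(b - a) for a, b in zip(prob_series, prob_series[1:])
             if any(lo <= a <= hi for lo, hi in windows)]
    return sum(jumps) / len(jumps) if jumps else 0.0

# crude windows around illustrative take and cash points
windows = [(0.15, 0.25), (0.75, 0.85)]
```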

Here are the results of the games:

[Chart: average points per game and probability of win vs jump volatility, with parabolic fit]
The blue line/points represent the self-play results for average points per game for the different jump volatility inputs, running from 5% to 11%. Each has an error bar of +/- 0.004ppg which represents the standard error on the averages. The black line shows a parabolic fit through the points. The red line/points represent the probability of win for the jump model strategy.

The jump volatility with the best ppg performance was 8.0%, quite close to the naive estimate from cubeless play statistics of 7.7%. But a jump volatility anywhere in the range 7.0% to 9.0% performed quite comparably. I fit a parabola to the ppg data and that suggests an optimal jump volatility of 8.5%, which is probably a more realistic number given the statistical uncertainty on individual points.
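The fit itself is a standard least-squares parabola, with the optimum read off the vertex at -b/(2a). A sketch with made-up (volatility, ppg) points, not the actual run results:

```python
import numpy as np

# Hypothetical (jump volatility, ppg) points, not the actual run results.
vols = np.array([0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11])
ppg = np.array([-0.006, -0.002, 0.000, 0.001, 0.000, -0.004, -0.011])

# Least-squares parabola: ppg ~ a*v^2 + b*v + c, optimum at the vertex.
a, b, c = np.polyfit(vols, ppg, 2)
optimal_vol = -b / (2.0 * a)
```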

The jump model with constant jump volatility performed marginally better than the Janowski model with a single cube life index, though the margins were small: a few hundredths of a point per game, in games with an average cube value of around 2.3. The differences are statistically significant (the standard error for all ppg numbers is +/- 0.004ppg) but only because I ran 1M games; really the performance is practically indistinguishable.

That is roughly what one would expect given that the model with a constant jump volatility and constant conditional win and loss amounts W and L can be mapped directly onto Janowski's model with a single cube life index. Of course W and L are not actually constant, so the models are not practically the same, but they should be close.

That said, the probability of win showed a much larger effect than points per game. A jump model strategy with a jump volatility of 5% won 50.2% of its games (while losing 0.006ppg in score); a strategy with jump volatility of 11% won only 44.2% of its games (but won 0.011ppg in score).

High jump volatility means the player doubles at a lower probability of win and passes at a higher probability of win. The games it concedes by passing doubles lower its win percentage, but the points saved on those passes (plus the points won when its early doubles are taken) offset the lost games in average score.



Monday, March 26, 2012

New cube decision model posted on arxiv.org

I posted the next draft of my new model for calculating cube decision points to the preprint site arxiv.org, so now it has a proper stable link:

Cube Handling In Backgammon Money Games Under a Jump Model

Abstract

A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Each jump is drawn from a distribution with zero mean and an expected absolute jump size called the “jump volatility” that can be a function of game state but is assumed to be small compared to the market window.

Simple closed form approximations for the market window and doubling points are developed as a function of local and average jump volatility. The local jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

Checkpoint: recent progress and next steps

Since my last checkpoint around two months ago I've made quite a lot of progress.

Main highlights:
  • Networks: I added a crashed network following the GNUbg definition of "crashed".
  • Network inputs: I added a new input to the contact and crashed networks that measures the strength of prime blocking. I also extended the race inputs to track borne-in checkers at a finer granularity.
  • Training: I grabbed the GNUbg training databases for contact, crashed, and race networks, and ran supervised learning against them. The best performance comes from first training with TD learning, then using those network weights as the starting point for supervised learning.
  • Benchmarking: I moved from bot self-play performance to benchmarking against the GNUbg benchmark databases for cubeless play, since that gives a more accurate and robust set of benchmarks for different game phases. I ran some stats to look at the relationship between self-play results and GNU benchmark scores.
  • Cube handling: this was a new area for me since my personal game is weak on cube play. I learned about Janowski's classic model for money games, as well as how to construct match equity tables for tournament play.
  • New model for money game cube action: I developed a new model for cube decisions in money games, moving from the "live cube" limit where probability of win is assumed to diffuse, to a jump model where the probability of win jumps each turn.
Along the way I developed several new players incorporating the new ideas. As of now my best checker player is Player 3.3, which scores GNUbg benchmark scores of Contact 13.3, Crashed 12.0, and Race 0.93 (compared to GNUbg 0-ply scores 10.5, 11.0, and 1.01).

Stuff still to do:
  • Money play: implement my new model properly and test out its performance. Hopefully it should perform better than the standard approaches.
  • Match play: implement matches properly in my backgammon framework and see if I can extend my new model approach to match play as well (coming up with better match equity tables along the way).
  • Checker play: my players' Contact benchmark scores are still pretty weak compared to GNUbg. I suspect I need to add more custom inputs. I also want to investigate some more advanced supervised learning techniques applied to training against the GNUbg training databases.
  • Rollouts: rollouts are still pretty basic in my framework; I need to properly implement variance reduction and cutting off rollouts before the end of the game.
  • FIBS hookup: I still want to hook up my bot to FIBS and see how it performs there in real-world games.
  • Networks: I've been talking with Øystein Johansen about maybe breaking up the "contact" specification into separate nets for different game phases, perhaps using k-means clustering to define the phases.
Fun stuff!

Saturday, March 24, 2012

New model paper v2

Here's the second draft of the paper, revamped considerably:

The main changes:
  • Added local and average jump volatility. Take and cash points depend only on the average volatility since on a pass/take decision you're comparing post-double equity with -1. The post-double equity depends on the jump volatility at the cash point; but should the game turn and head back up there it'll probably be in quite a different state than it's in near the take point. So using the average jump volatility seems more reasonable. Local jump volatility impacts the cubeful equity at the current cube level, which in turn impacts doubling decisions.
  • I note that if I assume a constant (single) jump volatility, it maps very nicely onto Janowski's model with one x. But the local + average jump volatility version does not map as nicely onto his model extension to a cube life index for each player, x1 and x2.
  • I changed the definition of jump volatility to be the expected absolute jump size instead of jump standard deviation. That's what the equity calcs care about. In fact, I only need to assume the jump distribution is symmetric for positive and negative jumps; if that is true then I don't need to know anything more about the jump distribution than its expected absolute jump size. Pretty nice! I got caught up for a while playing around with different distributional assumptions before I cottoned onto that.
  • I rewrote Appendix 3, the one that works through the details of the jump model, to try to make it more clear when you should use local vs average volatility. That I suspect is the subtlest part of this. I think I understand this properly now, but I went through a couple of iterations myself before I got here, so it could be that I'm still getting something wrong.
This is on the back of a very rewarding conversation with Rick Janowski, who pointed out that I needed to rethink this in terms of something more like his model extension, where each player has their own cube life index.


Wednesday, March 21, 2012

Optimal Janowski cube life index

I tried to determine the optimal constant Janowski cube life index by playing players with different cube life indexes against each other.

I started broadly by playing different cube life indexes against a fixed 0.7, since 0.7 is near where other people have quoted the cube life index.

Statistic                   x=0   x=0.25   x=0.50   x=0.75      x=1
Average points per game   -0.30    -0.14    -0.04     0.00    -0.09
Probability of win         37.5     41.8     45.9     51.2     58.3
Average cube               2.32     2.40     2.27     1.96     1.45

Interesting that the probability of win was well above 50% but on average lost points for x=1. It must be because when it wins it wins small, but when it loses it loses big. Presumably that's because the player gets doubled and takes when the game goes against it, but only doubles past the cash point when the game goes in the player's favor.

Next I tried to zero in on the winning range by running the same thing in a tighter range:

Statistic                x=0.55   x=0.60   x=0.65   x=0.70   x=0.75
Average points per game   -0.03    -0.01     0.00     0.00     0.00
Probability of win         46.9     47.8     48.9     50.0     51.2
Average cube               2.23     2.17     2.11     2.04     1.96

So it seems like 0.7 is indeed the optimal cube life index in terms of expected points per game, at least within +/- 0.05.

Janowski-model money game cube statistics

I've got a working doubling strategy in my backgammon framework now for money games, applying Janowski's model.

Now I can calculate some statistics on the values of the cube; kind of like an analysis on Jellyfish doubling that was done in 1999.

I'm interested in how doubling statistics change as Janowski's cube life index (sometimes called cube efficiency) varies. I ran 100k cubeful money games, using Player 3.3 for checker play, and Janowski's doubling model for different cube life index values.

Statistic              x=0   x=0.25   x=0.50   x=0.75      x=1
Percent cashed        21.8     27.4     40.0     50.9     96.3
Percent single        55.4     51.2     45.9     34.8      0.4
Percent gammon        21.8     20.3     18.1     13.6      3.1
Percent backgammon     1.2      1.1      1.0      0.7      0.2
Average cube          14.7     4.49     2.72     1.89      1
Percent cube=1         0.6      3.2     12.3     33.9    100
Percent cube=2        24.2     47.8     61.0     56.7      0
Percent cube=4        24.4     31.3     21.2      8.5      0
Percent cube=8        18.6     12.2      4.5      0.8      0
Percent cube=16       12.8      4.1      0.8      0        0
Percent cube=32        8.2      1.1      0.1      0        0
Percent cube=64       11.1      0.4      0        0        0

In the case of x=1 (the live cube limit) the initial double point is right at the cash point, so the player never offers a double when the opponent will take. Most games then end in cashes.



Tuesday, March 20, 2012

New model for money game cubeful equity

I came up with a new model for estimating cubeful equity in backgammon money games that seems like an interesting way to think about the Janowski cube life index in specific game states:

Abstract

A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Jumps occur when a Poisson process fires, and each jump is drawn from a normal distribution with zero mean and a standard deviation called the “jump volatility” that can be a function of win probability but is assumed to be small compared to the market window.

Simple closed form approximations for the market window and doubling points are developed as a function of local jump volatility. The jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

All cube decision points under this model match Janowski’s if his cube life index is implied from this model’s take point, so these results can be viewed as a framework for estimating the Janowski cube life index more accurately for specific game states.

Thursday, March 15, 2012

Match equity table with MWC

Most people show match-winning chance (MWC) in their match equity table instead of equity. Of course the standard conversion of equity = 2 MWC - 1 holds. Here are my calculated match equity tables showing MWC:
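As a quick check of the conversion against the equity tables below:

```python
def mwc_to_equity(mwc_percent):
    """Convert match-winning chance (in percent) to match equity."""
    return 2.0 * mwc_percent / 100.0 - 1.0

# 1-away vs 2-away: 68.1% MWC should match the 0.362 equity table entry
eq = mwc_to_equity(68.1)
```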


           1-away  2-away  3-away  4-away  5-away  6-away  7-away  8-away  9-away 10-away 11-away 12-away
1-away       50     68.1    75.5    81.7    84.5    89.0    91.0    93.5    94.6    96.1    96.8    97.7
2-away               50     59.5    66.1    73.6    79.2    83.4    86.7    89.4    91.7    93.4    94.8
3-away                       50     57.0    64.5    70.8    75.6    79.8    83.3    86.4    88.9    90.9
4-away                               50     57.4    63.9    69.2    73.7    77.9    81.5    84.5    87.0
5-away                                       50     56.6    62.2    67.2    71.8    75.9    79.4    82.5
6-away                                               50     55.9    61.1    66.0    70.4    74.4    77.8
7-away                                                       50     55.2    60.4    65.0    69.2    73.0
8-away                                                               50     55.2    59.9    64.3    68.3
9-away                                                                       50     54.8    59.4    63.6
10-away                                                                              50     54.6    58.9
11-away                                                                                      50     54.3
12-away                                                                                              50

I left out the bottom half of the table for readability, but all values there are just 100 less the corresponding value in the top half.


And here are the corresponding post-Crawford MWCs (for player n-away, opponent 1-away):


           1-away  2-away  3-away  4-away  5-away  6-away  7-away  8-away  9-away 10-away 11-away 12-away
1-away       50     48.6    31.9    30.6    18.7    17.7    11.3    10.6     6.7     6.3     4.0     3.7

These compare well to the "Mec26" MWC table here, mostly within 0.2%. Those are based on GNUbg rollouts for post-Crawford MWCs and using a gammon probability of 26.0% (vs 27.6% in mine). The biggest differences are in 1-away,2-away and 1-away,3-away where the MWC is different by about 0.5%.

Another modern MET is the G11 table. Again, not much difference between my calculated MET and this one, though bigger differences (especially near the diagonal) than with Mec26.



Wednesday, March 14, 2012

Full match equity tables

The next step after getting the Crawford-game match equities is to build out the full match equity tables for pre-Crawford games.

I followed the algorithm defined here, by Tom Keith, but there were a few assumptions I didn't fully understand. More on that later. Here is my calculated pre-Crawford match equity table, out to 12-away:



           1-away  2-away  3-away  4-away  5-away  6-away  7-away  8-away  9-away 10-away 11-away 12-away
1-away      0      0.362   0.510   0.635   0.690   0.780   0.820   0.869   0.892   0.922   0.936   0.953
2-away     -0.362   0      0.189   0.322   0.472   0.585   0.667   0.733   0.788   0.834   0.867   0.896
3-away     -0.510  -0.189   0      0.139   0.289   0.415   0.513   0.595   0.667   0.729   0.777   0.819
4-away     -0.635  -0.322  -0.139   0      0.149   0.279   0.385   0.474   0.558   0.630   0.690   0.741
5-away     -0.690  -0.472  -0.289  -0.149   0      0.131   0.245   0.343   0.436   0.518   0.588   0.649
6-away     -0.780  -0.585  -0.415  -0.279  -0.131   0      0.118   0.221   0.320   0.409   0.488   0.557
7-away     -0.820  -0.667  -0.513  -0.385  -0.245  -0.118   0      0.105   0.208   0.300   0.385   0.460
8-away     -0.869  -0.733  -0.595  -0.474  -0.343  -0.221  -0.105   0      0.103   0.198   0.287   0.367
9-away     -0.892  -0.788  -0.667  -0.558  -0.436  -0.320  -0.208  -0.103   0      0.096   0.188   0.271
10-away    -0.922  -0.834  -0.729  -0.630  -0.518  -0.409  -0.300  -0.198  -0.096   0      0.092   0.178
11-away    -0.936  -0.867  -0.777  -0.690  -0.588  -0.488  -0.385  -0.287  -0.188  -0.092   0      0.087
12-away    -0.953  -0.896  -0.819  -0.741  -0.649  -0.557  -0.460  -0.367  -0.271  -0.178  -0.087   0


And here are the corresponding post-Crawford match equities, recalculated using 2-ply Player 3.3 to get state probabilities in the first two rolls.


           1-away  2-away  3-away  4-away  5-away  6-away  7-away  8-away  9-away 10-away 11-away 12-away
1-away      0      0.028   0.362   0.388   0.626   0.645   0.774   0.788   0.865   0.874   0.919   0.925

I calculated the average probability of win, gammon win, and backgammon win, and corrected all player-calculated probabilities so that the average probability of win was 50% and the average probability of gammon win/loss and backgammon win/loss were symmetric.

The probability of the game ending in gammon using these adjusted state game probabilities is 27.60%. The probability of the game ending in backgammon is 1.1%.

The approach:

  • Define M(n,m) as the match equity for player n-away and opponent m-away, for symmetric players before the first dice are thrown in a game (so both players have 50% chance of a win and the cube is at 1). By convention I'll assume n<m but M(m,n)=-M(n,m) so this isn't an issue.
  • Define Mp(n,m,c,X,Pw) as the match equity for player n-away, opponent m-away, cube value c, owned by player X (either P for player, O for opponent, or C for centered when c=1), and probability of any player win Pw. We'll assume this is piecewise linear in Pw in the range [take point,cash point] for centered cube, [take point,1] for opponent-owned cube, and [0,cash point] for player-owned cube.
  • The match equity table value M(n,m) = Mp(n,m,1,C,0.5), so if we can get Mp functions we're done.
  • We start with the cube at its maximum value: c is the smallest cube value at or above m (remember we assume m>n by convention). In this state the player owns the cube, since only the opponent can double to this level. No more doubling can happen, and any win or loss is a match win or loss, so Mp(n,m,cmax,P,Pw) = 2 Pw - 1.
  • Then we step back to the cube 2x smaller than the maximum. For the case where the player owns the cube, we calculate the cash point, where the opponent's equity in the maximum cube state equals their equity on a pass (which uses the match equity table for the corresponding away values, which we bootstrap). We assume that the player does not double until the cash point, so Mp(n,m,c,P,Pw) is linear from Pw=0 to Pw=cash point. We know the cash point value from the opponent's take/pass decision, and we know the match equity for a player loss at this cube value (which also uses the appropriate values from the match equity table for M(n, smaller m values)).
  • For the case when the opponent owns the cube, we assume the opponent doubles at the player's take point, and that the opponent's equity is piecewise linear in Pw from the take point to Pw=1 (where we know the opponent's winning value from the match equity table lookup).
  • We continue to step back to cube=1, where we assume that match equity is linear between the player's take point and cash point. This gives us Mp(n,m,1,C,Pw), which we use to calculate M(n,m) = Mp(n,m,1,C,0.5).
This follows Tom Keith's algorithm and returns the match equity table displayed above, using the probability of gammon given above.
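Each Mp layer above is nothing more than a linear interpolation between two known (Pw, equity) knots, with the knot locations and values supplied by the take/cash analysis and the MET bootstrap. A minimal sketch (the knot numbers here are placeholders, not values from my table):

```python
def mp_linear(pw, knot_lo, knot_hi):
    """Match equity, linear in Pw between two (Pw, equity) knots."""
    (p0, e0), (p1, e1) = knot_lo, knot_hi
    return e0 + (e1 - e0) * (pw - p0) / (p1 - p0)

# Player-owned cube layer: linear from (0, equity on loss) to (cash point, cash equity).
# Placeholder knots: losing costs -0.36 in match equity, cashing at Pw=0.78 earns +0.51.
equity_at_half = mp_linear(0.5, (0.0, -0.36), (0.78, 0.51))
```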

The main assumption here is that the player and opponent double only when they are at the cash point or take point. This seems a bit unusual - really you'd double earlier than the cash point, and the opponent would double you earlier than your take point.

I tried an alternative approach, similar to the "live cube" model in Janowski: the probability of win diffuses like a Brownian motion with constant volatility sigma. Then I can treat the doubling problem as a derivatives pricing problem, using numerical backward induction to get the fair value. Given a cube value I know the match equity at Pw=0 and Pw=1, and also that the match equity (given a cube value) satisfies the partial differential equation sigma^2/2 d^2(match equity)/dPw^2 + d(match equity)/dt = 0. You solve this PDE separately for different "layers" corresponding to different cube values and cube ownership, and at each time step mix the values across layers based on doubling decisions.

This converged to exactly the values described by the Keith algorithm above. So the Keith algorithm is equivalent to a live cube model where probabilities diffuse continuously rather than jump. This makes me wonder whether a more accurate match equity table could mix between this limit and something like the dead cube limit, as with the money cube algorithm described by Janowski.
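For reference, each diffusion layer can be solved with a simple explicit finite-difference scheme. This is a stripped-down sketch of a single layer with fixed boundary equities of -1 and +1 and no doubling decisions (the real solver mixes values across layers at each step); grid sizes are arbitrary:

```python
import numpy as np

def diffuse_layer(n_p=101, n_steps=20000, sigma=0.1, dt=1e-4):
    """Backward-induct sigma^2/2 * d2E/dP2 + dE/dt = 0 on a Pw grid
    with absorbing boundaries E(0) = -1, E(1) = +1."""
    p = np.linspace(0.0, 1.0, n_p)
    dp = p[1] - p[0]
    e = 2.0 * p - 1.0  # terminal condition at the maximum cube: 2 Pw - 1
    for _ in range(n_steps):
        lap = (e[2:] - 2.0 * e[1:-1] + e[:-2]) / dp ** 2
        e[1:-1] += dt * 0.5 * sigma ** 2 * lap  # explicit backward time step
        e[0], e[-1] = -1.0, 1.0
    # A linear payoff is a stationary solution of this PDE, so it is preserved.
    return p, e
```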

Player 3.3: TD then SL

Following the suggestion of Øystein Schønning-Johansen (in comments here), I tried training my player from random weights using TD learning, then once that had converged, switching to supervised learning on the GNUbg training databases.

The GNUbg team's experience is that this converges to a better player. Indeed, looking at Player 3.2's self-play, it does have some odd features: for example, roughly 7% of its games against itself result in a backgammon, much higher than the network odds predict from the starting board.

The new player that resulted is Player 3.3. Its benchmark scores: Contact ER 13.3, Crashed ER 12.0, and Race ER 0.93. In 40k cubeless money games it scores +0.595ppg against PubEval and +0.174ppg against Benchmark 2, and in 100k cubeless money games, +0.028ppg against Player 3.2. In terms of network setup it is the same as Player 3.2 (contact, crashed, and race networks; 120 hidden nodes; same extended inputs).

The first step was TD learning from random weights. I ran training for over 5M iterations trying to get solid convergence. I switched to using the GNUbg benchmarks as training benchmarks instead of gameplay against a reference player, since this is more accurate and quicker. Here are the learning charts for Contact, Crashed, and Race ER, starting at training run 100k to remove the initial rapid convergence, and using a log scale on the x-axis:

[Learning charts: Contact ER, Crashed ER, and Race ER vs training run]

The best benchmark scores were Contact ER 17.9, Crashed ER 21.5, and Race ER 1.79. Worse than Player 2.4 on Crashed ER but otherwise my strongest TD-trained player. In 100k cubeless money games against Player 3.2 it scored an average of -0.051ppg +/- 0.004ppg, and against Player 2.4 it scored +0.015ppg +/- 0.004ppg. In self-play the statistics of wins, gammons, and backgammons look plausible: 27% of the games end in gammon and 0.8% in backgammon.

The next step was to continue to train this player using supervised learning on the GNUbg training databases, starting with the TD-trained networks.

For a learning algorithm I used standard backpropagation on a randomly-shuffled training database. I started with alpha=1 and kept alpha fixed across epochs as long as the Contact ER benchmark score did not worsen for three consecutive epochs. If it did, I dropped alpha by a factor of sqrt(10). When alpha dropped below 0.1/sqrt(10) I popped it back up to 1 and continued. This follows what Joseph Heled described in his GNUbg training, though with alphas about 10x smaller (I tried using alpha=10 but it did not converge at all). The idea is that if the training gets stuck in a local minimum and cannot converge, a larger alpha kicks it out so it can try again. Of course the best-performing networks are always saved, so you can return to them. Once the player stopped improving I ran it for several hundred epochs at smaller alpha to help it converge.
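One way to sketch that schedule (my reading of the rule; the class and its bookkeeping are illustrative, not the actual training code):

```python
import math

class AlphaSchedule:
    """Drop alpha by sqrt(10) after the benchmark ER worsens for three
    consecutive epochs; pop back to 1.0 once alpha falls below 0.1/sqrt(10)."""
    DROP = math.sqrt(10.0)
    FLOOR = 0.1 / math.sqrt(10.0)

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.prev_er = None
        self.worse_count = 0

    def update(self, er):
        # ER is an error rate: an increase means the player got worse.
        if self.prev_er is not None and er > self.prev_er:
            self.worse_count += 1
        else:
            self.worse_count = 0
        self.prev_er = er
        if self.worse_count >= 3:
            self.alpha /= self.DROP
            self.worse_count = 0
            if self.alpha < self.FLOOR:
                self.alpha = 1.0  # kick training out of the local minimum
        return self.alpha
```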

After this training the best benchmark scores were Contact ER 13.3, Crashed ER 12.0, and Race ER 0.93. So a significant improvement over the TD-trained player, which is not surprising, but also better than my best SL-trained player, Player 3.2 (which scored 14.0, 12.8, and 2.08).

In addition, the self-play statistics look more realistic. In 10k self-play games, 26.9% were gammons of any kind and 1.4% were backgammons.

It beats Player 3.2 by 0.028ppg, which is significantly more than any of the multivariate regressions would predict.

The player's benchmark performance is still significantly worse than GNUbg, except in Race ER where it is marginally better. Contact ER is much worse, which is the most important benchmark. It looks like there's still work to do on better inputs.

That said, I'm switching focus a bit to the cube handling side, so Player 3.3 will likely be my best cubeless player for a while.

Monday, March 12, 2012

The doubling cube and money games

Money games are not match games - in a money game you simply try to win as many points as possible. The doubling cube can be used in money games, doubling up to a maximum value of 64.

In some ways doubling strategy for matches is simpler than for money games because there is a well defined end point. But the standard methodology for cubeful money play ends up being much simpler than match play.

The standard approach was defined in a paper by Rick Janowski, where he proposed an approximation where you linearly interpolate cubeful equity between the "dead" and "live" cube limits.

The dead cube limit is one where you assume that after accepting a double, neither player can double again. The equity in this limit is straightforward.

The live cube limit is one where you assume an infinite number of possible redoubles, but (crucially) gameplay where the probability of win diffuses continuously, instead of jumping from step to step. The equity in this limit is slightly more complex but still has a nice general formulation.

The approximation for real cubeful play is that the cubeful equity is a linear interpolation between these two limits, using a "cube life index" as the interpolation variable that runs from 0 (the dead cube limit) to 1 (the live cube limit).

The cube life index is an input to the model and can be different for each player. Typical values are 0.6-0.7.

GNUbg uses this approximation itself, with a state-dependent cube life index. The backgammon iPhone app Backgammon NJ also uses the Janowski approximation.

Under this approximation, and given a cube life index for each player, there are simple linear forms for equity in the different cube states for each player, which can then be used to determine when to double (when your equity from doubling and giving the opponent the cube is greater than the equity from not doubling) and when to take vs pass (take if the equity from doubling and receiving the cube is greater than the amount you lose from passing).

The inputs to the model are the cube life index, plus the cubeless equities (eg from a neural network).

An alternative approach in the end game is to use two-sided bearoff databases - three of them. One corresponds to equity for a centered cube, one for equity assuming the player owns the cube, and one for equity assuming the opponent owns the cube. This gives exact cubeful equities, since the Janowski approximation can break down (in the sense that a fixed cube life index is not appropriate) near the end of the game.


Saturday, March 10, 2012

Match equity table: Crawford games

Crawford games happen in matches the first time a player gets within one game of winning the match. In the Crawford game the opponent cannot use the doubling cube.

To compute match equity for these games, we start with the post-Crawford match equities and backward induct.

The Crawford game match equity table (for player 1-away and opponent n-away) I calculated is:



           1-away  2-away  3-away  4-away  5-away  6-away  7-away  8-away  9-away 10-away 11-away 12-away
1-away      0      0.363   0.511   0.636   0.692   0.781   0.821   0.870   0.893   0.923   0.937   0.954

Here's how I calculated it:

Call Mc(n) the Crawford game match equity. By symmetry, Mc(1) must be 0. 

The doubling cube cannot be used in the Crawford game, so the more general match equity calculation is fairly straightforward. First, you decide the match equity conditional on different game end states:

Any player win: match equity = +1 (probability 50%)
Single player loss: match equity = Mp(n-1) (probability = 50% - gammon probability)
Player gammon loss: match equity = Mp(n-2) (probability = gammon probability)

where Mp(n) = post-Crawford match equity for player 1-away and opponent n-away.

For notational convenience, Mp(n) = -1 for n<=0. We ignore backgammons for this calculation.

Then the weighted Crawford match equity must be

Mc(n) = 1/2 + ( 1/2 - Pg ) Mp(n-1) + Pg Mp(n-2)

where Pg = probability of a gammon loss in a game as seen from before the first die throw, which by symmetry equals the probability of a gammon win. Using 2-ply Player 3.2 I estimated this before as 13.735%.
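A sketch of the weighted-average calculation, with Mp values read off the post-Crawford equity table from the previous post (truncated here to 5-away). The single-loss and gammon-loss terms pick up Mp(n-1) and Mp(n-2), which reproduces the Crawford table above:

```python
PG = 0.13735  # gammon probability from 2-ply Player 3.2

# Post-Crawford equities Mp(n) for the 1-away player vs an n-away opponent
MP = {1: 0.0, 2: 0.028, 3: 0.362, 4: 0.388, 5: 0.626}

def mp(n):
    # Mp(n) = -1 for n <= 0: the opponent has already won the match.
    return -1.0 if n <= 0 else MP[n]

def crawford_equity(n):
    """Mc(n) = 1/2 + (1/2 - Pg) Mp(n-1) + Pg Mp(n-2)."""
    return 0.5 + (0.5 - PG) * mp(n - 1) + PG * mp(n - 2)
```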

One difference to note in the table above vs the table here (an example of a modern match equity table): for 1-away 3-away the match equity is greater than 0.5. That's because mine includes the proper post-Crawford match equity for 1-away 2-away, rather than approximating that as zero. 

Other differences for the Crawford game match equities are smaller than that equity error of 0.011; mostly around 0.008 or less. So ignoring the non-zero value of Mp(2) adds a relatively significant error. 

The biggest source of error when computing the Crawford match equities is in Mc(2), the 1-away, 2-away entry, and comes from the choice of gammon probability, since Mc(2) = 0.5 - Pg. This reference chooses a 20% chance that a cubeless game ends in a gammon (so Pg = 0.10), which is too low and gives too high a Mc(2) by as much as 0.04 points (0.400 versus my 0.363).


Match equity table: post-Crawford

I'm starting to look at match equity now: the expected points won in a match, including the cube actions (my first foray into doubling cube strategy). A certain win counts as match equity +1, and a certain loss, -1. In general, match equity equals twice the probability of match win less one.
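Since every entry in these tables can be read either as an equity or as a match-winning probability, here is the two-line conversion as a sketch:

```python
# Match equity <-> match-winning probability, as defined above:
# a certain win is equity +1, a certain loss is -1.

def equity_from_prob(p):
    # equity = 2 * P(win match) - 1
    return 2.0 * p - 1.0

def prob_from_equity(e):
    # P(win match) = (equity + 1) / 2
    return (e + 1.0) / 2.0
```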

A "match equity table" shows the match equity when two evenly matched players are playing (so each has a 50% chance of winning a regular game), for different states during the match play. The states are named by how many games each player is from winning the match: for example, "2-away, 3-away" means the player is two games from winning the match and the opponent is three games away. Only the distance to the match win matters: a match to 7 where the player has 5 points and the opponent has 4 is no different from a match to 11 where the player has 9 points and the opponent has 8.

Match equities in a match equity table correspond to match equities before the first die is thrown in a game - so before the players even know who plays first.

Match equity tables assume the Crawford rule is in force. The Crawford rule is common to match play and states that the first time one of the players is one game away from a match win, the opponent cannot double for that game. After the "Crawford" game is over, the opponent is free to double again (and indeed will, at first opportunity).

Two references on match equity tables that I've found quite illuminating: one that describes the algorithm for generating a match equity table (which I summarize below, with a few additions); and one that shows a modern match equity table based on GNUbg rollouts (though it shows match-winning probabilities instead of equities - an easy conversion).

The general approach to building the match equity table is backward induction: start with late-match known states and step backward in the match, weighting later match states by the probability of reaching them.

The first place to start is "post-Crawford" match equities. These correspond to late-match states after the Crawford game has been played: the player is 1-away, the opponent is n-away, and the opponent will double at first opportunity.

Here is my post-Crawford match equity table out to 12-away:


n-away:  1      2      3      4      5      6      7      8      9      10     11     12
Mp(n):   0      0.031  0.363  0.391  0.627  0.648  0.775  0.790  0.866  0.876  0.920  0.926

Here's how I calculated it:

Call the post-Crawford match equity Mp(n), where n is the number of games the opponent is away from a win. We can backward-induct these as well.

Mp(1) is where both players are 1-away. In this case both players are equally likely to win the game (remember, the equity corresponds to right before the first die is thrown, and we assume both players are equally good). So the match equity in this state is zero: Mp(1) = 0.

Mp(2) is a bit trickier. The opponent will double at first opportunity, and the player will take (accept the doubling cube) if the odds are in his favor, but pass (decline it and give up the game) if not, since a pass just means they start again at 1-away, 1-away with even odds. If the player takes the cube he cannot double again, since he's already 1-away.

This means Mp(2) is close to zero, but not zero exactly, since the opponent can double only after a turn or two, and the first one or two rolls will change the odds. That is, if the player wins the first roll, he makes a move, then the opponent can double. If the opponent wins the first roll, he makes a move, then the player makes a move, and only then can the opponent offer the double.

We can calculate that using our cubeless player. If the player wins the first roll, there are fifteen possible states, with a probability of 1/30 of reaching each (15 rolls, and 50% chance that he plays first). For each of those fifteen rolls we use the player to calculate the best move, as well as the odds of winning a game after the move. We care only about the probability of any win, since a regular win, gammon, or backgammon all win the match for either player.

If the opponent wins the first roll we have to step two rolls deep into the game. We have a probability of either 1/30/18 or 1/30/36 for each state, depending on whether the first roll is mixed or a double. Again, we use the cubeless player to both make the optimal moves and tell us the game-winning probability after each move.

One caveat: the player's probability estimates are only approximate, so when you take the states defined above and weight each state's game-winning probability by the probability of reaching that state, you do not get exactly 50% (as you should). Generally it is close, but can be off by a few tenths of a percent. Mp(2) is sensitive to small deviations from 50%, so this could be a significant bias.

I decided to normalize for this by adding a constant offset to all game-winning probabilities in the two-deep states (where the opponent wins the first roll) such that after the offset, the calculated probability of the player winning is exactly 50%.

Then I calculated Mp(2) as the expected match equity over all those states (15 when the player wins the first roll and 15*21=315 when the opponent wins the first roll, for a total of 330). In each state I calculate the match equity if the player takes the cube (2*probability of win-1); if that is greater than zero (the match equity if the player passes on the cube) the match equity in the state is the calculated match equity; otherwise it is zero. I calculate the weighted sum of that match equity to get Mp(2).
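The state bookkeeping above can be sketched as follows. The bot evaluation is stubbed out: `p_win` is a hypothetical placeholder (constant 0.5 here) for the 2-ply estimate of the player's game-winning probability after the best moves in each opening sequence, so this shows the enumeration and take/pass logic rather than real numbers:

```python
# Enumerate the 330 pre-double states for the Mp(2) calculation and
# apply the take/pass rule in each. p_win is a stand-in for the bot.

def p_win(rolls):
    # Hypothetical placeholder: a real implementation plays the best
    # checker moves for the opening sequence `rolls` and returns the
    # player's game-winning probability in the resulting position.
    return 0.5

MIXED = [(a, b) for a in range(1, 7) for b in range(a + 1, 7)]  # 15 rolls
DOUBLES = [(a, a) for a in range(1, 7)]                          # 6 rolls

states = []  # (probability of reaching the state, opening roll sequence)

# Player wins the opening roll: 15 states, probability 1/30 each.
for r in MIXED:
    states.append((1.0 / 30.0, (r,)))

# Opponent wins the opening roll, then the player rolls: 15 * 21 states,
# probability 1/30 * 1/18 (mixed second roll) or 1/30 * 1/36 (double).
for r1 in MIXED:
    for r2 in MIXED:
        states.append((1.0 / 30.0 / 18.0, (r1, r2)))
    for r2 in DOUBLES:
        states.append((1.0 / 30.0 / 36.0, (r1, r2)))

# Take if the take equity 2p - 1 beats the pass equity of zero.
mp2 = sum(prob * max(2.0 * p_win(seq) - 1.0, 0.0) for prob, seq in states)

total_prob = sum(prob for prob, _ in states)  # sanity check: should be 1
```

With the constant stub the take and pass equities tie everywhere, so mp2 comes out zero; the real Mp(2) = 0.031 comes entirely from the state-to-state variation in the bot's estimates.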

Using 2-ply Player 3.2 to estimate the moves & game-winning probabilities in each state, I get Mp(2) = 0.0310. This ties out well with the value in the second reference (rounded to 0.03).

Mp(3) is easier. When the player is doubled he'll always take the cube because the take match equity is always much higher than the pass match equity (which equals Mp(2), since a pass gives the opponent a game and takes him from 3-away to 2-away). The match equity after the game depends on who wins what. If the player wins any kind of game, he wins the match, for a match equity of +1. That happens 50% of the time since the players are symmetric.

When the player loses a single game, doubled, the match ends up at 1-away, 1-away, with a match equity of zero (Mp(1) above). If the player loses a gammon (or backgammon), the doubled gammon is worth four points, the match is over, and the match equity is -1.

So Mp(3) = +1 * 0.5 + 0 * ( 0.5 - Pg ) - 1 * Pg = 0.5 - Pg

where Pg is the probability that the player loses a gammon.

To calculate this we need an estimate of the probability of losing a gammon (or winning one - it's symmetric before the first die is thrown). The standard when generating match equity tables is just to specify this a priori; the first reference uses 20% and the second 26%. We can do a little better, since we already have the bot's network probability estimates in the 330 possible states before a double, so we can calculate it directly. Using 2-ply Player 3.2 I get a gammon probability of 27.47% - the probability that a game ends in a gammon for either player. This is a cubeless estimate, but that's fine, since after the opponent (immediately) doubles and the player takes, the cube is dead.

Plugging in Pg = 0.2747/2 = 0.13735, Mp(3) = 0.363.

Mp(4) is calculated like Mp(2), except that the match equity in the "take" states is slightly different. If the player wins any kind of game the match equity is still +1; if he loses a single game the match equity is Mp(2); if he loses a gammon (or backgammon) the match equity is -1. As with Mp(2), the probabilities here (wins, gammon losses, etc) are specific to each state and estimated by the bot. At each point you compare that state-dependent "take" equity with the pass equity (Mp(3) in this case).

A similar algorithm is used for Mp(n) for n>4: for each state calculate the state's match equity as the maximum of the state-dependent "take" equity and the state-independent "pass" equity, and then average over all states.
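To make the shape of that recursion concrete, here is a deliberately simplified sketch that collapses the 330 state-dependent bot estimates down to a single flat pre-roll estimate per game (50% win, gammon probability Pg). It is not the real calculation, just the induction structure:

```python
# Flat-probability sketch of the Mp(n) backward induction: every state
# is given the same pre-roll estimate (50% win, gammon probability PG)
# instead of the 330 state-dependent bot estimates.

PG = 0.13735  # gammon probability, symmetric pre-roll

def post_crawford(n_max):
    # Mp(n) = -1 for n <= 0: the opponent has already won the match.
    mp = {-2: -1.0, -1: -1.0, 0: -1.0, 1: 0.0}
    for n in range(2, n_max + 1):
        # Take: cube at 2, so a single loss costs two points -> Mp(n-2),
        # a gammon loss costs four -> Mp(n-4); any win takes the match.
        take = 0.5 + (0.5 - PG) * mp[n - 2] + PG * mp[n - 4]
        # Pass: the opponent wins one point -> Mp(n-1).
        mp[n] = max(take, mp[n - 1])
    return mp
```

With no state variation this gives Mp(2) = 0 rather than 0.031, but it reproduces Mp(3) = 0.5 - Pg = 0.363 exactly and lands within a few hundredths of the later table entries.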

Friday, March 2, 2012

FANN neural network package

Quite a nice neural network package that I'm starting to investigate: FANN.

Looks like it's been carefully optimized, has been around for a while, and is pretty easy to use. I've cored out the hand-rolled networks I was using before and am starting to look at performance benefits of using a proper optimized network package.

It also opens the door to more complex training algorithms than the standard backpropagation I've been using so far, since those come in the FANN box. Probably most useful in the supervised learning version of training.