Monday, August 20, 2012

Improved GNUbg benchmarks

The GNUbg team (in particular, Philippe Michel) has created new benchmark databases for Contact, Crashed, and Race layouts, using the same set of board positions but rolling out the equities with more accuracy. This corrects the significant errors found in the old Crashed benchmark, and improves the Contact and Race benchmarks.

They are available for download here, in the benchmarks subdirectory.

Philippe also did some work on improving the board positions included in the Crashed training database, which is available for download in the training_data subdirectory at that link.

I re-ran the statistics for several of my players, as well as for PubEval. I also played everything against Player 3.6, my current best player, as the most comprehensive benchmark.

Player | GNUbg Contact ER | GNUbg Crashed ER | GNUbg Race ER | Avg ppg vs PubEval | Avg ppg vs Player 3.6
GNUbg | 10.5 | 5.89 | 0.643 | 0.63 | N/A
Player 3.6 | 12.7 | 9.17 | 0.817 | 0.601 | 0.0
Player 3.5 | 13.1 | 9.46 | 0.817 | 0.597 | -0.0027
Player 3.4 | 13.4 | 9.63 | 0.817 | 0.596 | -0.0119
Player 3.3 | 13.4 | 9.89 | 0.985 | 0.595 | -0.0127
Player 3.2 | 14.1 | 10.7 | 2.14 | 0.577 | -0.041
? | 33.7 | 26.2 | 2.45 | 0.140 | -0.466
? | 18.2 | 21.7 | 2.05 | 0.484 | -0.105
? | 21.6 | 23.2 | 5.54 | 0.438 | -0.173
PubEval (ListNet) | 41.7 | 50.5 | 2.12 | 0.048 | -0.532
PubEval | 44.2 | 51.3 | 3.61 | 0 | -0.589

For the games against PubEval I ran 40k cubeless money games; standard errors are +/- 0.006ppg. For the games against Player 3.6, down to Player 3.2 I ran 400k cubeless money games to get more accuracy; standard errors are +/- 0.002ppg or better. For players worse than Player 3.2 I played 100k games against Player 3.6, since the average scores were larger; standard errors there are +/- 0.004ppg.

Philippe Michel was gracious enough to provide the GNUbg 0-ply scores against the newly-created benchmarks. It also seems I had the GNUbg scores against the old benchmarks wrong: they were Contact 10.4, Crashed 7.72, and Race 0.589. The Contact score was close to what I had, but the other two I had recorded as significantly worse than they actually were.



Sunday, August 19, 2012

Player 3.6: longer training results

I haven't had much time lately to work on this project, but while I'm engaged elsewhere I thought I'd run training for a long period and see whether it continued to improve.

It did - fairly significantly. So my players before clearly were not fully converged.

The result is my new best player, Player 3.6. Its GNUbg benchmark scores are Contact 12.7, Crashed 11.1, and Race 0.766. In 400k cubeless money games against Player 3.5 it averages 0.0027ppg +/- 0.0018 ppg, a small improvement.

In 40k games against Benchmark 2 it averages 0.181 +/- 0.005 ppg, and against PubEval 0.601 +/- 0.006 ppg.

For training I used supervised learning with three parallel and independent streams: one with alpha=0.01, one with alpha=0.03, and finally one with alpha=0.1. This was meant to test the optimal alpha to use.

Surprisingly, alpha=0.01 was not the best level to use: alpha=0.03 improved almost 3x as quickly. alpha=0.1 did not improve the Contact benchmark score much, but improved the Crashed benchmark score the most.

I take from this that alpha=0.03 is the best level of alpha to use for long term convergence.

We know the Crashed benchmark score is not that useful: the Crashed benchmark itself is flawed, and a multivariate regression showed the Crashed score has very little impact on overall performance. That said, I tried a little experiment where I used the Contact network for crashed positions in Player 3.5, and it definitely worsened performance in self-play: 0.04ppg on average. That is a significant margin at this point in the player's development.

I ran 4,000 supervised learning steps in the training, for each of the three alpha levels. In each step I trained on a randomly-arranged set of Contact and Crashed training positions from the training databases. This took a month and a half. The benchmark scores were still slowly improving for alpha=0.01 and alpha=0.03, so there is still scope for improvement. I stopped only because the GNUbg folks have put out new benchmark and training databases that I want to switch to.

Tuesday, June 12, 2012

GNUbg Crashed benchmark issue

It looks like the Crashed benchmark set in the GNUbg benchmarks isn't very accurate in some cases.

There is a thread discussing it in the GNUbg mailing list.

Interesting to know, and hopefully the GNUbg team will improve on it; but the Crashed benchmark score is not very predictive for overall gameplay, as I've discovered while comparing players of different strengths.


Monday, May 14, 2012

Player 3.5: new input, escapes from the bar

I tried another new input for contact and crashed networks: this time, the expected escapes if you have a single checker on the bar. That is, looking at the available spaces in the opponent home board and weighting the probability of landing in the space with the standard escape count from the Berliner primes calculation. It is meant to give some indication of how good or bad it'd be to get hit. I'm focusing on inputs along these lines because when looking at which positions are calculated most poorly in the benchmarks, it tends to be boards where there is a significant chance of being hit and landing behind a prime.
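
To make that concrete, here is a rough sketch of one way such an input could be computed. This is my reading of the description above rather than the player's actual code: the escape-count function is assumed to come from the Berliner primes calculation, and taking the best available entry point when a roll offers two is my choice.

    def expected_escapes_from_bar(opponent_home, escape_count):
        # opponent_home[i] = number of opponent checkers on their i-point (i = 1..6);
        # escape_count(i) = Berliner-style escape count after entering on point i.
        open_points = {i for i in range(1, 7) if opponent_home[i] < 2}
        total = 0.0
        for d1 in range(1, 7):
            for d2 in range(1, 7):
                entries = {d for d in (d1, d2) if d in open_points}
                if entries:  # rolls that dance contribute zero
                    total += max(escape_count(d) for d in entries)
        return total / 36.0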

This one had some success: the improvement is still incremental, but it resulted in my best player to date. The player that uses the new input is Player 3.5. It is identical to Player 3.4 except for two new inputs, the expected-escapes input described above, one for each player.

Its GNUbg benchmark scores are Contact 13.0, Crashed 11.5, and Race 0.766. Player 3.4's scores are 13.3, 11.7, and 0.766, so noticeably better but still nothing dramatic (though notably some improvement in Contact, the most important benchmark). It seems that to get a significantly stronger player I'll have to add a bunch of inputs, each of which offers reasonably incremental benefits.

In cubeless money play against Player 3.4, it scores an average of +0.0033ppg +/- 0.0021ppg in 400k games. Against PubEval it scores an average of +0.592ppg +/- 0.005ppg in 100k games and wins 69.5% of the games.

Still not nearly as good as GNUbg 0-ply! But creeping closer.

To be honest I'm not really sure whether the improved performance came because of the new input or because I slightly changed the training algorithm. In this case I started with random weights for the new inputs and ran supervised learning against the GNUbg training databases (contact & crashed). And instead of bouncing back and forth between a large alpha (1) and smaller alphas, I just used a small and constant alpha of 0.03. The resulting benchmark score slowly improved over 1,100 iterations, which took several days to run.

Friday, April 27, 2012

New inputs failure: bar hit/entry probability

I've been spending a little time looking at cases where my Player 3.4 does poorly in the GNUbg contact benchmarks database, to get some feel for what new inputs I might try.

It looks like it's leaving blots too often when the opponent has a good prime blocking the way out of the home board.

So I tried two new inputs: the odds of entering the opponent's home board if there were a checker on the bar; and the odds of hitting an opponent blot in his home board if there were a checker on the bar.

I tried two training approaches: first, adding random weights for just those four new weights (the two inputs times two players) and doing supervised learning on the GNUbg training databases; and also starting from scratch, random weights everywhere, and doing TD training through self-play and then SL on the GNUbg training databases.

The conclusion: neither worked. In both cases the new player was about the same as or a little worse than Player 3.4. So these aren't the right new inputs to add.

Back to the drawing board.

Wednesday, April 25, 2012

Jump model final version posted

I've posted a new version of my jump model for cube decision points:

http://arxiv.org/abs/1203.5692

This version is quite similar to the last one, with just a few elaborations added after another set of very productive discussions with Rick Janowski. He's been a huge help in pulling this together.

I doubt this paper will change again, though I'll probably revisit the model in the future with another paper. Probably to focus on how to estimate the local jump volatility accurately.

Tuesday, April 17, 2012

PubEval trained using ListNet

I spent some time looking at PubEval again - not my implementation, which is fine now, but rather how Tesauro came up with it in the first place. One discussion suggests that he trained it using "comparison training", a machine learning approach he seems to have come up with - some kind of supervised learning on a manually-prepared set of benchmarks. Each benchmark (I'm assuming) was a list of moves given a starting point and a dice roll, where the moves were ordered by goodness.

I tried to reproduce this. I couldn't find any proper references to "comparison training", but there's a lot of relatively recent literature on machine learning algorithms for generating rankings, which is the same sort of thing.

We can do a lot better than Tesauro's hand crafted training set: we have the GNUbg benchmark databases that are much larger and more accurate.

So what we want is an algorithm that we can feed a training set, where each element of the set has the five boards listed for each move and the rolled-out equities for each. The inputs that define the board are the PubEval inputs, and the evaluation function should be a linear function of the inputs (like PubEval is).

Wikipedia has a nice summary of different machine learning approaches to ranking estimators.

The ListNet algorithm seems like a good choice. I implemented it and trained it on the GNUbg contact and race benchmark databases.
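
For reference, here is a minimal sketch of ListNet in its top-one form, for a linear evaluator like PubEval: the rolled-out equities and the model scores are each turned into a probability distribution over the candidate moves via softmax, and the weights descend the cross-entropy between the two. The data layout and the beta scaling are my assumptions.

    import numpy as np

    def softmax(s):
        e = np.exp(s - s.max())
        return e / e.sum()

    def listnet_grad(X, equities, w, beta=1.0):
        # X: (n_moves, n_inputs) array of PubEval-style inputs, one row per candidate board.
        # equities: (n_moves,) rolled-out equities for the same boards.
        target = softmax(beta * np.asarray(equities))  # "true" top-one distribution
        model = softmax(X @ w)                         # model's top-one distribution
        return X.T @ (model - target)                  # gradient of the cross-entropy

    def train(training_set, n_inputs, alpha=0.1, n_epochs=20):
        # training_set: list of (X, equities) pairs, one per benchmark position.
        w = np.zeros(n_inputs)
        for _ in range(n_epochs):
            for X, equities in training_set:
                w -= alpha * listnet_grad(X, equities, w)
        return w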

It fairly quickly converged to a better solution than Tesauro's original PubEval. That is, the weights I found can be plugged into the usual PubEval setup, but give a slightly stronger player. Not much better, but noticeably stronger. Not surprising given the more comprehensive training set.

The weights themselves, and the output values, were quite different to PubEval. The ListNet algorithm effectively trained the regression to approximate equity, so in this approach the board evaluations correspond to equity (though optimized for equity differences on similar boards rather than outright equity).

The GNUbg benchmark scores were: Contact 41.5, Crashed 47.5, and Race 1.90. This compares to PubEval's scores of 44.1, 49.7, and 3.54.

The weights are available on my Dropbox account: race and contact.

In 100k cubeless games (with variance reduction) against PubEval it scores +0.043ppg +/- 0.004ppg. Again, noticeably better.

Of course this is a terrible player compared to neural network players, but it's neat to be able to reproduce something like what Tesauro did with PubEval. And it was a good excuse to play around with the newer machine learning algorithms focused on ranking.

This might also be an interesting training approach for a neural network. The network would be optimized for checker play, so it would be less efficient at the absolute equity estimation required for doubling decisions. But perhaps one could have two sets of networks: one optimized for checker play, the other for doubling decisions.

Tuesday, April 10, 2012

Average number of games in a match

Some more interesting statistics on match play: the average number of games per match.

I ran self-play for two players using Player 3.4 for checker play (cubeful equity optimization) and Janowski match model with cube life index 0.70 and looked at how many games it took on average to finish a match.

The most interesting result is that the average number of games divided by the match length seems to converge reasonably well to a value around 0.65.

Here is a chart of the results: blue line/left axis shows average number of games, red line/right axis shows the average number of games divided by the match length:


The only other source I could find for similar statistics gave similar results (out to match length 11).

I ran the experiment with x=0.5 and x=0.9 as well to see how that affected the converged average number of games per match length. x=0.5 gave a converged ratio of 0.51; x=0.9 gave 0.90. This compares to 0.65 for x=0.7.

So the average number of games per match divided by the match length converges (for long matches) to something surprisingly close to the cube life index itself! I'm sure this is just a coincidence, but an interesting one.

Optimal Janowski cube life index in match play

In my last post I looked at extending Janowski's cubeful equity model to match play.

The conclusion: match play also favors a cube life index very close to 0.70.

I played a Janowski match strategy with different cube life indexes against a Janowski match strategy with cube life index 0.70 as a benchmark. I ran 40k matches with variance reduction and recorded the average points per match.

The results:

Match length | x=0.5 | x=0.6 | x=0.8 | x=0.9
3 | -0.004 | 0.000 | -0.004 | -0.007
5 | -0.007 | +0.007 | +0.002 | -0.024
7 | -0.021 | -0.003 | -0.008 | -0.029
9 | -0.028 | -0.006 | +0.003 | -0.037

If I fit parabolas through the results and force them to pass through zero ppm at x=0.7, I find optimal cube life indexes of x=0.61 for match length 3, x=0.64 for match length 5, x=0.68 for match length 7, and x=0.69 for match length 9.

All average points per match have a standard error of +/- 0.004ppm, so the statistics are marginal for the shorter match lengths.

There is some evidence for a smaller cube life index for shorter matches, but not much. In general the optimal match cube life index looks very close to the optimal money cube life index.
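
For the record, the constrained fit is straightforward: force the parabola through zero at x=0.7 by fitting y = a(x-0.7)^2 + b(x-0.7) in a least-squares sense, and read the optimal index off the vertex. A sketch, using the match-length-9 row of the table above:

    import numpy as np

    def optimal_index(xs, ys, x0=0.7):
        # Fit y = a*(x - x0)^2 + b*(x - x0), which passes through (x0, 0) by construction.
        u = np.asarray(xs) - x0
        A = np.column_stack([u ** 2, u])
        (a, b), *rest = np.linalg.lstsq(A, np.asarray(ys), rcond=None)
        return x0 - b / (2 * a)  # vertex of the fitted parabola

    print(optimal_index([0.5, 0.6, 0.8, 0.9], [-0.028, -0.006, 0.003, -0.037]))
    # -> about 0.69, in line with the match-length-9 number quoted above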

UPDATE: I ran longer simulations for more values of the cube life index for match lengths 3, 5, and 7 to try to get more accurate statistics. From those data I get optimal cube life indexes of 0.70, 0.67, and 0.69 for match lengths 3, 5, and 7 respectively. So no evidence of a smaller optimal cube life index for shorter matches: everything should use 0.70.

That said, the performance cost of using a suboptimal cube life index in short matches is pretty small. It becomes a bigger deal for longer matches.

Monday, April 9, 2012

Janowski model extended to match play

When I was looking at match equity tables I wondered whether you could extend Janowski's model (using a "cube life index" to interpolate between dead and live cube equities as a proxy for the jumpiness of game state) to match play. I'm pretty sure this is what GNUbg does based on their docs.

Turns out it's pretty straightforward if you assume the same match equity table as you calculate with Tom Keith's algorithm, which is a live cube model - it assumes game-winning probability changes diffusively, and that W and L (the conditional points won on win or lost on loss) are independent of the game-winning probability. That's the same as Janowski's live cube limit, except W and L are calculated from entries in the match equity table instead of the usual money scores for wins, gammons, and backgammons.

The Keith algorithm mentioned before gives you the cubeful match equity in the live cube limit. The dead cube limit has cubeful equity that's linear in P, running from -L at P=0 to +W at P=1. The model cubeful equity is just the weighted sum of the live and dead cube cubeful equities.

I implemented this to see how it compares to using a Janowski money model for doubling decisions in tournament play. That's of course inefficient - it doesn't account for match equity - but it's an interesting benchmark to show how much a match-aware doubling strategy matters.

Checker play was Player 3.4 optimizing on cubeful equity, and I ran 40k matches (with variance reduction) for a range of match lengths and cube life indexes. For a given cube life index, both the match and money doubling strategies share the same cube life index, which seems a fair comparison. Remember, for money play, a cube life index of x=0.70 was optimal.

The entries in the table are average points per match in the 40k matches. All values have a standard error of +/- 0.005ppm.

Match length | x=0 | x=0.25 | x=0.50 | x=0.75 | x=1
3 | +0.056 | +0.047 | +0.051 | +0.066 | +0.054
5 | +0.073 | +0.080 | +0.091 | +0.105 | +0.067
7 | +0.115 | +0.138 | +0.137 | +0.142 | +0.061
9 | +0.115 | +0.148 | +0.150 | +0.154 | +0.081

The main conclusion here is that using a proper match-aware doubling strategy makes a huge difference in outcome. The impact is larger for longer matches because any per-game edge gets magnified over the match. In principle, for very long matches the match strategy should converge to the money strategy, but for matches up to length 9, anyway, that is not apparent in the statistics.

This assumes the same match equity table as before, which is based on the live cube model. Really I should recalculate the match equity table assuming the same Janowski model for cube decisions, which I think will change it a bit. But I'll leave that for another day.

Or I could extend my jump model to match play - it should be a relatively simple extension, since like with Janowski it's just about changing W and L to be based off match equities.

Saturday, April 7, 2012

Cubeful vs cubeless equities in checker play decisions

So far all my players have made checker play decisions by choosing the position with the best cubeless equity, even after I introduced doubling strategies to decide when to offer & take doubles. This is clearly suboptimal: the doubling strategies can calculate cubeful equity, so the checker play should choose moves based on the position with the best cubeful equity, not best cubeless equity.

I'm referring to money games; for match play they should optimize match equity.

I extended my backgammon framework to allow the players to optimize on cubeful equity in checker play to see how much that would improve things.

The answer, it seems: at least for my intermediate-strength players, not much!

I played 100k cubeful money games (with variance reduction) between two players (both Player 3.4) that both use Janowski x=0.7 for their doubling strategy. The first used cubeful equities for checker play and the second used cubeless equity.

The first player scored -0.0004ppg +/- 0.0044ppg. Hardly a significant advantage to using cubeful equity when choosing the best move - the score is indistinguishable from zero.

I also thought I'd run the cubeful-equity player through the GNUbg benchmarks. The benchmarks are for cubeless play, so that should give something worse than for the cubeless-equity player. But the benchmarks are all about choosing the best move out of the possible moves, so all that matters there is relative equity difference, not absolute equity, and perhaps that's not affected much by cubeful vs cubeless.

The cubeless benchmark scores are Contact 13.32, Crashed 11.69, and Race 0.766.

Using a cubeful-equity player, always assuming a centered cube, the scores were 13.44, 12.16, and 0.766. So the Contact and Crashed scores got a little bit worse, but hardly changed. Using cubeful equity in checker play decisions does not seem to make much difference in most cases.

Assuming player-owned cube the scores were 13.41, 12.03, and 0.768. Assuming opponent-owned cube the scores were 13.42, 11.82, and 0.764. So cube position does not matter much either.

The conclusion: it doesn't matter much if you use cubeful or cubeless equity in checker play decisions.

Nonetheless the cubeful equity performance is marginally better, and there are probably edge case conditions where it matters more, so I'll start using cubeful equity in money play.

Friday, April 6, 2012

Player 3.4: incrementally better

I let the supervised learning algorithm train for a while longer, starting with the Player 3.3 networks (i.e. the same networks and inputs).

Its benchmark scores were Contact 13.3, Crashed 11.7, and Race 0.766, so my best player yet, but only incrementally better than Player 3.3's scores of 13.3, 12.0, and 0.93.

On the most important benchmark, Contact, it was unchanged. And in self-play against Player 3.3 (100k games with variance reduction) its score was zero within a standard error of 0.0009ppg. Not exactly a startling improvement!

Nonetheless it is measurably better in the other GNUbg benchmarks so I'll start using it as my best player.

Playing 100k cubeless money games against PubEval it scored +0.586ppg +/- 0.001ppg, winning 69.22% of the games; Player 3.3 in the same games scores +0.585ppg +/- 0.001ppg and wins 69.20% of the games. So again very little difference, but still incrementally my best player.


Self-play variance reduction

I've done a lot of bot self-play to gather statistics on relative strength and so on.

One obvious way to reduce variance in the results is to run half the games with player 1 vs player 2, then the other half with player 2 vs player 1, using the same random seeds as in the first half. That is, in both halves the dice are the same, but the players alternate taking them. That significantly reduces the impact of dice luck. In the limit of both players having exactly the same strategy this returns exactly 0 points per game and 50% wins, so perfect variance reduction.

Note that only half as many independent trials are used; that tends to make the standard error worse. But the variance reduction should more than compensate.
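
Here is a toy sketch of the pairing scheme. The real game is replaced by a stand-in whose result is shared dice luck plus a skill edge plus residual noise, so the cancellation is easy to see; in real self-play the cancellation is only approximate because the two games' positions diverge.

    import random, statistics

    def play_game(first_player_edge, seed):
        # Stand-in for one real game: dice luck driven entirely by the seed, plus
        # the first player's skill edge, plus noise from the positions reached.
        rng = random.Random(seed)
        luck = rng.gauss(0.0, 1.0)
        return luck + first_player_edge + random.gauss(0.0, 0.3)

    def mirrored_selfplay(edge, n_pairs=10000):
        samples = []
        for _ in range(n_pairs):
            seed = random.getrandbits(64)
            s1 = play_game(edge, seed)        # player 1 in the first seat
            s2 = -play_game(-edge, seed)      # same dice, seats swapped
            samples.append(0.5 * (s1 + s2))   # the luck term cancels, the edge survives
        se = statistics.stdev(samples) / n_pairs ** 0.5
        return statistics.fmean(samples), se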

I haven't done this before, which is a little embarrassing, honestly.

But, now that I've thought of it, I figured I'd do a little statistical analysis to investigate how effective the variance reduction actually is.

The first example: playing 10k self-play money games where both players use Player 3.3 for checker play, the first player uses a Janowski doubling strategy with x=0.7, and the second player uses Janowski with x=0.6. In my previous experiments where I ran 1M money games with no variance reduction I got an average score of +0.018ppg with a standard error of 0.003ppg.

No variance reduction: +0.017ppg +/- 0.032ppg
Variance reduction: +0.026ppg +/- 0.014ppg

Both are consistent with the more accurate estimate, but the standard error with variance reduction was 2.3x smaller. That corresponds to the standard error when running 2.3^2=5.3x as many simulated games with no variance reduction, so saves over a factor of 5 in run time.

The second example: playing 10k self-play cubeless money games, Player 3.3 against PubEval.

No variance reduction: +0.610ppg +/- 0.014ppg
Variance reduction: +0.583ppg +/- 0.012ppg

In this case the variance reduction doesn't help much. That's likely because the two strategies are sufficiently different that they make different moves, and the same dice throws applied to different positions imply different luck.

So when two players are comparable the variance reduction is quite powerful. When they imply significantly different checker play the technique is less powerful, but still worth doing.

Thursday, April 5, 2012

Jump model revisited

After conversations with Rick Janowski I realized that I'd approached the jump model incorrectly. I've since revamped the whole thing and reposted the corrected paper on arxiv.org - the same link as before, overwriting the old (incorrect) version.

I feel pretty confident about this second cut. The implied Janowski cube life index is right around the optimal value I found, and jump volatility found by optimizing the model parameter in bot self-play ties out well with statistical estimates of jump volatility.

I put the python code I used to generate cubeful equities onto github, released under the GPL. It contains functions for calculating cubeful equity in the three cube states (player-owned cube, unavailable cube, and centered cube), with three methods for calculating the equity: numerically solving the exact model; a linear approximation; and a nonlinear approximation that improves on the linear one. All three are developed in the paper.

Cube Handling In Backgammon Money Games Under a Jump Model


A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Each jump is drawn from a distribution with zero mean and an expected absolute jump size called the “jump volatility” that can be a function of game state but is assumed to be small compared to the market window.

Closed form approximations for cubeful equities and cube decision points are developed as a function of local and average jump volatility. The local jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

Friday, March 30, 2012

Optimal Janowski cube life index revisited

After fixing my implementation of Janowski's doubling strategy I tried again to find the optimal cube life index (assuming we just use a single constant index).

As before, I found the optimal cube life index is 0.7.

To test this I ran doubling strategies with different cube life indexes against one with a fixed index of 0.7 to see if any other index would outperform. For indexes near 0.7 I ran 1M self-play games, using Player 3.3 for checker play. For indexes a bit further out I used 100k self-play games.

Here are the results:


Two interesting points to note. First is that the probability of win continues to increase to x=0.95 even though the average score decreases, due to the asymmetric market windows of the two players. Second is that there is a fairly dramatic falloff in performance (both in score and probability of win) for x>0.95.

Wednesday, March 28, 2012

Janowski model revisited

I was a bit incorrect in how I interpreted Janowski's paper.

The real way to think about his model is as a linear interpolation between the dead cube equity and the live cube equity.

The dead cube equity is always P*(W+L)-L, normalized to the current cube value. The live cube equity is piecewise linear, and has different forms for three cases:

1) Cube centered. 0<P<TP, equity runs from -L to -1; TP<P<CP, equity runs from -1 to 1; CP<P<1, equity runs from 1 to +W.
2) Cube owned by player. 0<P<CP, equity runs from -L to +1; CP<P<1, equity runs from +1 to +W.
3) Cube owned by opponent. 0<P<TP, equity runs from -L to -1; TP<P<1, equity runs from -1 to +W.

TP and CP here are the live cube take and cash points, so

TP = (L-1/2)/(W+L+1/2)
CP = (L+1)/(W+L+1/2)

Then the Janowski model equity is x * live cube equity + (1-x) * dead cube equity.

That's it. Everything else falls out of that, and in practice you don't really need to calculate initial double, redouble, too good points, or even take and cash points - you just compare cubeful equities.

For example, the take decision calculates the equity when the player owns the doubled cube and compares that to -(cube value). If the equity on take is greater than the equity on pass, you take.

Similarly, the double decision calculates the equity on double (which is the minimum of the equity when the opponent owns the doubled cube and +(cube value) for the pass) and compares it to the equity at the current cube value when the player owns the cube. If the doubled equity is bigger, you double.

My error before was around too-good points: in my implementation I assumed that the too-good point was calculated by comparing the dead cube equity with pass value, thinking that you're giving up the value of the cube by not doubling. But really you're not; you still have the option of doubling if the game starts to go against you. You should be comparing the model cubeful equity at the current cube against the value you'd get on a double (the minimum of the take and pass equities there).
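
Putting the pieces together, here is a sketch of the whole scheme, including the corrected double decision. All equities are normalized to a cube of 1, W and L are the usual conditional win/loss sizes, and the function names are mine rather than anything standard:

    def live_equity(p, W, L, owner):
        # Piecewise-linear live-cube equity; owner is 'player', 'opponent' or 'centered'.
        TP = (L - 0.5) / (W + L + 0.5)   # live-cube take point
        CP = (L + 1.0) / (W + L + 0.5)   # live-cube cash point
        def seg(p, p0, e0, p1, e1):      # linear interpolation between two anchors
            return e0 + (e1 - e0) * (p - p0) / (p1 - p0)
        if owner == 'player':
            return seg(p, 0.0, -L, CP, 1.0) if p < CP else seg(p, CP, 1.0, 1.0, W)
        if owner == 'opponent':
            return seg(p, 0.0, -L, TP, -1.0) if p < TP else seg(p, TP, -1.0, 1.0, W)
        if p < TP:
            return seg(p, 0.0, -L, TP, -1.0)
        if p < CP:
            return seg(p, TP, -1.0, CP, 1.0)
        return seg(p, CP, 1.0, 1.0, W)

    def model_equity(p, W, L, x, owner):
        dead = p * (W + L) - L           # dead-cube equity, normalized to cube = 1
        return x * live_equity(p, W, L, owner) + (1.0 - x) * dead

    def should_take(p, W, L, x, cube):
        # Take if owning the doubled cube beats passing away `cube` points.
        return 2 * cube * model_equity(p, W, L, x, 'player') >= -cube

    def should_double(p, W, L, x, cube, owner='centered'):
        # Doubled equity: opponent owns the doubled cube, capped by his pass at +cube.
        doubled = min(2 * cube * model_equity(p, W, L, x, 'opponent'), cube)
        return doubled > cube * model_equity(p, W, L, x, owner)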

This invalidates some of the statistical results I calculated for Janowski's model before, so I'll have to redo those.

Also it makes it easy to generalize to two cube life indexes: each player's interpolation of equity uses their appropriate index.

Jump model performance

I implemented a doubling strategy following my jump model and ran it against the optimal Janowski doubling strategy. All checker play was Player 3.3.

This first implementation assumes a constant jump volatility, so does not make any attempt to calculate a local jump volatility in the game state. It's meant to be a benchmark to compare against the Janowski strategy with a single cube life index, which should be pretty similar in performance.

I ran one million cubeless money games of the jump model with different jump volatilities, vs the Janowski strategy with the optimal single cube life index of 0.7. The goal was to see which jump volatility value would give the best performance.

In the paper I made an estimate of a constant jump volatility of 7.7% based on a calculation of the average absolute probability change from turn to turn when the cubeless win probability was near the take and cash points. That was only an approximate calculation, based on crude windows of win probability, but should be roughly what comes out of this more careful analysis.

Here are the results of the games:


The blue line/points represent the self-play results for average points per game for the different jump volatility inputs, running from 5% to 11%. Each has an error bar of +/- 0.004ppg which represents the standard error on the averages. The black line shows a parabolic fit through the points. The red line/points represent the probability of win for the jump model strategy.

The jump volatility with the best ppg performance was 8.0%, quite close to the naive estimate from cubeless play statistics of 7.7%. But a jump volatility anywhere in the range 7.0% to 9.0% performed quite comparably. I fit a parabola to the ppg data and that suggests an optimal jump volatility of 8.5%, which is probably a more realistic number given the statistical uncertainty on individual points.

The jump model with constant jump volatility performed marginally better than the Janowski model with a single cube life index, though the margins were small: a few hundredths of a point per game, in games with an average cube value of around 2.3. The differences are statistically significant (the standard error for all ppg numbers is +/- 0.004ppg) but only because I ran 1M games; really the performance is practically indistinguishable.

That is roughly what one would expect given that the model with a constant jump volatility and constant conditional win and loss amounts W and L can be mapped directly onto Janowski's model with a single cube life index. Of course W and L are not actually constant, so the models are not practically the same, but they should be close.

That said, the probability of win showed a much larger effect than points per game. A jump model strategy with a jump volatility of 5% won 50.2% of its games (while losing 0.006ppg in score); a strategy with jump volatility of 11% won only 44.2% of its games (but won 0.011ppg in score).

High jump volatility means the player chooses to double at lower probability of win and pass at higher probability of win. So the games it loses because of passing a double are offset by games it wins when doubled.



Monday, March 26, 2012

New cube decision model posted on arxiv.org

I posted the next draft of my new model for calculating cube decision points to the preprint site arxiv.org, so now it has a proper stable link:

Cube Handling In Backgammon Money Games Under a Jump Model

Abstract

A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Each jump is drawn from a distribution with zero mean and an expected absolute jump size called the “jump volatility” that can be a function of game state but is assumed to be small compared to the market window.

Simple closed form approximations for the market window and doubling points are developed as a function of local and average jump volatility. The local jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

Checkpoint: recent progress and next steps

Since my last checkpoint around two months ago I've made quite a lot of progress.

Main highlights:
  • Networks: I added a crashed network following the GNUbg definition of "crashed".
  • Network inputs: I added a new input to the contact and crashed networks that measures strength of prime blocking. I also extended the race inputs to more granularly track the checkers born in.
  • Training: I grabbed the GNUbg training databases for contact, crashed, and race networks, and ran supervised learning against them. The best performance is first trained using TD learning, then uses those network weights as the starting point for supervised learning.
  • Benchmarking: I moved from bot self-play performance to benchmarking against the GNUbg benchmark databases for cubeless play, since that gives a more accurate and robust set of benchmarks for different game phases. I ran some stats to look at the relationship between self-play results and GNU benchmark scores.
  • Cube handling: this was a new area for me since my personal game is weak on cube play. I learned about Janowski's classic model for money games, as well as how to construct match equity tables for tournament play.
  • New model for money game cube action: I developed a new model for cube decisions in money games, moving from the "live cube" limit where probability of win is assumed to diffuse, to a jump model where the probability of win jumps each turn.
Along the way I developed several new players incorporating the new ideas. As of now my best checker player is Player 3.3, which scores GNUbg benchmark scores of Contact 13.3, Crashed 12.0, and Race 0.93 (compared to GNUbg 0-ply scores 10.5, 11.0, and 1.01).

Stuff still to do:
  • Money play: implement my new model properly and test out its performance. Hopefully it should perform better than the standard approaches.
  • Match play: implement matches properly in my backgammon framework and see if I can extend my new model approach to match play as well (coming up with better match equity tables along the way).
  • Checker play: my players' Contact benchmark scores are still pretty weak compared to GNUbg. I suspect I need to add more custom inputs. I also want to investigate some more advanced supervised learning techniques applied to training against the GNUbg training databases.
  • Rollouts: rollouts are still pretty basic in my framework; I need to properly implement variance reduction and cutting off rollouts before the end of the game.
  • FIBS hookup: I still want to hook up my bot to FIBS and see how it performs there in real-world games.
  • Networks: I've been talking with Øystein Johansen about maybe breaking up the "contact" specification into separate nets for different game phases, perhaps using k-means clustering to define the phases.
Fun stuff!

Saturday, March 24, 2012

New model paper v2

Here's the second draft of the paper, revamped considerably:

The main changes:
  • Added local and average jump volatility. Take and cash points depend only on the average volatility since on a pass/take decision you're comparing post-double equity with -1. The post-double equity depends on the jump volatility at the cash point; but should the game turn and head back up there it'll probably be in quite a different state than it's in near the take point. So using the average jump volatility seems more reasonable. Local jump volatility impacts the cubeful equity at the current cube level, which in turn impacts doubling decisions.
  • I note that if I assume a constant (single) jump volatility, it maps very nicely onto Janowski's model with one x. But the local + average jump volatility version does not map as nicely onto his model extension to a cube life index for each player, x1 and x2.
  • I changed the definition of jump volatility to be the expected absolute jump size instead of jump standard deviation. That's what the equity calcs care about. In fact, I only need to assume the jump distribution is symmetric for positive and negative jumps; if that is true then I don't need to know anything more about the jump distribution than its expected absolute jump size. Pretty nice! I got caught up for a while playing around with different distributional assumptions before I cottoned onto that.
  • I rewrote Appendix 3, the one that works through the details of the jump model, to try to make it more clear when you should use local vs average volatility. That I suspect is the subtlest part of this. I think I understand this properly now, but I went through a couple of iterations myself before I got here, so it could be that I'm still getting something wrong.
This is on the back of a very rewarding conversation with Rick Janowski, who pointed out how I needed to rethink this in terms of something more like his model where he has a different cube life index for both players.


Wednesday, March 21, 2012

Optimal Janowski cube life index

I tried to determine the optimal constant Janowski cube life index by playing players with different cube life indexes against each other.

I started broadly by playing different cube life indexes against a fixed 0.7, since 0.7 is near where other people have quoted the cube life index.

Statistic | x=0 | x=0.25 | x=0.50 | x=0.75 | x=1
Average points per game | -0.30 | -0.14 | -0.04 | 0.00 | -0.09
Probability of win (%) | 37.5 | 41.8 | 45.9 | 51.2 | 58.3
Average cube | 2.32 | 2.40 | 2.27 | 1.96 | 1.45

Interesting that the probability of win was well above 50% but on average lost points for x=1. It must be because when it wins it wins small, but when it loses it loses big. Presumably that's because the player gets doubled and takes when the game goes against it, but only doubles past the cash point when the game goes in the player's favor.

Next I tried to zero in on the winning range by running the same thing in a tighter range:

Statistic | x=0.55 | x=0.60 | x=0.65 | x=0.70 | x=0.75
Average points per game | -0.03 | -0.01 | 0.00 | 0.00 | 0.00
Probability of win (%) | 46.9 | 47.8 | 48.9 | 50.0 | 51.2
Average cube | 2.23 | 2.17 | 2.11 | 2.04 | 1.96

So it seems like 0.7 is indeed the optimal cube life index in terms of expected points per game, at least within +/- 0.05.

Janowski-model money game cube statistics

I've got a working doubling strategy in my backgammon framework now for money games, applying Janowski's model.

Now I can calculate some statistics on the values of the cube; kind of like an analysis on Jellyfish doubling that was done in 1999.

I'm interested in how doubling statistics change as Janowski's cube life index (sometimes called cube efficiency) varies. I ran 100k cubeful money games, using Player 3.3 for checker play, and Janowski's doubling model for different cube life index values.

Statistic | x=0 | x=0.25 | x=0.50 | x=0.75 | x=1
Percent cashed | 21.8 | 27.4 | 40.0 | 50.9 | 96.3
Percent single | 55.4 | 51.2 | 45.9 | 34.8 | 0.4
Percent gammon | 21.8 | 20.3 | 18.1 | 13.6 | 3.1
Percent backgammon | 1.2 | 1.1 | 1.0 | 0.7 | 0.2
Average cube | 14.7 | 4.49 | 2.72 | 1.89 | 1
Percent cube=1 | 0.6 | 3.2 | 12.3 | 33.9 | 100
Percent cube=2 | 24.2 | 47.8 | 61.0 | 56.7 | 0
Percent cube=4 | 24.4 | 31.3 | 21.2 | 8.5 | 0
Percent cube=8 | 18.6 | 12.2 | 4.5 | 0.8 | 0
Percent cube=16 | 12.8 | 4.1 | 0.8 | 0 | 0
Percent cube=32 | 8.2 | 1.1 | 0.1 | 0 | 0
Percent cube=64 | 11.1 | 0.4 | 0 | 0 | 0

In the case of x=1 (the live cube limit) the initial double point is right at the cash point, so the player never offers a double when the opponent will take. Most games then end in cashes.



Tuesday, March 20, 2012

New model for money game cubeful equity

I came up with a new model for estimating cubeful equity in backgammon money games that seems like an interesting way to think about the Janowski cube life index in specific game states:

Abstract

A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Jumps occur when a Poisson process fires, and each jump is drawn from a normal distribution with zero mean and a standard deviation called the “jump volatility” that can be a function of win probability but is assumed to be small compared to the market window.

Simple closed form approximations for the market window and doubling points are developed as a function of local jump volatility. The jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

All cube decision points under this model match Janowski’s if his cube life index is implied from this model’s take point, so these results can be viewed as a framework for estimating the Janowski cube life index more accurately for specific game states.

Thursday, March 15, 2012

Match equity table with MWC

Most people show match-winning chance (MWC) in their match equity table instead of equity. Of course the standard conversion equity = 2 MWC - 1 holds; for example, the 68.1% MWC at 1-away/2-away below corresponds to an equity of 2(0.681) - 1 = 0.362, matching the corresponding entry in my match equity table. Here are my calculated match equity tables showing MWC:


MWC (%) | 1-away | 2-away | 3-away | 4-away | 5-away | 6-away | 7-away | 8-away | 9-away | 10-away | 11-away | 12-away
1-away | 50 | 68.1 | 75.5 | 81.7 | 84.5 | 89.0 | 91.0 | 93.5 | 94.6 | 96.1 | 96.8 | 97.7
2-away |  | 50 | 59.5 | 66.1 | 73.6 | 79.2 | 83.4 | 86.7 | 89.4 | 91.7 | 93.4 | 94.8
3-away |  |  | 50 | 57.0 | 64.5 | 70.8 | 75.6 | 79.8 | 83.3 | 86.4 | 88.9 | 90.9
4-away |  |  |  | 50 | 57.4 | 63.9 | 69.2 | 73.7 | 77.9 | 81.5 | 84.5 | 87.0
5-away |  |  |  |  | 50 | 56.6 | 62.2 | 67.2 | 71.8 | 75.9 | 79.4 | 82.5
6-away |  |  |  |  |  | 50 | 55.9 | 61.1 | 66.0 | 70.4 | 74.4 | 77.8
7-away |  |  |  |  |  |  | 50 | 55.2 | 60.4 | 65.0 | 69.2 | 73.0
8-away |  |  |  |  |  |  |  | 50 | 55.2 | 59.9 | 64.3 | 68.3
9-away |  |  |  |  |  |  |  |  | 50 | 54.8 | 59.4 | 63.6
10-away |  |  |  |  |  |  |  |  |  | 50 | 54.6 | 58.9
11-away |  |  |  |  |  |  |  |  |  |  | 50 | 54.3
12-away |  |  |  |  |  |  |  |  |  |  |  | 50

I left out the bottom half of the table for readability, but all values there are just 100 less the corresponding value in the top half.


And here are the corresponding post-Crawford MWCs (for player n-away, opponent 1-away):


MWC (%) | 1-away | 2-away | 3-away | 4-away | 5-away | 6-away | 7-away | 8-away | 9-away | 10-away | 11-away | 12-away
1-away | 50 | 48.6 | 31.9 | 30.6 | 18.7 | 17.7 | 11.3 | 10.6 | 6.7 | 6.3 | 4.0 | 3.7

These compare well to the "Mec26" MWC table here, mostly within 0.2%. Those are based on GNUbg rollouts for post-Crawford MWCs and using a gammon probability of 26.0% (vs 27.6% in mine). The biggest differences are in 1-away,2-away and 1-away,3-away where the MWC is different by about 0.5%.

Another modern MET is the G11 table. Again, not much difference between my calculated MET and this one, though bigger differences (especially near the diagonal) than with Mec26.



Wednesday, March 14, 2012

Full match equity tables

The next step after getting the Crawford-game match equities is to build out the full match equity tables for pre-Crawford games.

I followed the algorithm defined here, by Tom Keith, but there were a few assumptions I didn't fully understand. More on that later. Here is my calculated pre-Crawford match equity table, out to 12-away:



Equity | 1-away | 2-away | 3-away | 4-away | 5-away | 6-away | 7-away | 8-away | 9-away | 10-away | 11-away | 12-away
1-away | 0 | 0.362 | 0.510 | 0.635 | 0.690 | 0.780 | 0.820 | 0.869 | 0.892 | 0.922 | 0.936 | 0.953
2-away | -0.362 | 0 | 0.189 | 0.322 | 0.472 | 0.585 | 0.667 | 0.733 | 0.788 | 0.834 | 0.867 | 0.896
3-away | -0.510 | -0.189 | 0 | 0.139 | 0.289 | 0.415 | 0.513 | 0.595 | 0.667 | 0.729 | 0.777 | 0.819
4-away | -0.635 | -0.322 | -0.139 | 0 | 0.149 | 0.279 | 0.385 | 0.474 | 0.558 | 0.630 | 0.690 | 0.741
5-away | -0.690 | -0.472 | -0.289 | -0.149 | 0 | 0.131 | 0.245 | 0.343 | 0.436 | 0.518 | 0.588 | 0.649
6-away | -0.780 | -0.585 | -0.415 | -0.279 | -0.131 | 0 | 0.118 | 0.221 | 0.320 | 0.409 | 0.488 | 0.557
7-away | -0.820 | -0.667 | -0.513 | -0.385 | -0.245 | -0.118 | 0 | 0.105 | 0.208 | 0.300 | 0.385 | 0.460
8-away | -0.869 | -0.733 | -0.595 | -0.474 | -0.343 | -0.221 | -0.105 | 0 | 0.103 | 0.198 | 0.287 | 0.367
9-away | -0.892 | -0.788 | -0.667 | -0.558 | -0.436 | -0.320 | -0.208 | -0.103 | 0 | 0.096 | 0.188 | 0.271
10-away | -0.922 | -0.834 | -0.729 | -0.630 | -0.518 | -0.409 | -0.300 | -0.198 | -0.096 | 0 | 0.092 | 0.178
11-away | -0.936 | -0.867 | -0.777 | -0.690 | -0.588 | -0.488 | -0.385 | -0.287 | -0.188 | -0.092 | 0 | 0.087
12-away | -0.953 | -0.896 | -0.819 | -0.741 | -0.649 | -0.557 | -0.460 | -0.367 | -0.271 | -0.178 | -0.087 | 0


And here are the corresponding post-Crawford match equities, recalculated using 2-ply Player 3.3 to get state probabilities in the first two rolls.


Equity | 1-away | 2-away | 3-away | 4-away | 5-away | 6-away | 7-away | 8-away | 9-away | 10-away | 11-away | 12-away
1-away | 0 | 0.028 | 0.362 | 0.388 | 0.626 | 0.645 | 0.774 | 0.788 | 0.865 | 0.874 | 0.919 | 0.925

I calculated the average probability of win, gammon win, and backgammon win, and corrected all player-calculated probabilities so that the average probability of win was 50% and the average probability of gammon win/loss and backgammon win/loss were symmetric.

The probability of the game ending in gammon using these adjusted state game probabilities is 27.60%. The probability of the game ending in backgammon is 1.1%.

The approach:

  • Define M(n,m) as the match equity for player n-away and opponent m-away, for symmetric players before the first dice are thrown in a game (so both players have 50% chance of a win and the cube is at 1). By convention I'll assume n<m but M(m,n)=-M(n,m) so this isn't an issue.
  • Define Mp(n,m,c,X,Pw) as the match equity for player n-away, opponent m-away, cube value c, owned by player X (either P for player, O for opponent, or C for centered when c=1), and probability of any player win Pw. We'll assume this is piecewise linear in Pw in the range [take point,cash point] for centered cube, [take point,1] for opponent-owned cube, and [0,cash point] for player-owned cube.
  • The match equity table value M(n,m) = Mp(n,m,1,C,0.5), so if we can get Mp functions we're done.
  • We start with the cube at the maximum value, so that c is the smallest cube value greater than m (remember we assume m>n by convention). In this state the player owns the cube, since only the opponent can double to this level. No more doubling can happen, and any win or loss is a match win or loss, so Mp(n,m,cmax,P,Pw) = 2 Pw - 1.
  • Then we step back to the cube 2x smaller than the maximum. For the case where the player owns the cube we calculate the cash point, where the opponent's equity in the maximum-cube state equals their equity on a pass (which uses the match equity table entries for the corresponding away values, which we bootstrap). We assume that the player does not double until the cash point, so Mp(n,m,c,P,Pw) is linear in Pw from 0 to the cash point. We know the cash point value from the opponent's take/pass decision, and we know the match equity for a player loss at this cube value (which again uses the appropriate match equity table entries for smaller opponent away values).
  • For the case when the opponent owns the cube, we assume the opponent doubles at the player's take point, and that the opponent's equity is piecewise linear in Pw from the take point to Pw=1 (where we know the opponent's winning value from the match equity table lookup).
  • We continue to step back to cube=1, where we assume that match equity is linear between the player's take point and cash point. This gives us Mp(n,m,1,C,Pw), which we use to calculate M(n,m) = Mp(n,m,1,C,0.5).
This follows Tom Keith's algorithm and returns the match equity table displayed above, using the probability of gammon given above.

The main assumption here is that the player and opponent double only when they are at the cash point or take point. This seems a bit unusual - really you'd double earlier than the cash point, and the opponent would double you earlier than your take point.

I tried an alternative approach, similar to the "live cube" model in Janowski: the probability of win diffuses like a Brownian motion with constant volatility sigma. Then I can treat the doubling problem as a derivatives pricing problem, using numerical backward induction to get the fair value. Given a cube value I know the match equity at Pw=0 and Pw=1, and also that the match equity (given a cube value) satisfies the partial differential equation sigma^2/2 d^2(match equity)/dPw^2 + d(match equity)/dt = 0. You solve this PDE separately for different "layers" corresponding to different cube values and cube ownership, and at each time step mix the values across layers based on doubling decisions.

This converged to exactly the values described by the Keith algorithm above. So the Keith algorithm is equivalent to a live cube model where probabilities diffuse continuously rather than jump. This makes me wonder whether a more accurate match equity table could mix between this limit and something like the dead cube limit, as with the money cube algorithm described by Janowski.

Player 3.3: TD then SL

Following the suggestion of Øystein Schønning-Johansen (in comments here), I tried training my player from random weights using TD learning, then, once that had converged, switching to supervised learning on the GNUbg training databases.

The GNUbg team's experience is that this converges to a better player. Indeed, looking at Player 3.2's self-play, it does have some odd features: for example, roughly 7% of its games against itself result in a backgammon, much higher than the network odds predict from the starting board.

The new player that resulted is Player 3.3. Its benchmark scores: Contact ER 13.3, Crashed ER 12.0, and Race ER 0.93. In 40k cubeless money games it scores +0.595ppg against PubEval, +0.174ppg against Benchmark 2, and +0.028ppg against Player 3.2 in 100k cubeless money games. In terms of network setup it is the same as Player 3.2 (contact, crashed, and race networks; 120 hidden nodes; same extended inputs).

The first step was TD learning from random weights. I ran training for over 5M iterations trying to get solid convergence. I switched to use the GNUbg benchmarks as training benchmarks instead of gameplay against a reference player, since this is more accurate and quicker. Here are the learning charts for Contact, Crashed, and Race ER, starting at training run 100k to remove the initial rapid convergence, and using log scale on the x-axis:

[learning charts: Contact ER, Crashed ER, and Race ER vs training run]
The best benchmark scores were Contact ER 17.9, Crashed ER 21.5, and Race ER 1.79. Worse than Player 2.4 on Crashed ER but otherwise my strongest TD-trained player. In 100k cubeless money games against Player 3.2 it scored an average of -0.051ppg +/- 0.004ppg, and against Player 2.4 it scored +0.015ppg +/- 0.004ppg. In self-play the statistics of wins, gammons, and backgammons look plausible: 27% of the games end in gammon and 0.8% in backgammon.

The next step was to continue to train this player using supervised learning on the GNUbg training databases, starting with the TD-trained networks.

For a learning algorithm I used standard backpropagation on a randomly-sorted training database. I started with alpha=1 and kept alpha fixed across epochs as long as the Contact ER benchmark score did not worsen for three consecutive epochs; if it did, I dropped alpha by a factor of sqrt(10). When alpha dropped below 0.1/sqrt(10) I popped it back up to 1 and continued. This follows what Joseph Heled described in his GNUbg training, though with alphas about 10x smaller (I tried using alpha=10 but it did not converge at all). The idea is that if the training finds a local minimum and cannot converge, you use a larger alpha to kick it out of the local minimum and try again. Of course the best-performing networks are always saved so you can return to them. Once the player stopped improving I ran it for several hundred epochs at smaller alpha to help it converge.
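
In pseudo-Python the schedule looks something like the sketch below. The network interface (backprop_update, get_weights/set_weights) and the benchmark hook stand in for my framework's actual calls, and treating "three epochs without improvement" as the trigger is my paraphrase of the rule above:

    import math, random

    def train_sl(net, training_data, contact_er, n_epochs, patience=3):
        alpha, alpha_floor = 1.0, 0.1 / math.sqrt(10)
        best_er, best_weights, stale = float('inf'), net.get_weights(), 0
        for epoch in range(n_epochs):
            random.shuffle(training_data)                  # randomly-sorted database
            for board, target in training_data:
                net.backprop_update(board, target, alpha)  # one backprop step
            er = contact_er(net)                           # benchmark score, lower is better
            if er < best_er:
                best_er, best_weights, stale = er, net.get_weights(), 0
            else:
                stale += 1
            if stale >= patience:                          # stuck: shrink the step size
                alpha, stale = alpha / math.sqrt(10), 0
                if alpha < alpha_floor:
                    alpha = 1.0                            # pop back up to kick it loose
        net.set_weights(best_weights)                      # always keep the best networks
        return net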

After this training the best benchmark scores were Contact ER 13.3, Crashed ER 12.0, and Race ER 0.93. So a significant improvement over the TD-trained player, which is not surprising, but also better than my best SL-trained player, Player 3.2 (which scored 14.0, 12.8, and 2.08).

In addition, the self-play statistics look more realistic. In 10k self-play games, 26.9% were gammons of any kind and 1.4% were backgammons.

It beats Player 3.2 by 0.028ppg, which is significantly more than any of the multivariate regressions would predict.

The player's benchmark performance is still significantly worse than GNUbg's, except in Race ER where it is marginally better. In particular Contact ER, the most important benchmark, is much worse. It looks like there's still work to do on better inputs.

That said, I'm switching focus a bit to the cube handling side, so Player 3.3 will likely be my best cubeless player for a while.

Monday, March 12, 2012

The doubling cube and money games

Money games are not match games - in a money game you simply try to win as many points as possible. The doubling cube can be used in money games, doubling up to a maximum value of 64.

In some ways doubling strategy for matches is simpler than for money games because there is a well-defined end point. Even so, the standard methodology for cubeful money play ends up being much simpler than for match play.

The standard approach was defined in a paper by Rick Janowski, where he proposed an approximation where you linearly interpolate cubeful equity between the "dead" and "live" cube limits.

The dead cube limit is one where you assume that after accepting a double, neither player can double again. The equity in this limit is straightforward.

The live cube limit is one where you assume an infinite number of possible redoubles, but (crucially) gameplay where the probability of win diffuses continuously, instead of jumping from step to step. The equity in this limit is slightly more complex but still has a nice general formulation.

The approximation for real cubeful play is that the cubeful equity is a linear interpolation between these two limits, using a "cube life index" as the interpolation variable that runs from 0 (the dead cube limit) to 1 (the live cube limit).

The cube life index is an input to the model and can be different for each player. Typical values are 0.6-0.7.

GNUbg uses this approximation itself, with a state-dependent cube life index. The backgammon iPhone app Backgammon NJ also uses the Janowski approximation.

Under this approximation, and given a cube life index for each player, there are simple linear forms for equity in the different cube states, which can then be used to determine when to double (double when your equity from doubling and giving the opponent the cube is greater than the equity from not doubling) and when to take vs pass (take if the equity from accepting the doubled cube is greater than the amount you lose from passing).

The inputs to the model are the cube life index, plus the cubeless equities (eg from a neural network).
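
For example, the conditional win/loss sizes W and L that feed the model come straight from those cubeless outputs; a sketch, assuming the usual convention that the gammon probabilities include backgammons:

    def conditional_W_L(p_win, p_gammon_win, p_bg_win, p_gammon_loss, p_bg_loss):
        # Average points won on a win / lost on a loss: singles count 1 point,
        # gammons 2, backgammons 3 (gammon probabilities here include backgammons).
        p_loss = 1.0 - p_win
        W = (p_win + p_gammon_win + p_bg_win) / p_win
        L = (p_loss + p_gammon_loss + p_bg_loss) / p_loss
        return W, L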

An alternative approach in the end game is to use two-sided bearoff databases - three of them. One corresponds to equity for a centered cube, one for equity assuming the player owns the cube, and one for equity assuming the opponent owns the cube. This gives exact cubeful equities, since the Janowski approximation can break down (in the sense that a fixed cube life index is not appropriate) near the end of the game.


Saturday, March 10, 2012

Match equity table: Crawford games

Crawford games happen in matches the first time a player gets within one game of winning the match. In the Crawford game the opponent cannot use the doubling cube.

To compute match equity for these games, we start with the post-Crawford match equities and backward induct.

The Crawford game match equity table (for player 1-away and opponent n-away) I calculated is:

Equity | 1-away | 2-away | 3-away | 4-away | 5-away | 6-away | 7-away | 8-away | 9-away | 10-away | 11-away | 12-away
1-away | 0 | 0.363 | 0.511 | 0.636 | 0.692 | 0.781 | 0.821 | 0.870 | 0.893 | 0.923 | 0.937 | 0.954

Here's how I calculated it:

Call Mc(n) the Crawford game match equity. By symmetry, Mc(1) must be 0. 

The doubling cube cannot be used in the Crawford game, so the more general match equity calculation is fairly straightforward. First, you decide the match equity conditional on different game end states:

Any player win: match equity = +1 (probability 50%)
Single player loss: match equity = Mp(n-1) (probability = 50% - gammon probability)
Gammon player loss: match equity = Mp(n-2) (probability = gammon probability)

where Mp(n) = post-Crawford match equity for player 1-away and opponent n-away.

For notational convenience, Mp(n) = -1 for n<=0. We ignore backgammons for this calculation.

Then the weighted Crawford match equity must be

Mc(n) = 1/2 + ( 1/2 - Pg ) Mp(n-1) + Pg Mp(n-2)

where Pg = probability of a gammon loss in a game as seen from before the first die throw, so symmetric (that is, the same as the probability of a gammon win). Using 2-ply Player 3.2 I estimated this before as 13.735%.
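
As a sanity check, the recursion is easy to run against the post-Crawford equities tabulated in the March 14 post above (those were later recalculated with Player 3.3, so the reproduction is only good to a couple of points in the third decimal):

    # Post-Crawford equities Mp(n) from the March 14 post above, rounded to 3 decimals.
    MP = {1: 0.0, 2: 0.028, 3: 0.362, 4: 0.388, 5: 0.626, 6: 0.645,
          7: 0.774, 8: 0.788, 9: 0.865, 10: 0.874, 11: 0.919, 12: 0.925}

    def mp(n):
        return MP[n] if n >= 1 else -1.0   # Mp(n) = -1 for n <= 0

    def crawford(n, pg=0.13735):
        return 0.5 + (0.5 - pg) * mp(n - 1) + pg * mp(n - 2)

    print([round(crawford(n), 3) for n in range(1, 13)])
    # -> [0.0, 0.363, 0.51, 0.635, 0.69, 0.78, 0.82, 0.869, 0.892, 0.922, 0.936, 0.953],
    # each within about 0.002 of the table above; the residual comes from the rounded
    # Mp inputs and the later Player 3.3 recalculation.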

One difference to note in the table above vs the table here (an example of a modern match equity table): for 1-away 3-away the match equity is greater than 0.5. That's because mine includes the proper post-Crawford match equity for 1-away 2-away, rather than approximating that as zero. 

Other differences for the Crawford game match equities are smaller than that equity error of 0.011; mostly around 0.008 or less. So ignoring the non-zero value of Mp(2) adds a relatively significant error. 

The biggest source of error when computing the Crawford match equities is choosing an accurate gammon probability, and it matters most for 1-away, 2-away (that is, Mc(2)). This reference chooses a 20% chance that a cubeless game ends in a gammon, which is too low, and gives an Mc(2) that is too high by as much as 0.04 points.