Friday, April 27, 2012

New inputs failure: bar hit/entry probability

I've been spending a little time looking at cases where my Player 3.4 does poorly in the GNUbg contact benchmarks database, to get some feel for what new inputs I might try.

It looks like it's leaving blots too often when the opponent has a good prime blocking the way out of the home board.

So I tried two new inputs: the odds of entering the opponent's home board if there were a checker on the bar; and the odds of hitting an opponent blot in his home board if there were a checker on the bar.
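
For concreteness, here's roughly what those two inputs compute. This is a minimal python sketch, with names and simplifications that are mine for illustration; in particular it counts only direct hits on entry and ignores combination hits using both dice:

    def bar_entry_prob(n_open_points):
        """Single-roll probability of entering from the bar, given the number
        of open (unblocked) points in the opponent's home board. Entry fails
        only when both dice land on blocked points."""
        assert 0 <= n_open_points <= 6
        return 1.0 - ((6 - n_open_points) / 6.0) ** 2

    def bar_hit_prob(blot_points):
        """Single-roll probability of hitting an opponent blot directly while
        entering from the bar. blot_points is the set of home-board points
        (1-6) holding an opponent blot; a blot's point is open by definition,
        so entering on it is always legal. Enumerates all 36 dice outcomes."""
        hits = sum(1 for d1 in range(1, 7) for d2 in range(1, 7)
                   if d1 in blot_points or d2 in blot_points)
        return hits / 36.0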

I tried two training approaches: first, adding randomly initialized weights for just the four new connections (the two inputs times the two players) and doing supervised learning on the GNUbg training databases; and second, starting from scratch with random weights everywhere, doing TD training through self-play, and then supervised learning on the GNUbg training databases.
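
For the first approach, extending an already-trained network just means appending small random columns to the input-to-hidden weight matrix, leaving the trained weights alone. A sketch of the idea, assuming a numpy weight matrix (the names and initialization scale are illustrative, not my actual framework):

    import numpy as np

    def extend_input_weights(w_hidden, n_new, scale=0.1, seed=0):
        """Append small random columns to a trained input->hidden weight
        matrix so the network accepts n_new extra inputs; all previously
        trained weights are left untouched."""
        rng = np.random.default_rng(seed)
        new_cols = rng.normal(scale=scale, size=(w_hidden.shape[0], n_new))
        return np.hstack([w_hidden, new_cols])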

The conclusion: neither worked. In both cases the new player was about the same as or a little worse than Player 3.4. So these aren't the right new inputs to add.

Back to the drawing board.

Wednesday, April 25, 2012

Jump model final version posted

I've posted a new version of my jump model for cube decision points:

http://arxiv.org/abs/1203.5692

This version is quite similar to the last one, with just a few elaborations added after another set of very productive discussions with Rick Janowski. He's been a huge help in pulling this together.

I doubt this paper will change again, though I'll probably revisit the model in a future paper, most likely focusing on how to estimate the local jump volatility accurately.

Tuesday, April 17, 2012

PubEval trained using ListNet

I spent some time looking at PubEval again - not my implementation, which is fine now, but rather how Tesauro came up with it in the first place. One discussion suggests that he trained it using "comparison training", a machine learning approach he seems to have come up with himself: some kind of supervised learning on a manually prepared set of benchmarks. Each benchmark (I'm assuming) was a list of moves for a given starting position and dice roll, ordered by goodness.

I tried to reproduce this. I couldn't find any proper references to "comparison training", but there's a lot of relatively recent literature on machine learning algorithms for generating rankings, which is the same sort of thing.

We can do a lot better than Tesauro's hand-crafted training set: we have the GNUbg benchmark databases, which are much larger and more accurate.

So what we want is an algorithm we can feed a training set where each element contains the five candidate boards for a given position and roll, along with the rolled-out equities for each. The inputs that define a board are the PubEval inputs, and the evaluation function should be a linear function of the inputs (as PubEval's is).

Wikipedia has a nice summary of different machine learning approaches to ranking estimators.

The ListNet algorithm seems like a good choice. I implemented it and trained it on the GNUbg contact and race benchmark databases.
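
For reference, here's the core of the ListNet idea in its top-one-probability form: the target distribution is a softmax over the rolled-out equities, the model distribution is a softmax over the linear scores, and the loss is the cross entropy between them. A minimal numpy sketch, not my actual implementation - the temperature tau and the plain-SGD training loop are illustrative choices:

    import numpy as np

    def listnet_grad(w, X, equities, tau=1.0):
        """ListNet top-one gradient for one benchmark position. X is an
        (n_moves, n_inputs) array of PubEval-style inputs, one row per
        candidate board; equities is the array of rolled-out equities.
        The loss is the cross entropy between softmax(equities / tau)
        (target) and softmax(X @ w) (model)."""
        s = X @ w
        p_model = np.exp(s - s.max())
        p_model /= p_model.sum()
        p_target = np.exp((equities - equities.max()) / tau)
        p_target /= p_target.sum()
        return X.T @ (p_model - p_target)   # gradient of cross entropy wrt w

    def train_listnet(training_set, n_inputs, lr=0.1, epochs=20, seed=0):
        """Plain SGD over (X, equities) pairs from the benchmark databases."""
        w = np.random.default_rng(seed).normal(scale=0.1, size=n_inputs)
        for _ in range(epochs):
            for X, eq in training_set:
                w -= lr * listnet_grad(w, X, eq)
        return w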

It fairly quickly converged to a better solution than Tesauro's original PubEval: the weights I found can be plugged into the usual PubEval setup and give a noticeably (though not dramatically) stronger player. Not surprising given the more comprehensive training set.

The weights themselves, and the output values, were quite different from PubEval's. The ListNet algorithm effectively trained the regression to approximate equity, so in this approach the board evaluations correspond to equity (though optimized for equity differences between similar boards rather than for absolute equity).

The GNUbg benchmark scores were: Contact 41.5, Crashed 47.5, and Race 1.90. This compares to PubEval's scores of 44.1, 49.7, and 3.54.

The weights are available on my Dropbox account: race and contact.

In 100k cubeless games (with variance reduction) against PubEval it scores +0.043ppg +/- 0.004ppg. Again, noticeably better.

Of course this is a terrible player compared to neural network players, but it's neat to be able to reproduce something like what Tesauro did with PubEval. And it was a good excuse to play around with the newer machine learning algorithms focused on ranking.

This might also be an interesting training approach for a neural network. A network optimized this way for checker play would be less efficient at the absolute equity estimation required for doubling decisions; but perhaps one could have two sets of networks, one optimized for checker play and the other for doubling decisions.

Tuesday, April 10, 2012

Average number of games in a match

Some more interesting statistics on match play: the average number of games per match.

I ran self-play between two players, both using Player 3.4 for checker play (optimizing cubeful equity) and the Janowski match model with cube life index 0.70, and looked at how many games it took on average to finish a match.

The most interesting result is that the average number of games divided by the match length seems to converge reasonably well to a value around 0.65.

[Chart: blue line/left axis shows the average number of games; red line/right axis shows the average number of games divided by the match length.]

The only other source I could find for similar statistics gave similar results (out to match length 11).

I ran the experiment with x=0.5 and x=0.9 as well to see how that affected the converged average number of games per match length. x=0.5 gave a converged ratio of 0.51; x=0.9 gave 0.90. This compares to 0.65 for x=0.7.

So the average number of games per match divided by the match length converges (for long matches) to something surprisingly close to the cube life index itself! I'm sure this is just a coincidence, but an interesting one.

Optimal Janowski cube life index in match play

In my last post I looked at extending Janowski's cubeful equity model to match play.

The conclusion: match play also favors a cube life index very close to 0.70.

I played a Janowski match strategy with different cube life indexes against a Janowski match strategy with cube life index 0.70 as a benchmark. I ran 40k matches with variance reduction and recorded the average points per match.

The results:

Match Length    x=0.5    x=0.6    x=0.8    x=0.9
     3         -0.004    0.000   -0.004   -0.007
     5         -0.007   +0.007   +0.002   -0.024
     7         -0.021   -0.003   -0.008   -0.029
     9         -0.028   -0.006   +0.003   -0.037

If I fit parabolas through the results and force them to pass through zero ppm at x=0.7, I find optimal cube life indexes of x=0.61 for match length 3, x=0.64 for match length 5, x=0.68 for match length 7, and x=0.69 for match length 9.
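
The fit itself is simple least squares: constrain the parabola to vanish at x=0.7 and read the optimum off the vertex. A sketch (illustrative helper, not my actual analysis code):

    import numpy as np

    def optimal_cube_life_index(xs, ppms, x0=0.7):
        """Least-squares fit of a parabola constrained to pass through zero
        ppm at the benchmark index x0: f(x) = a*(x - x0)**2 + b*(x - x0).
        The fitted optimum is the vertex, at x0 - b / (2*a)."""
        u = np.asarray(xs, dtype=float) - x0
        A = np.column_stack([u ** 2, u])   # no intercept: f(x0) = 0 by construction
        (a, b), *rest = np.linalg.lstsq(A, np.asarray(ppms, dtype=float), rcond=None)
        return x0 - b / (2.0 * a)

    # e.g. the match-length-9 row of the table above gives ~0.69:
    print(optimal_cube_life_index([0.5, 0.6, 0.8, 0.9], [-0.028, -0.006, 0.003, -0.037]))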

All average points per match have a standard error of +/- 0.004ppm, so the statistics are marginal for the shorter match lengths.

There is some evidence for a smaller cube life index for shorter matches, but not much. In general the optimal match cube life index looks very close to the optimal money cube life index.

UPDATE: I ran longer simulations for more values of the cube life index for match lengths 3, 5, and 7 to try to get more accurate statistics. From those data I get optimal cube life indexes of 0.70, 0.67, and 0.69 for match lengths 3, 5, and 7 respectively. So no evidence of a smaller optimal cube life index for shorter matches: everything should use 0.70.

That said, the performance cost of using a suboptimal cube life index in short matches is tiny. It becomes a bigger deal for longer matches.

Monday, April 9, 2012

Janowski model extended to match play

When I was looking at match equity tables I wondered whether you could extend Janowski's model (using a "cube life index" to interpolate between dead and live cube equities as a proxy for the jumpiness of game state) to match play. I'm pretty sure this is what GNUbg does based on their docs.

Turns out it's pretty straightforward if you assume the same match equity table as before, calculated with Tom Keith's algorithm. That algorithm is a live cube model: it assumes the game-winning probability changes diffusively, and that W and L (the conditional points won on a win or lost on a loss) are independent of the game-winning probability. That's the same as Janowski's live cube limit, except W and L are calculated from entries in the match equity table instead of from the usual money scores for wins, gammons, and backgammons.

The Keith algorithm mentioned before gives you the cubeful match equity in the live cube limit. The dead cube limit has cubeful equity that's linear in P, running from -L at P=0 to +W at P=1. The model cubeful equity is just the weighted sum of the live and dead cube cubeful equities, weighted by the cube life index.
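
In code the model is tiny; the only match-play twist is that W, L, and the live cube equity come from the match equity table and Keith's algorithm rather than from money payoffs. A minimal sketch (function names illustrative):

    def dead_cube_equity(p, W, L):
        """Dead cube cubeful equity: linear in the win probability p, running
        from -L at p = 0 to +W at p = 1. For match play, W and L come from
        match equity table entries rather than money payoffs."""
        return p * (W + L) - L

    def model_cubeful_equity(p, W, L, live_equity, x):
        """Janowski-style interpolation with cube life index x between the
        dead cube equity and the live cube equity (for match play, the live
        cube equity comes from Tom Keith's algorithm)."""
        return x * live_equity + (1.0 - x) * dead_cube_equity(p, W, L)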

I implemented this to see how it compares to using a Janowski money model for doubling decisions in tournament play. That's of course inefficient - it doesn't account for match equity - but it's an interesting benchmark to show how much a match-aware doubling strategy matters.

Checker play was Player 3.4 optimizing on cubeful equity, and I ran 40k matches (with variance reduction) for a range of match lengths and cube life indexes. For a given cube life index, both the match and money doubling strategies share the same cube life index, which seems a fair comparison. Remember, for money play, a cube life index of x=0.70 was optimal.

The entries in the table are average points per match in the 40k matches. All values have a standard error of +/- 0.005ppm.

Match Length    x=0     x=0.25   x=0.50   x=0.75    x=1
     3        +0.056   +0.047   +0.051   +0.066   +0.054
     5        +0.073   +0.080   +0.091   +0.105   +0.067
     7        +0.115   +0.138   +0.137   +0.142   +0.061
     9        +0.115   +0.148   +0.150   +0.154   +0.081

The main conclusion here is that using a proper match-aware doubling strategy makes a huge difference in outcome. The impact is larger for longer matches because any per-game edge gets magnified over the match. In principle the match strategy should approach the money strategy for very long matches, but for matches up to length 9, at least, that is not apparent in the statistics.

This assumes the same match equity table as before, which is based on the live cube model. Really I should recalculate the match equity table assuming the same Janowski model for cube decisions, which I think will change it a bit. But I'll leave that for another day.

Or I could extend my jump model to match play - it should be a relatively simple extension, since as with Janowski it's just a matter of basing W and L on match equities.

Saturday, April 7, 2012

Cubeful vs cubeless equities in checker play decisions

So far all my players have made checker play decisions by choosing the position with the best cubeless equity, even after I introduced doubling strategies to decide when to offer & take doubles. This is clearly suboptimal: the doubling strategies can calculate cubeful equity, so the checker play should choose moves based on the position with the best cubeful equity, not best cubeless equity.

I'm referring to money games; for match play they should optimize match equity.

I extended my backgammon framework to allow the players to optimize on cubeful equity in checker play to see how much that would improve things.
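
The change is small: checker play already evaluates every legal resulting board and keeps the best one, so optimizing on cubeful equity just means passing a different evaluation function. A sketch of the idea (names illustrative, not my actual framework):

    def best_board(candidate_boards, equity):
        """Checker play decision: keep the candidate board with the highest
        equity from the mover's perspective. Pass a cubeless evaluator for
        the old behavior, or a cubeful one (e.g. a Janowski interpolation
        with the current cube state baked in) for the new behavior."""
        return max(candidate_boards, key=equity)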

The answer, it seems: at least for my intermediate-strength players, not much!

I played 100k cubeful money games (with variance reduction) between two players (both Player 3.4) that both use Janowski x=0.7 for their doubling strategy. The first used cubeful equities for checker play and the second used cubeless equity.

The first player scored -0.0004ppg +/- 0.0044ppg. Hardly a significant advantage to using cubeful equity when choosing the best move - the score is indistinguishable from zero.

I also thought I'd run the cubeful-equity player through the GNUbg benchmarks. The benchmarks are for cubeless play, so that should give something worse than for the cubeless-equity player. But the benchmarks are all about choosing the best move out of the possible moves, so all that matters there is relative equity difference, not absolute equity, and perhaps that's not affected much by cubeful vs cubeless.

The cubeless benchmark scores are Contact 13.32, Crashed 11.69, and Race 0.766.

Using a cubeful-equity player, always assuming a centered cube, the scores were 13.44, 12.16, and 0.766. So the Contact and Crashed scores got a little worse, but hardly changed: using cubeful equity in checker play decisions seems to make little difference in almost all cases.

Assuming player-owned cube the scores were 13.41, 12.03, and 0.768. Assuming opponent-owned cube the scores were 13.42, 11.82, and 0.764. So cube position does not matter much either.

The conclusion: it doesn't matter much if you use cubeful or cubeless equity in checker play decisions.

Nonetheless cubeful equity is the theoretically correct objective, and there are probably edge cases where it matters more, so I'll start using cubeful equity in money play.

Friday, April 6, 2012

Player 3.4: incrementally better

I let the supervised learning algorithm train for a while longer, starting with the Player 3.3 networks (i.e. the same networks and inputs).

Its benchmark scores were Contact 13.3, Crashed 11.7, and Race 0.766 - my best player yet, but only incrementally better than Player 3.3's scores of 13.3, 12.0, and 0.93.

On the most important benchmark, Contact, it was unchanged. And in self-play against Player 3.3 (100k games with variance reduction) its score was zero within a standard error of 0.0009ppg. Not exactly a startling improvement!

Nonetheless it is measurably better in the other GNUbg benchmarks so I'll start using it as my best player.

Playing 100k cubeless money games against PubEval it scored +0.586ppg +/- 0.001ppg, winning 69.22% of the games; Player 3.3 in the same games scores +0.585ppg +/- 0.001ppg and wins 69.20% of the games. So again very little difference, but still incrementally my best player.


Self-play variance reduction

I've done a lot of bot self-play to gather statistics on relative strength and so on.

One obvious way to reduce variance in the results is to run half the games as player 1 vs player 2, then the other half as player 2 vs player 1 using the same random seeds as the first half. That is, the dice sequences are identical in both halves, but the players swap who receives them. That significantly reduces the impact of dice luck: in the limit where both players follow exactly the same strategy, it returns exactly 0 points per game and 50% wins - perfect variance reduction.

Note that only half as many independent trials are used; that tends to make the standard error worse. But the variance reduction should more than compensate.
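
Here's a toy illustration of the pairing (not my actual game code: the "game" here is just additive luck plus a fixed skill edge, which makes the cancellation exact; in real games the players reach different positions, so the cancellation is only partial):

    import random
    import statistics

    def play_game(skill_first, skill_second, seed):
        """Toy stand-in for one game: the first player's score is a fixed
        skill edge plus dice luck driven entirely by the seed."""
        rng = random.Random(seed)
        luck = rng.gauss(0.0, 1.0)   # same seed -> same dice luck
        return (skill_first - skill_second) + luck

    def paired_ppg(skill_a, skill_b, n_pairs):
        """Variance-reduced estimate of A's points per game: each seed is
        played twice with the players swapped, and A's two results are
        averaged so the shared luck term cancels (n_pairs >= 2)."""
        scores = [0.5 * (play_game(skill_a, skill_b, s)
                         - play_game(skill_b, skill_a, s))
                  for s in range(n_pairs)]
        return statistics.mean(scores), statistics.stdev(scores) / n_pairs ** 0.5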

I haven't done this before, which is a little embarrassing, honestly.

But, now that I've thought of it, I figured I'd do a little statistical analysis to investigate how effective the variance reduction actually is.

The first example: playing 10k self-play money games where both players use Player 3.3 for checker play, the first player uses a Janowski doubling strategy with x=0.7, and the second player uses Janowski with x=0.6. In my previous experiments where I ran 1M money games with no variance reduction I got an average score of +0.018ppg with a standard error of 0.003ppg.

No variance reduction: +0.017ppg +/- 0.032ppg
Variance reduction: +0.026ppg +/- 0.014ppg

Both are consistent with the more accurate estimate, but the standard error with variance reduction was 2.3x smaller. That matches the standard error from running 2.3^2 = 5.3x as many games without variance reduction, so it saves more than a factor of 5 in run time.

The second example: playing 10k self-play cubeless money games, Player 3.3 against PubEval.

No variance reduction: +0.610ppg +/- 0.014ppg
Variance reduction: +0.583ppg +/- 0.012ppg

In this case the variance reduction doesn't help much. That's likely because the two strategies are sufficiently different that they make different moves, and the same dice throws applied to different positions imply different luck.

So when two players are comparable the variance reduction is quite powerful. When they imply significantly different checker play the technique is less powerful, but still worth doing.

Thursday, April 5, 2012

Jump model revisited

After conversations with Rick Janowski I realized that I'd approached the jump model incorrectly. I've since revamped the whole thing and reposted the corrected paper on arxiv.org - the same link as before, overwriting the old (incorrect) version.

I feel pretty confident about this second cut. The implied Janowski cube life index is right around the optimal value I found, and the jump volatility found by optimizing the model parameter in bot self-play ties out well with statistical estimates of jump volatility.

I put the python code I used to generate cubeful equities onto github, released under the GPL. It contains functions for calculating cubeful equity in the three cube states (player-owned cube, unavailable cube, and centered cube), using the three methods developed in the paper: numerically solving the exact model; a linear approximation; and a nonlinear approximation that improves on the linear one.

Cube Handling In Backgammon Money Games Under a Jump Model


A variation on Janowski's cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed in which the cubeless probability of a win evolves through a series of random jumps rather than by continuous diffusion. Each jump is drawn from a distribution with zero mean and an expected absolute jump size, called the "jump volatility", that can be a function of game state but is assumed to be small compared to the market window.

Closed form approximations for cubeful equities and cube decision points are developed as a function of local and average jump volatility. The local jump volatility can be calculated for specific game states, leading to crisper doubling decisions.
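
To make that concrete, here's a toy Monte Carlo of the jump process itself, with a symmetric two-point jump distribution and no cube decisions - an illustration of the model's dynamics, not the paper's numerical solution:

    import random

    def jump_game_payoff(p0, jump_vol, W=1.0, L=1.0, rng=None):
        """The win probability takes symmetric +/-jump_vol steps (zero mean,
        expected absolute jump = jump_vol) until it is absorbed at 0 (loss)
        or 1 (win). Returns the cubeless payoff; no cube decisions here."""
        rng = rng or random.Random()
        p = p0
        while 1e-9 < p < 1.0 - 1e-9:   # absorb at 0 or 1, tolerant of float drift
            p += jump_vol if rng.random() < 0.5 else -jump_vol
        return W if p > 0.5 else -L

    def cubeless_equity(p0, jump_vol, n=200000, seed=1):
        rng = random.Random(seed)
        return sum(jump_game_payoff(p0, jump_vol, rng=rng) for _ in range(n)) / n

    # The jumps have zero mean, so p is a martingale and the estimate lands
    # near the dead cube value p0*W - (1 - p0)*L, e.g. ~0.30 here:
    print(cubeless_equity(0.65, 0.05))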