Friday, March 30, 2012

Optimal Janowski cube life index revisited

After fixing my implementation of Janowski's doubling strategy I tried again to find the optimal cube life index (assuming we just use a single constant index).

As before, I found the optimal cube life index is 0.7.

To test this I ran doubling strategies with different cube life indexes against one with a fixed index of 0.7 to see if any other index would outperform. For indexes near 0.7 I ran 1M self-play games, using Player 3.3 for checker play. For indexes a bit further out I used 100k self-play games.

Here are the results:


There are two interesting points to note. The first is that the probability of win continues to increase up to x=0.95 even though the average score decreases, due to the asymmetric market windows of the two players. The second is that there is a fairly dramatic falloff in performance (in both score and probability of win) for x>0.95.

Wednesday, March 28, 2012

Janowski model revisited

I was a bit incorrect in how I interpreted Janowski's paper.

The real way to think about his model is as a linear interpolation between the dead cube equity and the live cube equity.

The dead cube equity is always P*(W+L)-L, normalized to the current cube value. The live cube equity is piecewise linear, and has different forms for three cases:

1) Cube centered. 0<P<TP, equity runs from -L to -1; TP<P<CP, equity runs from -1 to 1; CP<P<1, equity runs from 1 to +W.
2) Cube owned by player. 0<P<CP, equity runs from -L to +1; CP<P<1, equity runs from +1 to +W.
3) Cube owned by opponent. 0<P<TP, equity runs from -L to -1; TP<P<1, equity runs from -1 to +W.

TP and CP here are the live cube take and cash points, so

TP = (L-1/2)/(W+L+1/2)
CP = (L+1)/(W+L+1/2)

Then the Janowski model equity is x * live cube equity + (1-x) * dead cube equity.

That's it. Everything else falls out of that, and in practice you don't really need to calculate initial double, redouble, or too-good points, or even take and cash points - you just compare cubeful equities.
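Here's a minimal sketch of that calculation in Python (illustrative only, with names of my own choosing; not the actual code from my framework):

    # Janowski cubeful equity as a linear interpolation between the dead and
    # live cube equities. P is the cubeless probability of winning, W and L are
    # the average amounts won/lost conditional on winning/losing, and x is the
    # cube life index. All equities are normalized to the current cube value.

    def dead_cube_equity(P, W, L):
        return P * (W + L) - L

    def live_cube_equity(P, W, L, cube_state):
        """cube_state: 'centered', 'player' (we own the cube) or 'opponent'."""
        TP = (L - 0.5) / (W + L + 0.5)   # live cube take point
        CP = (L + 1.0) / (W + L + 0.5)   # live cube cash point

        def interp(p, p0, e0, p1, e1):
            # linear interpolation of equity between (p0, e0) and (p1, e1)
            return e0 + (e1 - e0) * (p - p0) / (p1 - p0)

        if cube_state == 'centered':
            if P < TP: return interp(P, 0.0, -L, TP, -1.0)
            if P < CP: return interp(P, TP, -1.0, CP, +1.0)
            return interp(P, CP, +1.0, 1.0, +W)
        if cube_state == 'player':
            if P < CP: return interp(P, 0.0, -L, CP, +1.0)
            return interp(P, CP, +1.0, 1.0, +W)
        # opponent owns the cube
        if P < TP: return interp(P, 0.0, -L, TP, -1.0)
        return interp(P, TP, -1.0, 1.0, +W)

    def janowski_equity(P, W, L, x, cube_state):
        return x * live_cube_equity(P, W, L, cube_state) + (1 - x) * dead_cube_equity(P, W, L)

As a sanity check, with x=0 this reduces to the dead cube equity P*(W+L)-L for any cube position, as it should.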

For example, the take decision calculates the equity when the player owns the doubled cube and compares that to -(cube value). If the equity on take is greater than the equity on pass, you take.

Similarly, the double decision calculates the equity on double (which is the minimum of the equity when the opponent owns the doubled cube and +(cube value) for the pass) and compares it to the equity at the current cube value when the player owns the cube. If the doubled equity is bigger, you double.
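And the decisions themselves, as a sketch building on janowski_equity above (P, W and L are always from the perspective of the player making the decision, and equities here are in money terms rather than normalized):

    # Take: compare owning a cube at 2*cube against dropping for -cube.
    def take_decision(P, W, L, x, cube):
        equity_take = 2 * cube * janowski_equity(P, W, L, x, 'player')
        equity_pass = -cube
        return equity_take > equity_pass   # True -> take

    # Double: compare the doubled equity (opponent takes or passes, whichever
    # is worse for us) against the equity of rolling on at the current cube.
    def double_decision(P, W, L, x, cube, cube_state):
        equity_if_doubled = min(2 * cube * janowski_equity(P, W, L, x, 'opponent'), +cube)
        equity_no_double = cube * janowski_equity(P, W, L, x, cube_state)
        return equity_if_doubled > equity_no_double   # True -> double

Because the doubled equity is capped at +cube (the pass), the too-good case falls out automatically: if the equity at the current cube exceeds the cash value, the comparison says don't double.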

My error before was around too-good points: in my implementation I assumed that the too-good point was calculated by comparing the dead cube equity with the pass value, on the theory that by not doubling you're giving up the value of the cube. But really you're not; you still have the option of doubling later if the game starts to go against you. You should be comparing the model cubeful equity at the current cube against the value you'd get on a double (the minimum of the take and pass equities there).

This invalidates some of the statistical results I calculated for Janowski's model before, so I'll have to redo those.

Also it makes it easy to generalize to two cube life indexes: each player's interpolation of equity uses their appropriate index.

Jump model performance

I implemented a doubling strategy following my jump model and ran it against the optimal Janowski doubling strategy. All checker play was Player 3.3.

This first implementation assumes a constant jump volatility, so does not make any attempt to calculate a local jump volatility in the game state. It's meant to be a benchmark to compare against the Janowski strategy with a single cube life index, which should be pretty similar in performance.

I ran one million money games of the jump model with different jump volatilities against the Janowski strategy with the optimal single cube life index of 0.7. The goal was to see which jump volatility value would give the best performance.

In the paper I made an estimate of a constant jump volatility of 7.7% based on a calculation of the average absolute probability change from turn to turn when the cubeless win probability was near the take and cash points. That was only an approximate calculation, based on crude windows of win probability, but should be roughly what comes out of this more careful analysis.
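The kind of calculation I mean is roughly the following sketch (the take/cash probabilities and window width here are illustrative placeholders, not the exact values I used):

    # Estimate the jump volatility as the average absolute turn-to-turn change
    # in cubeless win probability, restricted to turns where the probability is
    # near the take or cash point.
    def estimate_jump_volatility(prob_trajectories, take_point=0.2, cash_point=0.8, window=0.05):
        """prob_trajectories: list of per-game lists of cubeless win
        probabilities, one per turn, all from the same player's perspective."""
        jumps = []
        for traj in prob_trajectories:
            for p_prev, p_next in zip(traj, traj[1:]):
                if abs(p_prev - take_point) < window or abs(p_prev - cash_point) < window:
                    jumps.append(abs(p_next - p_prev))
        return sum(jumps) / len(jumps) if jumps else None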

Here are the results of the games:


The blue line/points represent the self-play results for average points per game for the different jump volatility inputs, running from 5% to 11%. Each has an error bar of +/- 0.004ppg which represents the standard error on the averages. The black line shows a parabolic fit through the points. The red line/points represent the probability of win for the jump model strategy.

The jump volatility with the best ppg performance was 8.0%, quite close to the naive estimate of 7.7% from cubeless play statistics. But a jump volatility anywhere in the range 7.0% to 9.0% performed quite comparably. The parabolic fit to the ppg data suggests an optimal jump volatility of 8.5%, which is probably a more realistic number given the statistical uncertainty on the individual points.
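The fit itself is nothing fancy; a sketch of it, assuming numpy:

    import numpy as np

    # Fit ppg as a quadratic in jump volatility and read the optimum off the
    # vertex of the fitted parabola (a maximum as long as a < 0).
    def optimal_jump_volatility(jump_vols, ppg):
        a, b, c = np.polyfit(jump_vols, ppg, 2)   # ppg ~ a*v^2 + b*v + c
        return -b / (2.0 * a)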

The jump model with constant jump volatility performed marginally better than the Janowski model with a single cube life index, though the margins were small: a few hundredths of a point per game, in games with an average cube value of around 2.3. The differences are statistically significant (the standard error for all ppg numbers is +/- 0.004ppg) but only because I ran 1M games; really the performance is practically indistinguishable.

That is roughly what one would expect given that the model with a constant jump volatility and constant conditional win and loss amounts W and L can be mapped directly onto Janowski's model with a single cube life index. Of course W and L are not actually constant, so the models are not practically the same, but they should be close.

That said, the probability of win showed a much larger effect than points per game. A jump model strategy with a jump volatility of 5% won 50.2% of its games (while losing 0.006ppg in score); a strategy with jump volatility of 11% won only 44.2% of its games (but won 0.011ppg in score).

High jump volatility means the player chooses to double at lower probability of win and pass at higher probability of win. So the games it loses because of passing a double are offset by games it wins when doubled.



Monday, March 26, 2012

New cube decision model posted on arxiv.org

I posted the next draft of my new model for calculating cube decision points to the preprint site arxiv.org, so now it has a proper stable link:

Cube Handling In Backgammon Money Games Under a Jump Model

Abstract

A variation on Janowski’s cubeful equity model is proposed for cube handling in backgammon money games. Instead of approximating the cubeful take point as an interpolation between the dead and live cube limits, a new model is developed where the cubeless probability of win evolves through a series of random jumps instead of continuous diffusion. Each jump is drawn from a distribution with zero mean and an expected absolute jump size called the “jump volatility” that can be a function of game state but is assumed to be small compared to the market window.

Simple closed form approximations for the market window and doubling points are developed as a function of local and average jump volatility. The local jump volatility can be calculated for specific game states, leading to crisper doubling decisions.

Checkpoint: recent progress and next steps

Since my last checkpoint around two months ago I've made quite a lot of progress.

Main highlights:
  • Networks: I added a crashed network following the GNUbg definition of "crashed".
  • Network inputs: I added a new input to the contact and crashed networks that measures the strength of prime blocking. I also extended the race inputs to track the checkers borne in at a finer granularity.
  • Training: I grabbed the GNUbg training databases for contact, crashed, and race networks, and ran supervised learning against them. The best performance comes from training the networks first with TD learning, then using those weights as the starting point for supervised learning.
  • Benchmarking: I moved from bot self-play performance to benchmarking against the GNUbg benchmark databases for cubeless play, since that gives a more accurate and robust set of benchmarks for different game phases. I ran some stats to look at the relationship between self-play results and GNU benchmark scores.
  • Cube handling: this was a new area for me since my personal game is weak on cube play. I learned about Janowski's classic model for money games, as well as how to construct match equity tables for tournament play.
  • New model for money game cube action: I developed a new model for cube decisions in money games, moving from the "live cube" limit where probability of win is assumed to diffuse, to a jump model where the probability of win jumps each turn.
Along the way I developed several new players incorporating the new ideas. As of now my best checker player is Player 3.3, which has GNUbg benchmark scores of Contact 13.3, Crashed 12.0, and Race 0.93 (compared to GNUbg 0-ply scores of 10.5, 11.0, and 1.01).

Stuff still to do:
  • Money play: implement my new model properly and test out its performance. Hopefully it should perform better than the standard approaches.
  • Match play: implement matches properly in my backgammon framework and see if I can extend my new model approach to match play as well (coming up with better match equity tables along the way).
  • Checker play: my players' Contact benchmark scores are still pretty weak compared to GNUbg. I suspect I need to add more custom inputs. I also want to investigate some more advanced supervised learning techniques applied to training against the GNUbg training databases.
  • Rollouts: rollouts are still pretty basic in my framework; I need to properly implement variance reduction and cutting off rollouts before the end of the game.
  • FIBS hookup: I still want to hook up my bot to FIBS and see how it performs there in real-world games.
  • Networks: I've been talking with Øystein Johansen about maybe breaking up the "contact" specification into separate nets for different game phases, perhaps using k-means clustering to define the phases.
Fun stuff!

Saturday, March 24, 2012

New model paper v2

Here's the second draft of the paper, revamped considerably:

The main changes:
  • Added local and average jump volatility. Take and cash points depend only on the average volatility, since on a pass/take decision you're comparing post-double equity with -1. The post-double equity depends on the jump volatility at the cash point; but should the game turn and head back up to there, it will probably be in quite a different state than it was near the take point. So using the average jump volatility seems more reasonable. Local jump volatility impacts the cubeful equity at the current cube level, which in turn impacts doubling decisions.
  • I note that if I assume a constant (single) jump volatility, it maps very nicely onto Janowski's model with one x. But the local + average jump volatility version does not map as nicely onto his model extension to a cube life index for each player, x1 and x2.
  • I changed the definition of jump volatility to be the expected absolute jump size instead of the jump standard deviation. That's what the equity calcs care about. In fact, I only need to assume the jump distribution is symmetric for positive and negative jumps; if that is true then I don't need to know anything more about the jump distribution than its expected absolute jump size. Pretty nice! (The distinction matters: for a zero-mean normal jump, for example, the expected absolute jump is only about 0.8 times the standard deviation.) I got caught up for a while playing around with different distributional assumptions before I cottoned on to that.
  • I rewrote Appendix 3, the one that works through the details of the jump model, to try to make it clearer when you should use local vs average volatility. That I suspect is the subtlest part of this. I think I understand it properly now, but I went through a couple of iterations myself before I got here, so it could be that I'm still getting something wrong.
This is on the back of a very rewarding conversation with Rick Janowski, who pointed out that I needed to rethink this in terms of something more like his model, where he has a different cube life index for each player.


Wednesday, March 21, 2012

Optimal Janowski cube life index

I tried to determine the optimal constant Janowski cube life index by playing players with different cube life indexes against each other.

I started broadly by playing different cube life indexes against a fixed 0.7, since 0.7 is near where other people have quoted the cube life index.

Statistic                  x=0     x=0.25   x=0.50   x=0.75   x=1
Average points per game   -0.30    -0.14    -0.04     0.00   -0.09
Probability of win (%)     37.5     41.8     45.9     51.2    58.3
Average cube               2.32     2.40     2.27     1.96    1.45

Interesting that for x=1 the probability of win was well above 50% but it lost points on average. It must be because when it wins it wins small, and when it loses it loses big: the numbers from the table only balance (0.583 x average win - 0.417 x average loss = -0.09) if the average loss per lost game is well above the average win per won game. Presumably that's because the player gets doubled and takes when the game goes against it, but only doubles past the cash point when the game goes in the player's favor.

Next I tried to zero in on the optimum by running the same comparison over a tighter range of indexes:

Statistic                  x=0.55   x=0.60   x=0.65   x=0.70   x=0.75
Average points per game    -0.03    -0.01     0.00     0.00     0.00
Probability of win (%)      46.9     47.8     48.9     50.0     51.2
Average cube                2.23     2.17     2.11     2.04     1.96

So it seems like 0.7 is indeed the optimal cube life index in terms of expected points per game, at least within +/- 0.05.