Monday, January 30, 2012

Player 2.4: Berliner primes inputs

I implemented the Berliner primes inputs "AContain" and "Contain" and extended Player 2.1 to include them. I'm calling the resulting player Player 2.4.

The summary: it's unclear whether the primes inputs themselves improve performance. The extra training gave a small benefit, and the new inputs may add a little on top of that. Player 2.4 is my strongest player so far against my available benchmarks, but only by a small margin.

Training started with the Player 2.1 weights and random (uniform between -0.1 and +0.1) weights for the four new inputs (two for each player). I started with alpha=0.02, dropped to alpha=0.004 at 100k iterations, then to alpha=0.0008 at 700k iterations. That continued to 1.1M iterations.
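The schedule and initialization above can be sketched roughly as follows. This is only an illustration: the function names, the row-per-hidden-node weight layout, and the input count in the test are assumptions made for the example, not the actual training code.

```python
import random

def alpha_for_iteration(iteration):
    """Step schedule from the text: 0.02, then 0.004 from 100k, then 0.0008 from 700k."""
    if iteration < 100_000:
        return 0.02
    if iteration < 700_000:
        return 0.004
    return 0.0008

def extend_input_weights(hidden_weights, n_new_inputs, rng, scale=0.1):
    """Keep the trained weights and append random weights for the new inputs.

    hidden_weights is assumed to be a list of rows, one row per hidden node,
    each row holding that node's input weights. New weights are drawn
    uniformly from [-scale, +scale].
    """
    return [
        row + [rng.uniform(-scale, scale) for _ in range(n_new_inputs)]
        for row in hidden_weights
    ]
```

Starting from the old weights means that only the four new inputs per hidden node begin untrained; everything else carries over from Player 2.1.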

Its full definition:

  • Two neural nets: contact and race.
  • 80 hidden nodes for both networks.
  • Contact inputs are the original Tesauro inputs, minus the inputs for whose turn it is, plus: an input that notes whether a backgammon is impossible (one for each player), as per Player 2; an input that notes the number of hitting shots (one for each player), as per Player 2.1; and the new inputs that track primes as per Berliner (two for each player). Race inputs are just the original Tesauro inputs.
  • One-sided bearoff database for up to fifteen checkers spread over up to six points, i.e. the full home board.

I won't post the training chart since the 1,000-game match results are too noisy to be meaningful. The standard error on 1,000 games (cubeless money equity) is 0.045ppg, which is much larger than the equity edge coming from the extra input. I'm going to have to change my training benchmarking to use a larger number of games in the benchmark matches (and probably do them less frequently, since 100k games is fairly expensive).
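To see how the number of games drives the noise, here is a small sketch. It assumes games are independent and that the per-game standard deviation implied by the 1,000-game figure carries over to other match lengths; the function names are just for illustration.

```python
import math

SE_AT_1K = 0.045  # ppg, standard error quoted for 1,000 games

# Per-game standard deviation implied by that figure (about 1.42 ppg).
PER_GAME_SD = SE_AT_1K * math.sqrt(1_000)

def standard_error(n_games, sd=PER_GAME_SD):
    """Standard error of the mean score over n_games independent games."""
    return sd / math.sqrt(n_games)

def games_needed(target_se, sd=PER_GAME_SD):
    """Approximate number of games needed to reach a target standard error."""
    return round((sd / target_se) ** 2)
```

Under these assumptions, standard_error(100_000) comes out at about 0.0045ppg, in line with the 0.005ppg quoted for the 100k-game matches. Since the error shrinks only with the square root of the number of games, 100 times as many games buys a 10 times smaller error.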

After the training, in a 100k-game match (cubeless money games), it scores +0.009ppg against Player 2.1, with a standard error of 0.005ppg. That is about 1.8 standard errors, so the improvement is only marginally statistically significant.

Even though this is a pretty weak improvement I'll stick with the Berliner formulation since it seems to be the standard (and gives comparable performance to my other tries), and put the primes inputs to bed. However, it's unclear to me whether these new inputs are necessary at all. Maybe they'd be more important with a smaller number of hidden nodes.

I tried two other experiments related to these new inputs.

The first was to see whether I should start training from scratch, or from the Player 2.1 set with random weights just for the new inputs (as described above). Starting from scratch I ran 1M training runs; in a 100k-game match the trained player lost 0.01ppg against Player 2.1. So it needed more training just to get back to the Player 2.1 starting point: it's definitely better to start training from a previously-trained player.

The second was more interesting: to see whether the equity edge I saw above really came from adding the new inputs, or just from training the Player 2.1 network for more training runs and/or with smaller alpha. So I took Player 2.1 without the new inputs and trained it for an additional 420k training runs (100k with alpha=0.02, the rest with alpha=0.004).

It scored +0.007ppg +/- 0.005ppg against Player 2.1. So: much of the small performance improvement noted above came from extra training on the other weights, not from adding the new inputs. That said, the magnitudes of the impacts are of the order of the standard error, so it is difficult to state much with confidence. Again, this makes it hard to see any real benefit from adding the new primes inputs.
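The comparison between the two results can be made a bit more concrete. This sketch assumes the two 100k-game matches are independent, so their standard errors add in quadrature; the helper names are made up for the example.

```python
import math

def z_score(estimate, se):
    """How many standard errors the estimate sits away from zero."""
    return estimate / se

def difference(est_a, se_a, est_b, se_b):
    """Difference of two independent estimates and its standard error."""
    return est_a - est_b, math.hypot(se_a, se_b)

# Player 2.4 vs Player 2.1, and extra-trained Player 2.1 vs Player 2.1 (ppg).
with_primes = (0.009, 0.005)
extra_training_only = (0.007, 0.005)

diff, diff_se = difference(*with_primes, *extra_training_only)
```

Here diff is 0.002ppg with a standard error of about 0.007ppg, so the gap between "new inputs plus extra training" and "extra training only" is well under one standard error and cannot be distinguished from zero at this sample size.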

Some other benchmarks for Player 2.4, again in 100k cubeless money game matches:

  • Player 2.1: scores +0.009ppg +/- 0.005ppg, 50.1% +/- 0.1% win probability
  • Player 2: scores +0.043ppg +/- 0.005ppg, 51.6% +/- 0.1% win probability
  • Benchmark 2: scores +0.071ppg +/- 0.005ppg, 52.7% +/- 0.1% win probability
  • PubEval: scores +0.475ppg +/- 0.005ppg, 66.4% +/- 0.1% win probability

Note that all performance numbers here relate to the 0-ply version of the players, and the +/- numbers are standard errors on the score and win probability estimates. These aren't entirely apples to apples with earlier equity estimates, because those were often based on 30k games instead of 100k games, but they should be close.



