Sunday, January 15, 2012

Player 2: corrected

I had a bug in the board evaluation function for Player 2: it would often return an equity of 1 in positions where it was almost certain to win a gammon or backgammon. That caused two problems. In playing against the Benchmark 1 opponent it was occasionally picking the wrong board as the optimal one, and in training it was teaching the networks the wrong target.
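To make the bug concrete, here is a minimal sketch of how a cubeless equity is usually built from outcome probabilities. The function name, the argument names, and the convention that the gammon probabilities include backgammons are assumptions for illustration, not the actual Player 2 code.

```python
def cubeless_equity(p_win, p_gammon_win, p_bg_win, p_gammon_loss, p_bg_loss):
    """Cubeless equity in points per game (hypothetical helper).

    Assumed convention: p_gammon_win is the probability of winning a gammon
    OR a backgammon, and p_bg_win is the probability of winning a backgammon
    (likewise for the loss probabilities). Each extra point is added once.
    """
    p_loss = 1.0 - p_win
    return (p_win - p_loss) + (p_gammon_win - p_gammon_loss) + (p_bg_win - p_bg_loss)


# A position that is a near-certain backgammon win (illustrative numbers only).
near_certain_backgammon = cubeless_equity(1.0, 1.0, 0.9, 0.0, 0.0)

# A position that is a certain single win with no gammon chances.
plain_win = cubeless_equity(1.0, 0.0, 0.0, 0.0, 0.0)
```

Under those assumptions the first position is worth close to 3 points and the second exactly 1. An evaluator that returns 1 for both cannot prefer the move that keeps the gammon or backgammon alive, which is the kind of mistake described above.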

I re-trained the network for 235k training runs, starting with the network weights I had from the earlier (buggy) training rather than starting from scratch.

Before re-training, Player 2 scores +0.112ppg against Benchmark 1, and wins 52.0% of the games (in a 10,000 game match).

After re-training, Player 2 scores +0.134ppg against Benchmark 1, and wins 52.0% of the games (also 10k games).

So it turns out the bug didn't affect the training that much; it's more that it meant Player 2 didn't properly take advantage of Benchmark 1 when it could not distinguish between gammons and backgammons. That is, Player 2 playing against itself records about 0.8% of its games as backgammons. When playing against Benchmark 1 that jumps to 10.3%, so the majority of the equity gain against Benchmark 1 comes from those backgammons: if they'd been gammons instead, the equity advantage would have dropped by 0.095ppg.
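One reading that reproduces the 0.095 figure: treat the excess backgammons (the rate against Benchmark 1 minus the self-play rate) as if they had been scored as gammons, so each one loses a single point. This is an assumption about how the number was derived; the variable names are made up for the sketch.

```python
# Backgammon rates quoted in the post.
bg_rate_self_play = 0.008      # Player 2 against itself
bg_rate_vs_benchmark = 0.103   # Player 2 against Benchmark 1

# Standard scoring: a backgammon is worth 3 points, a gammon 2.
points_backgammon = 3
points_gammon = 2

# Assumption: only the excess over the self-play rate is attributed to
# Benchmark 1, and each such game drops from 3 points to 2.
excess_bg_rate = bg_rate_vs_benchmark - bg_rate_self_play
equity_drop = excess_bg_rate * (points_backgammon - points_gammon)

print(f"equity drop: {equity_drop:.3f} ppg")
```

Set against the +0.134ppg measured after re-training, that is most of the advantage over Benchmark 1.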

Even aside from the gammon vs backgammon advantage, it does play better than Benchmark 1 by a small amount. It may be unfair to allocate all that 0.095 equity difference to Benchmark 1's inability to distinguish between gammons and backgammons. Really I should make a new Tesauro-style benchmark that has backgammon nodes to make sure.

But I'd still expect better performance due to the race/contact network split and benchmark database for end-game races.

For reference: against pubeval it scores +0.435ppg and wins 64.6% of its games, in a 10k-game match. (Note: pubeval results were updated after the pubeval player was fixed.)
