I trained the new player, which has two networks (race and contact) and uses the one-sided bearoff databases covering up to nine points and fifteen checkers. I'll call it Player 2.
It trained for 780k runs, using alpha=0.1 for the first 200k and alpha=0.02 for the remaining runs. Judging by its performance against the old benchmark (Benchmark 1 wasn't ready yet), it converged after about 300k training runs.
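The schedule above is a simple step schedule on the TD learning rate. Here is a minimal sketch of where alpha enters. It assumes a TD(0)-style update and uses a linear value function as a hypothetical stand-in for the real networks. `alpha_schedule` and `td0_update` are illustrative names, not the player's actual code, and the real training may use a different TD variant.

```python
from typing import List


def alpha_schedule(run: int, switch_at: int = 200_000,
                   alpha_initial: float = 0.1, alpha_final: float = 0.02) -> float:
    """Step schedule: alpha_initial for the first `switch_at` runs, alpha_final after."""
    return alpha_initial if run < switch_at else alpha_final


def td0_update(weights: List[float], features: List[float],
               v_current: float, v_next: float, alpha: float) -> List[float]:
    """One TD(0) step for a linear value function v = w . x (hypothetical stand-in).

    For a linear model the gradient of v with respect to w is just x, so each
    weight moves by alpha * td_error * x_i.
    """
    td_error = v_next - v_current
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    for run in (0, 199_999, 200_000, 779_999):
        print(run, alpha_schedule(run))
```

A step schedule like this takes large steps early, while the weights are far from useful values, then smaller steps so later training refines rather than overwrites what was learned.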
I then played it against Benchmark 1 (a single network with probability-of-win and probability-of-gammon outputs). It did not perform well: in a 10,000-game match (cubeless, of course) it won 51.5% of the games and averaged only 0.005 ppg. So it is effectively the same strength as Benchmark 1.
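To judge whether an average like 0.005 ppg over 10,000 games can be told apart from zero, the standard error can be computed from the per-game results. This is a minimal sketch. The input format, signed points per game from Player 2's side (+1/+2/+3 for a single/gammon/backgammon win, negative for the corresponding losses), and the name `match_summary` are assumptions for illustration.

```python
import math
import statistics
from typing import Dict, Sequence


def match_summary(results: Sequence[int]) -> Dict[str, float]:
    """Summarise a cubeless match from one player's point of view.

    results: signed points per game (hypothetical format): +1/+2/+3 for a
    single/gammon/backgammon win, -1/-2/-3 for the corresponding losses.
    """
    n = len(results)
    wins = sum(1 for r in results if r > 0)
    ppg = sum(results) / n
    # Standard error of the mean points per game (sample stdev; needs n >= 2).
    ppg_stderr = statistics.stdev(results) / math.sqrt(n)
    return {"games": n, "win_fraction": wins / n, "ppg": ppg, "ppg_stderr": ppg_stderr}


if __name__ == "__main__":
    # Toy four-game match, not real data.
    print(match_summary([1, -1, 2, -1]))
```

If the ppg average sits within about two standard errors of zero, the match has not shown a real difference between the players.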
This is pretty surprising. Player 2 has outputs for backgammon wins and losses, so it should perform better in edge-case games where Benchmark 1 doesn't bother getting its checkers out of the opponent's home board and so loses a backgammon instead of a gammon. It uses a bearoff database for endgame races, so it should play better there. And it splits the race and contact phases into separate networks, so it should play better in each.
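Here is a minimal sketch of the two mechanisms this paragraph relies on: how backgammon outputs feed into cubeless equity, and how a position can be routed to the bearoff database, race network, or contact network. Several assumptions are made. The outputs are taken to be cumulative (the gammon probability includes backgammons). The board is a per-side list of checker counts indexed from that side's own perspective (index 25 = bar). The database is used only when both sides have all checkers on their first nine points. `Side`, `make_side`, `game_phase` and `cubeless_equity` are hypothetical names, not Player 2's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Side:
    # checkers[i] = this side's checkers on point i, counted from its own
    # perspective: 1 = ace point, 24 = farthest point, 25 = bar; index 0 unused.
    checkers: List[int]

    def furthest_point(self) -> int:
        """Highest occupied point (25 if on the bar), or 0 if all borne off."""
        for i in range(25, 0, -1):
            if self.checkers[i] > 0:
                return i
        return 0


def make_side(placements: Dict[int, int]) -> Side:
    checkers = [0] * 26
    for point, count in placements.items():
        checkers[point] += count
    return Side(checkers)


def game_phase(me: Side, opp: Side, bearoff_points: int = 9) -> str:
    """Pick an evaluator: 'contact', 'race' (race network) or 'bearoff' (database)."""
    a, b = me.furthest_point(), opp.furthest_point()
    # An opponent checker on its point q sits on our point 25 - q, so the sides
    # have passed each other only if a < 25 - b.
    if a + b >= 25:
        return "contact"
    if a <= bearoff_points and b <= bearoff_points:
        return "bearoff"
    return "race"


def cubeless_equity(p_win: float, p_gammon_win: float, p_bg_win: float,
                    p_gammon_loss: float, p_bg_loss: float) -> float:
    """Expected points per game, assuming cumulative outputs
    (p_gammon_win includes backgammon wins, p_gammon_loss includes backgammon losses)."""
    return (2.0 * p_win - 1.0) + (p_gammon_win - p_gammon_loss) + (p_bg_win - p_bg_loss)


if __name__ == "__main__":
    start = {24: 2, 13: 5, 8: 3, 6: 5}
    print(game_phase(make_side(start), make_side(start)))
```

Without the backgammon terms, an evaluator sees no difference between losing a gammon and losing a backgammon, which is why the extra outputs were expected to help in those edge cases.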
I'll look at some specific examples of moves in various game phases to understand the performance a bit better.