The second benchmark player, Benchmark 2, is similar to Benchmark 1 except that it adds two extra output nodes: the probability of a backgammon win and the probability of a backgammon loss.

In addition, it does some (fairly obvious) validation on the probabilities that Benchmark 1 did not do: it checks that the estimated gammon probability is less than or equal to the win probability, and that the backgammon probability is less than or equal to the gammon probability. It also properly overrides the network outputs for gammon and backgammon values in cases where it knows the result (for example, zero probability of a gammon loss if the player has already borne off a checker).
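The validation described above amounts to clamping the five output probabilities so they are mutually consistent, then zeroing the gammon/backgammon probabilities where the result is already known. Here is a minimal sketch; the function and parameter names are hypothetical, not the player's actual interface:

```python
def validate_probs(p_win, p_gammon_win, p_gammon_loss,
                   p_bg_win, p_bg_loss,
                   player_has_borne_off, opponent_has_borne_off):
    """Clamp the five network outputs to be mutually consistent."""
    # Gammon win probability cannot exceed the win probability.
    p_gammon_win = min(p_gammon_win, p_win)
    # Gammon loss probability cannot exceed the loss probability.
    p_gammon_loss = min(p_gammon_loss, 1.0 - p_win)
    # Backgammon probabilities cannot exceed the corresponding gammon ones.
    p_bg_win = min(p_bg_win, p_gammon_win)
    p_bg_loss = min(p_bg_loss, p_gammon_loss)
    # Override the network where the result is known: once a side has borne
    # off a checker it can no longer lose a gammon (or a backgammon).
    if player_has_borne_off:
        p_gammon_loss = 0.0
        p_bg_loss = 0.0
    if opponent_has_borne_off:
        p_gammon_win = 0.0
        p_bg_win = 0.0
    return p_win, p_gammon_win, p_gammon_loss, p_bg_win, p_bg_loss
```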

Here is a chart of training performance vs Benchmark 1 over 450k training runs. I started with the Benchmark 1 network weights and added small random backgammon->middle weights (uniform random between -0.1 and +0.1). I used alpha=0.1 for the first 200k runs and then dropped to alpha=0.02.
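The initialization and learning-rate schedule above are simple enough to sketch directly (names here are illustrative, not the actual training code):

```python
import random

def alpha_for_run(run_index):
    # Learning-rate schedule described above: alpha = 0.1 for the first
    # 200k training games, then 0.02 thereafter.
    return 0.1 if run_index < 200_000 else 0.02

def init_backgammon_output_weights(n_middle, scale=0.1):
    # Small uniform random weights from the middle layer to the two new
    # backgammon output nodes, uniform in [-scale, +scale]. All other
    # weights start from the trained Benchmark 1 values.
    return [[random.uniform(-scale, scale) for _ in range(n_middle)]
            for _ in range(2)]
```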

The blue dots are the results of 1,000-game matches against Benchmark 1, and the blue line is the 10-match moving average. It looks reasonably converged after 450k iterations.

It is somewhat surprising that it took so long to converge: I haven't checked, but based on how Benchmark 1 and Player 2 trained from scratch, I would have expected training from scratch to converge in around 300k iterations, and starting from a more sensible point (the Benchmark 1 weights) should have made things faster. That said, training the backgammon weights is the slowest part of training, since most games do not end in backgammon.

Benchmark 2 also uses 80 hidden nodes and the standard Tesauro inputs (less the two inputs noting whose turn it is, as usual). I added two custom inputs: one that is zero if the player can still be backgammoned and one if the player cannot be; and a similar input for the opponent. These two extra inputs let the backgammon output nodes identify more exactly when they should be zero.
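The two custom inputs can be computed from the board state. A sketch of the idea, assuming the standard rule that a backgammon requires the loser to have borne off no checkers while still having a checker in the winner's home board or on the bar (function names are hypothetical):

```python
def still_backgammonable(checkers_in_opponent_home, checkers_on_bar,
                         checkers_borne_off):
    # A player can still lose a backgammon only if they have borne off no
    # checkers and still have a checker in the opponent's home board or on
    # the bar.
    return checkers_borne_off == 0 and (
        checkers_in_opponent_home > 0 or checkers_on_bar > 0)

def backgammon_input(checkers_in_opponent_home, checkers_on_bar,
                     checkers_borne_off):
    # The custom input described above: zero while the player can still be
    # backgammoned, one once they cannot. A second, identical input is
    # computed from the opponent's perspective.
    if still_backgammonable(checkers_in_opponent_home, checkers_on_bar,
                            checkers_borne_off):
        return 0.0
    return 1.0
```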

In a 10k-game match against Benchmark 1 it averages +0.129ppg and wins 52.4% of the games.

In a 10k-game match against Player 2 it averages +0.007ppg and wins 50.4% of the games.

In a 10k-game match against pubeval it averages +0.426ppg and wins 64.3% of the games. (Note: updated with corrected performance after I fixed the pubeval player.)

It is also somewhat surprising that it is essentially equivalent in performance to Player 2, which has the extra race network and uses a one-sided bearoff database. I suspect this means that the race network is not particularly effective for Player 2.

