Most of the players I've built so far make the "symmetry" assumption: they explicitly ensure that when you flip the perspective of the board (including any input that signifies whose turn it is), the network probabilities transform as they should. That is, the probability of any win transforms to the probability of a loss, the probability of a gammon win transforms to the probability of a gammon loss, and so on.
Satisfying the symmetry adds a constraint to the input-to-hidden weights. The inputs are set up in pairs: the inputs for the second player mirror the inputs for the first, just in different positions in the input list. The constraint is that the weight on a mirror input must be the negative of the weight on the corresponding regular input.
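To make the constraint concrete, here is a minimal sketch in Python (not the actual code from my player). The inputs come in mirror pairs, and tying each mirror weight to the negative of its partner collapses the input-to-hidden contribution to a weighted difference of the paired features. Assuming no hidden bias and an output bias chosen as minus half the sum of the output weights, flipping the board then sends the win probability p to 1 - p:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def win_prob(x, x_mirror, W, v):
    """Win probability under the symmetry constraint (a sketch).

    x        : features for the first player, shape (n_pairs,)
    x_mirror : mirrored features for the second player, shape (n_pairs,)
    W        : one weight per input pair per hidden node, shape (n_hidden, n_pairs)
    v        : hidden-to-output weights, shape (n_hidden,)

    Because the mirror weight is tied to minus the regular weight,
    W @ x + (-W) @ x_mirror collapses to W @ (x - x_mirror).
    """
    h = sigmoid(W @ (x - x_mirror))  # flipping the board sends h -> 1 - h
    b = -0.5 * v.sum()               # this output bias makes p -> 1 - p
    return sigmoid(v @ h + b)

# Quick check: flipping perspective swaps x and x_mirror, and the
# two win probabilities sum to exactly 1.
rng = np.random.default_rng(0)
W, v = rng.normal(size=(10, 8)), rng.normal(size=10)
x, xm = rng.random(8), rng.random(8)
assert np.isclose(win_prob(x, xm, W, v) + win_prob(xm, x, W, v), 1.0)
```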
There are two benefits to this approach: first, there are roughly half as many network weights, so training should be faster; and second, a known symmetry is enforced exactly rather than learned approximately, so the network probability estimates should be more accurate.
The downside, it seems to me, is the flip side of the same coin: with half as many weights, the network is less flexible than it is without the constraint. So perhaps if we do not explicitly enforce the relationship between the regular and mirror weights, the network can find more powerful strategies.
I designed a simple experiment to test out the net benefit of adding the symmetry constraints. This is not conclusive for more powerful networks, but it should be suggestive.
I designed two networks:
- "Normal": a network setup that does not include the symmetry constraint. The inputs also do not include a flag for whose turn it is; the evaluator always assumes "player 1" is on turn.
- "Symmetric": a network setup that does include the symmetry constraint, and also includes two inputs that reflect whose turn it is (1 for player on turn, 0 otherwise).
Both networks contain only one output, for the probability of a win, and are trained in the usual TD approach. The inputs (aside from the variation in the "whose turn is it" inputs) are the original Tesauro inputs, sketched below. These are not particularly sophisticated networks, but they are pared down to highlight exactly the difference I want to test.
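For reference, here is a sketch of the Tesauro-style raw-board encoding I mean, as I understand it from the TD-Gammon papers (my paraphrase, not code from this player). The `on_turn` flag at the end is exactly the piece that differs between the two setups:

```python
def tesauro_inputs(points, bar, off, on_turn):
    """Encode one player's half of the board with the classic Tesauro inputs.

    points  : list of 24 checker counts for this player, one per point
    bar     : number of this player's checkers on the bar
    off     : number of this player's checkers borne off
    on_turn : True if this player is on turn (omitted in the Normal setup)
    """
    feats = []
    for n in points:
        # Four units per point: indicators for 1, 2, and 3+ checkers,
        # plus a scaled count of the excess beyond 3.
        feats += [float(n >= 1), float(n >= 2), float(n >= 3),
                  max(n - 3, 0) / 2.0]
    feats += [bar / 2.0, off / 15.0, float(on_turn)]
    return feats  # 99 features per player; two players give 198 inputs
```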
I trained both networks with constant alpha=0.1 and lambda=0 for 100k iterations, running a benchmark every 1k iterations: each strategy played 1k games against the standard pubeval benchmark, and 1k games against the other strategy. Both networks had 10 hidden nodes.
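In sketch form, the training loop looks like the following. Here `net` and `play_one_game` are hypothetical stand-ins for the evaluator and the self-play game driver, not my actual code; with lambda=0 the TD update needs no eligibility traces, so each weight update uses only the value gradient at the current state:

```python
ALPHA = 0.1  # constant learning rate, as in the run described above

def td0_train(net, play_one_game, n_games=100_000, bench_every=1_000):
    """Minimal TD(0) self-play loop (lambda = 0, so no eligibility traces).

    `net` is assumed to expose value(state), gradient(state), and a flat
    `weights` array; `play_one_game` returns the sequence of states from
    one self-play game plus the outcome (1 = win, 0 = loss).
    """
    for game in range(n_games):
        states, outcome = play_one_game(net)
        # (glossing over whose perspective each state is evaluated from)
        for s, s_next in zip(states, states[1:]):
            # TD error: next-state value estimate minus current estimate
            delta = net.value(s_next) - net.value(s)
            net.weights += ALPHA * delta * net.gradient(s)
        # Final transition: the target is the actual game result.
        s_last = states[-1]
        net.weights += ALPHA * (outcome - net.value(s_last)) * net.gradient(s_last)
        if (game + 1) % bench_every == 0:
            pass  # benchmark: 1k games vs pubeval and vs the other network
```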
Conclusions and Results
For this simple network setup, adding the symmetry constraint is marginally superior. The Symmetric network trains more quickly than the Normal network and has comparable final performance against both pubeval and the Normal network in benchmarking games. That is, the Normal network's extra flexibility from its larger set of weights did not seem to have much net benefit.
100k iterations is not that much for backgammon training, and perhaps the Normal network would do better on a longer run. These are pretty simple networks that might not be representative of more complex ones. And the benchmarks I tried are simple and do not say much about network performance in more granular aspects of the game.
Nonetheless, this suggests at a minimum that adding the symmetry constraint does not make the trained player net worse, and the smaller number of network weights should make it more stable and easier to train.
This chart shows the results of the experiment:
The x-axis is the number of training runs. The y-axis is the fraction of games won in the 1k benchmark games run every 1k training runs. The green points and line show the Normal setup against the pubeval benchmark; the orange points and line show the same for the Symmetric setup. The blue points and line show the results of the head-to-head matchup between Normal and Symmetric. The points are the individual 1k-game competitions and the lines are a rolling 10-set average.
Note: the results against pubeval are incorrect, and the training was done incorrectly. Updated results are later in the blog, but the conclusion doesn't change: Symmetric is worse than Normal, and by a more significant margin than is clear here.
Both the Symmetric and Normal networks train quite quickly (within 20k training runs or so), though the Symmetric network trains substantially more quickly than the Normal one: by 5k training runs it is winning around 70-75% of its games against pubeval.
Symmetric generally does better against pubeval than Normal, even after 100k training runs.
Normal tends to lose to Symmetric by a small margin, though by the later training runs the two are roughly on par when playing against each other.
Neither player is particularly good on an absolute scale, which is as expected with this relatively simple pair of setups.