Monday, December 12, 2011

The symmetry constraint is not good

I ran the test described in the previous post out to 400k training runs, and the results were relatively unchanged. In head to head competitions the Normal network started to edge out the Symmetric one, and they started to comparably against the pub eval benchmark, with Symmetric still performing a little better on average.

However, I did some more granular benchmarking and Symmetric did much worse. I looked at the network estimate of probability of win from a pair of end games: one where white is almost sure to win, and one where white is almost sure to lose. Both networks started out estimating both probabilities near 50%, as expected with random weights. But the Normal network over time correctly identified the near-certain win and loss fairly accurately after some tens of thousands of training runs, and the Symmetric network did not. In the certain-to-win case it converged to around 97.5% chance of win, so not far off (but should be very close to 100%); but in the certain-to-lose case it converged to around 20% chance of win, which is way off.

Also, I spent some more time thinking about the basic implications of the Symmetric constraint on the hidden->input weights. In particular: the network probability estimates can ever only depend on the difference between the white and black layouts, not on either one independently. That does not seem to afford sufficient flexibility high-level to the network in making probability estimates.

There might be other ways to enforce the symmetry constraint, but at this point I'm just going to drop the whole thing and follow the same approach as GNU backgammon: always evaluate the board from the perspective of the person holding the dice, and drop the input that represents whose turn it is.

Sunday, December 11, 2011

Testing the value of the symmetry constraint

Most of the players I've built so far make the "symmetry" assumption: that it's good to ensure explicitly that when you flip the perspective of the board (including an input that signifies whose turn it is), the network probabilities transform as they should. That is, probability of any win transforms to probability of loss, probability of gammon win transforms to probability of gammon loss, and so on.

Satisfying the symmetry adds a constraint to the hidden->input weights. The inputs are set up in pairs, where the inputs for the second player mirror the inputs for the first one, just in different positions in the inputs list. The constraint is that the weight for a mirror input must be the negative of the weight for the corresponding input.

There are two benefits to this approach: first, there are roughly half the number of network weights, so training should be faster; and second, a known symmetry is enforced so the network probability estimates should be more accurate in some general way.

The downside, it seems to me, is that there are half the number of weights, so the network is less flexible than it is without the constraint. So perhaps if we do not explicitly enforce the relationship between weights for regular and mirror inputs, the network can find more powerful strategies.

I designed a simple experiment to test out the net benefit of adding the symmetry constraints. This is not conclusive for more powerful networks, but it should be suggestive.

I designed two networks:
  • "Normal": a network setup that does not include the symmetry constraint. The inputs also do not include a flag for whose turn it is; the evaluator always assumes "player 1" is on turn.
  • "Symmetric": a network setup that does include the symmetry constraint, and also includes two inputs that reflect whose turn it is (1 for player on turn, 0 otherwise).
Both networks contain only one output for probability of win, and are trained in the usual TD approach. The inputs (aside from the variance on the "whose turn is it" inputs) are the original Tesauro inputs. So not particularly sophisticated networks, but pared down to highlight the difference I want to test out.

I trained both networks with constant alpha=0.1 and lambda=0 for 100k iterations, and every 1k iterations I ran benchmarking. Both strategies were benchmarked on 1k games against the pub eval standard, and against each other. Both networks had 10 hidden nodes.

Conclusions and Results

For this simple network setup, adding the symmetry constraint is marginally superior. The Symmetric network trains more quickly than the Normal network and has a comparable final performance against both pub eval and the Normal network in benchmarking games. That is, the Normal network's extra flexibility from a larger set of weights did not seem to have much of a net benefit.

100k iterations is not that much for backgammon training, and perhaps the Normal network would do better on a longer run. These are pretty simple networks that might not be representative of more complex ones. And the benchmarks I tried are simple and do not say much about network performance in more granular aspects of the game.

Nonetheless, this suggests minimally that adding the symmetry constraint does not make the trained player net worse, and a smaller number of network weights should make it more stable and easier to train.

This chart shows the results of the experiment:

The x-axis is the training run. The y-axis is the fraction of games won in the 1k benchmark games, run every 1k training runs. The green points and line show the Normal setup against the pubeval benchmark; the orange points and line show the same for the Symmetric setup. The blue points and line show the results of a matchup between Normal and Symmetric. The points are the individual 1k-game competitions and the lines are a rolling 10-set average.

Note: results against pubeval are incorrect. Also the training was done incorrectly. Updated results are later in the blog; but the conclusion doesn't change. Symmetric is worse than Normal, and by a more significant margin than is clear here.

Both the Symmetric and Normal networks train quite quickly (within 20k training runs or so), though the Symmetric network trains substantially quicker than the Normal one - by 5k training runs it is winning around 70-75% of games against pub eval.

Symmetric does generally better against pub eval than Normal, even after 100k training runs.

Normal tends to lose by a small margin to Symmetric. By the later training runs they are roughly on par when playing against each other.

Neither player is particularly good on an absolute scale, which is as expected with this relatively simple pair of setups. 

Friday, December 9, 2011

I'm baaack

It's been a while, but I'm back onto the computational backgammon hobby. I took the free Stanford machine learning course, which was a really interesting high level overview of many machine learning techniques, including neural networks. In the end I don't think it much changes my approach here even if it did give me some interesting context.

The next step: generalize to a more complex evaluator of the board value. I'll use a bearoff database in the appropriate end game, and separate neural networks for different game phases. To start with I'll just use  two, race and contact networks. But the framework will be generic so I can define however many networks I like, along with a generic way to decide which network (or other technique like bearoff) is appropriate for a given stage of the game.

I never did figure out why the previous version didn't converge properly, but I'm going to try a fresh implementation to see if there was some hidden bug. And hopefully the bearoff bit will help it get more sensible results near the end of a (race) game.