Computational Backgammon: Extending the outputs

The first network I set up had just one output node, which represents the probability of a win - any kind of win, ignoring the distinction between single wins and gammons/backgammons.

This was a simplification of the original Tesauro work which had four outputs: one for a win, one for a gammon win, one for a loss, and one for a gammon loss. There was no knowledge in the network of backgammons, under the assumption that in practice they happen so rarely that they can be ignored (this is true in regular games between experienced human players).

So the next step in the evolution of my net is to add an understanding of gammons. As with the single-output network, I want the network to do the right thing when you flip the perspective of a board and pass it in. For a multi-output network this means that prob of win->prob of loss, prob of gammon win->prob of gammon loss, and so on.

I started by assuming that the network needs only two more nodes: one representing the probability of a gammon win conditional on a win, and the other representing the probability of a gammon loss given a loss. I define the probabilities as conditional probabilities so that they run from 0->1, which is convenient for the excitation function used for the output value given the inputs. Note that probability of gammon win = probability of win * probability of gammon win conditioned on a win, so all the required probabilities can be calculated from the conditional probabilities.

Note that I leave out the "probability of loss" node, because that can be calculated as 1 - probability of win (since the probability of win output represents the probability of any kind of win).

I started by writing the output value for the two conditional probability nodes in a similar form to the original output node, that is

$\LARGE \bg_white z_i=\frac{1}{1+e^{-\sum_{j=1}^M{v_{ij}y_j}}}$

Then we come to the constraints implied by a flip of the board perspective. Here, the probability of a gammon win conditioned on a win transforms to the probability of a gammon loss conditioned on a loss; so z3 transforms to z2 and vice versa.

We maintain the transformation for the probability of win, which means that yj transforms to 1-yj. So the transformed z3 (z3' below) can be written as

$\LARGE \bg_white z_3'=\frac{1}{1+e^{-\sum_{j=1}^M{v_{3j}} + \sum_{j=1}^M{v_{3j}y_j}}}$

Identifying this with z2 and noting that it must hold for all values of yj (for any j), we come up with the constraints

$\LARGE \bg_white \sum_{j=1}^M{v_{3j}}=0 \\ v_{3j}=-v{2j}$

However, under these constraints, it also turns out that z3->1-z3 - so the probability of a gammon win conditioned on a win is equal to 1 - the probability of a gammon loss conditioned on a loss. This is clearly wrong - eg at the start of a new game, where the probability of a win is roughly 1/2, it implies that the probability of a gammon win is 1/4, which is much too high.

So there is clearly something wrong with the analysis above. The error is in the chosen form of the output as a function of the inputs - it is not general enough given these constraints. So for the two conditional probability nodes we generalize the function a bit to add a "bias weight" - a weight which is multiplied by 1 instead of a hidden node value. That is,

$\LARGE \bg_white z_i=\frac{1}{1+e^{-\sum_{j=1}^M{v_{ij} y_j}-\phi_i}}$

Applying the same constraint from symmetry we get the proper constraints on the parameters:

$\LARGE \bg_white v_{3j}=-v_{2j} \\ \phi_3=\phi_2+\sum_{j=1}^M{v_{2j}}$

Note that in the case of the first output node - the probability of a win, which has no bias weight - we require that the sum of the weights equal zero, which means we need only M-1 parameters.

Here, however, there is no constraint that the weights sum to zero, and we have added another weight - the bias weight. So for the two conditional gammon nodes there are M+1 parameters.

One extra (significant) simplification: the neural net does not need to know anything about the third output node, the probability of a gammon loss conditioned on a loss. That probability can be calculated from the parameters of the second node, as per the parameter constraints above.

So the neural net has only two output nodes: one representing the probability of (any kind of) a win, and the other representing the probability of a gammon win conditioned on a win. The expected number of points won given a board can be determined from those two numbers and the weights on the second node.

Computational Backgammon

Tuesday, August 2, 2011

Extending the outputs

No comments:

Post a Comment

About Me