## Monday, October 24, 2011

### Next steps

I've continued to be frustrated trying to get my network to converge sensibly when I include the "whose turn is it" input. I tried a bunch of different approaches, but none of them made much difference:

- Include a new output node for backgammons. In training for networks that contained probability-of-win and gammon nodes, I noticed that the realized play included a lot of backgammons, and wondered whether this was skewing the results. So I added a new output node representing the conditional probability of a backgammon win given a gammon win. As with the gammon node, symmetry implies the probability of a backgammon loss given a gammon loss. This ended up making very little difference.
- Weight the learning rate for the gammon node by the probability of a win. Since the gammon node represents the conditional probability of a gammon given a win, the idea is that games where the probability of a win is low contain very little information about gammons. Similarly, when backgammon nodes are included, weight their learning rate by the unconditional probability of a gammon win. In practice, neither of these made much difference.
- Instead of always estimating the probability of a gammon from the network, use knowledge of the game to set that probability to zero once the opponent has borne off at least one checker, and stop training the gammon node while its probability is pinned at zero. This effectively means that the "end of the game" training that happens for the probability-of-win node happens (sometimes) earlier in the game. This converged to something non-trivial, in the sense that the gammon (and backgammon) probabilities didn't collapse to zero or one; but the converged probabilities were not sensible, and the resulting player performed poorly against the benchmark.
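To make the first bullet concrete, here's a minimal sketch of how the output nodes combine into a cubeless equity: P(win), the conditional gammon node, and the new conditional backgammon node, plus the loss-side conditionals implied by symmetry. Function and parameter names are my own illustrations, not the journal's actual code.

```python
def cubeless_equity(p_win, p_gammon_given_win, p_bg_given_gammon,
                    p_gammon_given_loss, p_bg_given_gammon_loss):
    """Expected points per game from the (conditional) output probabilities."""
    p_loss = 1.0 - p_win

    # Unconditional outcome probabilities on the win side.
    p_gammon_win = p_win * p_gammon_given_win
    p_bg_win = p_gammon_win * p_bg_given_gammon
    p_single_win = p_win - p_gammon_win

    # ...and on the loss side, using the symmetric conditionals.
    p_gammon_loss = p_loss * p_gammon_given_loss
    p_bg_loss = p_gammon_loss * p_bg_given_gammon_loss
    p_single_loss = p_loss - p_gammon_loss

    # Single game = 1 point, gammon = 2, backgammon = 3.
    return (1 * p_single_win + 2 * (p_gammon_win - p_bg_win) + 3 * p_bg_win
            - 1 * p_single_loss - 2 * (p_gammon_loss - p_bg_loss) - 3 * p_bg_loss)
```

For example, a certain backgammon win (all win-side probabilities 1) gives an equity of 3.0, and a coin-flip single game gives 0.0.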
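The learning-rate weighting in the second bullet could be sketched like this. The update rule here is a hypothetical stand-in (a generic gradient step with per-node TD errors); the point is only that each output node gets its own step-size multiplier.

```python
import numpy as np

ALPHA = 0.1  # base learning rate (illustrative value)

def node_learning_rates(outputs):
    """outputs = [P(win), P(gammon | win), P(backgammon | gammon win)]."""
    p_win, p_gammon_given_win, _ = outputs
    return np.array([
        ALPHA,                               # win node: full learning rate
        ALPHA * p_win,                       # gammon node: little signal when P(win) ~ 0
        ALPHA * p_win * p_gammon_given_win,  # backgammon node: weight by P(gammon win)
    ])

def weighted_td_update(weights, grads, td_errors, outputs):
    """One step where each output node's TD error is scaled by its own alpha.

    grads[i] is d(output i)/d(weights); td_errors[i] is node i's TD error.
    """
    alphas = node_learning_rates(outputs)
    return weights + (alphas * td_errors) @ grads
```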
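The game-knowledge override in the third bullet might look like the following sketch. The board representation and node names are illustrative stand-ins: once the opponent has borne off a checker, a gammon is impossible, so the evaluator pins that probability to zero and the trainer skips the gammon (and backgammon) nodes instead of pushing the network toward zero.

```python
def gammon_still_possible(opponent_checkers_borne_off):
    """A gammon requires the opponent to have borne off no checkers."""
    return opponent_checkers_borne_off == 0

def evaluate(network_outputs, opponent_checkers_borne_off):
    """Overlay exact game knowledge on the raw network outputs."""
    p_win, p_gammon_given_win, p_bg_given_gammon = network_outputs
    if not gammon_still_possible(opponent_checkers_borne_off):
        # Gammon (and hence backgammon) can no longer happen.
        return p_win, 0.0, 0.0
    return p_win, p_gammon_given_win, p_bg_given_gammon

def nodes_to_train(opponent_checkers_borne_off):
    """Train only the win node once the gammon probability is pinned at zero."""
    if gammon_still_possible(opponent_checkers_borne_off):
        return ["win", "gammon", "backgammon"]
    return ["win"]
```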

I've gone over the code many times with fresh eyes and don't see anything wrong here, so I'm assuming my network is just confused somehow.

I'm going to try a few different values of alpha to see whether my learning rate is just too large, but I've played with that before and don't hold out much hope.

The next thing I'm going to try is a more sophisticated network structure: a separate race network plus a bearoff database. I'd been holding off on this because I wanted to solve the convergence problems on a simpler setup first, but that isn't bearing much fruit. Hopefully there isn't some bug I've missed that would moot the more complex setup too.
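The structure above amounts to a phase dispatcher. Here's a minimal sketch, where the phase flags and the evaluator callables are hypothetical stand-ins for whatever the real board class and networks end up being:

```python
class Board:
    """Minimal stand-in: just the two phase flags the dispatcher needs."""
    def __init__(self, is_race=False, in_bearoff_db=False):
        self.is_race = is_race              # no contact between the sides
        self.in_bearoff_db = in_bearoff_db  # few enough checkers, all home

def evaluate_position(board, contact_eval, race_eval, bearoff_lookup):
    """Route a position to the right evaluator by game phase.

    Each evaluator is a callable taking a board and returning its estimates
    (exact values, in the bearoff database's case).
    """
    if board.in_bearoff_db:      # late bearoff: exact database lookup
        return bearoff_lookup(board)
    if board.is_race:            # pure race: specialized race network
        return race_eval(board)
    return contact_eval(board)   # otherwise: general contact network
```

One design note: the bearoff check has to come first, since a bearoff position is also a race.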