I've finally got a bit of free time to spend on the backgammon bot again, so I'm trying two quite different next steps.
The first is a bit of a tweak on what was already there. It came up because I noticed something unappealing about my current setup. The second output node is the probability of a gammon win conditioned on a win. When a game is over, I set that to 1 if the player wins a gammon (or backgammon). However, I would set it exactly the same way if the second node represented the unconditional probability of a gammon, which seems a bit weird.
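To see why the two interpretations look identical at game end, note that with a conditional second node the unconditional gammon probability is just a product. A minimal sketch (the function name is mine, not the bot's actual code):

```python
# Sketch, my naming: with a conditional second output node, the
# unconditional gammon probability is recovered by multiplication.
def unconditional_gammon_prob(p_win, p_gammon_given_win):
    """P(gammon win) = P(win) * P(gammon win | win)."""
    return p_win * p_gammon_given_win
```

At the end of a won gammon, P(win) is 1, so the training target for the second node is 1 under either reading — which is exactly the ambiguity described above.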
One solution is to ignore the second node altogether (that is, not training its weights or the middle weights through it) whenever the player loses the game. But that seems a bit wasteful. In any case, I wasn't ignoring the second node in losses before, and I think that was incorrect. Perhaps that's why many of my trained networks ended up with artificially large estimates for the probability of a gammon.
I can do better than just ignoring those games, though: if the player loses, instead of setting the "true" value of the second node to zero and training on that, I train on the probability of a gammon loss conditioned on a loss. That acts as a kind of virtual third node in my setup; its value is entirely determined by the second node's weights, so that the estimates stay symmetric when the board perspective is flipped.
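Concretely, the target selection might look like the sketch below. All the names here (`gammon_node`, `flip_board`, the outcome labels) are hypothetical stand-ins, not the bot's actual code:

```python
# Hypothetical sketch: choose the (prediction, target) pair used to
# update the gammon-node weights. `gammon_node` evaluates the second
# output node on a board; `flip_board` swaps perspective, so the same
# weights also yield the "virtual third node", P(gammon loss | loss).
def gammon_training_pair(outcome, gammon_node, flip_board, board):
    if outcome in ('win', 'gammon_win'):
        # Player won: use the second node directly, P(gammon win | win).
        return gammon_node(board), (1.0 if outcome == 'gammon_win' else 0.0)
    # Player lost: evaluate the same node from the opponent's
    # perspective, giving P(gammon loss | loss), and train toward the
    # observed outcome instead of forcing the node to zero.
    return gammon_node(flip_board(board)), (1.0 if outcome == 'gammon_loss' else 0.0)
```

The point of routing losses through the flipped-board evaluation is that the second node's weights still receive a meaningful gradient in lost games, rather than either no signal at all or a spurious target of zero.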
I'm not sure how big a difference that will make, but I'm running some tests now to see.
The second change is more significant: I'm adding a second neural network for races, alongside the existing one, which I'll now use only for contact games. Those two game regimes play very differently, and all the literature I've read suggests that specializing networks to qualitatively different game phases improves strength significantly. (This version also includes the train-on-gammon-loss change described above.)
Hopefully I'll have some results in a couple of days for both extensions.