I haven't been able to spend much time on this recently, but what I'm trying now is a variation on the symmetric setup - this time adding a new input that represents whose turn it is (=1 for the player's turn and 0 for the other player's turn).
I'm running up against another case where the network wants to converge an output probability to 100% almost all the time. This time the probability of win from the starting board seems to be okay (it settles around 53-55%, so a bit high compared to the gnubg standard, but nothing crazy) but the conditional probability of gammon win converges to 100%.
The last time I saw this, the problem was that I was not properly handing the dice off to the other player at the end of the turn (when evaluating the post-move probabilities). I do that correctly now, but the gammon probability is still messing up.
Frustrating! I'm not even sure how to debug this sensibly. It's a huge complex nonlinear network, so looking at the weights doesn't give me any intuition about what's happening. I've gone over the code many times now and can't see any obvious bugs. The slide toward 100% probability happens gradually over thousands of iterations.
I'm running up against another case where the network wants to converge an output probability to 100% almost all the time. This time the probability of win from the starting board seems to be okay (it settles around 53-55%, so a bit high compared to the gnubg standard, but nothing crazy) but the conditional probability of gammon win converges to 100%.
The last time I saw this, the problem was that I was not properly handing the dice off to the other player at the end of the turn (when evaluating the post-move probabilities). I do that correctly now, but the gammon probability is still messing up.
Frustrating! I'm not even sure how to debug this sensibly. It's a huge complex nonlinear network, so looking at the weights doesn't give me any intuition about what's happening. I've gone over the code many times now and can't see any obvious bugs. The slide toward 100% probability happens gradually over thousands of iterations.