I haven't been able to spend much time on this recently, but what I'm trying now is a variation on the symmetric setup - this time adding a new input that represents whose turn it is (=1 for the player's turn and 0 for the other player's turn).
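
As a rough illustration of the extra input, here is a minimal sketch of how the turn flag could be appended to the network inputs. The function name `encode_inputs` and the idea of passing the two sides' features as flat lists are assumptions for the example, not the actual code.

```python
def encode_inputs(player_features, opponent_features, player_on_turn):
    """Build a flat input vector with a trailing turn flag.

    Hypothetical layout: player's features, then the opponent's,
    then 1.0 if it is the player's turn and 0.0 otherwise.
    """
    turn_flag = 1.0 if player_on_turn else 0.0
    return list(player_features) + list(opponent_features) + [turn_flag]


# The same board encoded from the same perspective, differing only in the flag.
on_turn = encode_inputs([0.5, 0.0], [0.25, 1.0], player_on_turn=True)
off_turn = encode_inputs([0.5, 0.0], [0.25, 1.0], player_on_turn=False)
```

The only difference between the two vectors is the last element, so the network has to learn how much holding the dice is worth.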

I'm running up against another case where the network wants to converge an output probability to 100% almost all the time. This time the probability of a win from the starting board seems to be okay (it settles around 53-55%, so a bit high compared to the gnubg standard, but nothing crazy), but the conditional probability of a gammon win converges to 100%.

The last time I saw this, the problem was that I was not properly handing the dice off to the other player at the end of the turn (when evaluating the post-move probabilities). I do that correctly now, but the gammon probability is still messing up.
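
The dice hand-off can be sketched roughly as follows. This assumes a hypothetical `evaluate(board, player_on_turn)` callable that returns the player's estimated equity; both the name and the signature are made up for the example. The point is only that every post-move board is scored with the turn flag set to the opponent.

```python
def pick_move(post_move_boards, evaluate):
    """Return the (board, equity) pair with the highest estimated equity.

    `evaluate(board, player_on_turn)` is a hypothetical stand-in for the
    network. After the player has moved, the opponent holds the dice, so
    every candidate is scored with player_on_turn=False.
    """
    best_board, best_equity = None, float("-inf")
    for board in post_move_boards:
        equity = evaluate(board, player_on_turn=False)
        if equity > best_equity:
            best_board, best_equity = board, equity
    return best_board, best_equity


# Toy evaluator that records which turn flag it was called with.
seen_flags = []

def toy_evaluate(board, player_on_turn):
    seen_flags.append(player_on_turn)
    return float(sum(board))

chosen_board, chosen_equity = pick_move([(1, 2), (3, 4), (0, 1)], toy_evaluate)
```

Scoring the candidates with `player_on_turn=True` instead would be the earlier bug: each position would be valued as if the player got to roll again.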

Frustrating! I'm not even sure how to debug this sensibly. It's a huge complex nonlinear network, so looking at the weights doesn't give me any intuition about what's happening. I've gone over the code many times now and can't see any obvious bugs. The slide toward 100% probability happens gradually over thousands of iterations.

Two comments here. First, the symmetric approximation is bad from first principles: the output can depend only on the difference between the player's and the opponent's inputs, not on either one directly, which isn't flexible enough. Second, the training problems were due to not assuming the opponent holds the dice when evaluating the equity of possible moves.
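
To make the first point concrete, here is a toy sketch. It assumes the symmetry is imposed by tying each opponent weight to the negative of the matching player weight, with a single sigmoid output; that is an assumption for illustration and may not match the real network. Under that assumption, two quite different positions with the same player-minus-opponent difference are indistinguishable.

```python
import math


def symmetric_output(player, opponent, weights):
    """Sigmoid output of a toy unit with tied weights (assumed setup).

    Opponent weights are the negatives of the player weights, so the
    pre-activation reduces to sum(w * (p - o)).
    """
    z = sum(w * (p - o) for w, p, o in zip(weights, player, opponent))
    return 1.0 / (1.0 + math.exp(-z))


weights = [0.7, -0.3]

# Different positions, identical differences (1, 0).
sparse = symmetric_output([1, 0], [0, 0], weights)
crowded = symmetric_output([2, 1], [1, 1], weights)

# Swapping the two sides flips the probability, as the symmetry requires.
flipped = symmetric_output([0, 0], [1, 0], weights)
```

Here `sparse` and `crowded` come out equal even though the boards differ, which is the lack of flexibility described above.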
