Thursday, September 1, 2011

Ah ha!

I finally cracked the nut on the problem with the "whose turn is it" input.

The problem was in how I trained on the board's t+1 state, i.e. how I updated the probabilities after the player moves.

I was evaluating the network without changing who owns the dice. So not surprisingly: if a player always holds the dice, they will definitely win!

Now that I flip who holds the dice for the t+1 evaluation, it seems to be converging properly, at least in initial tests.
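To make the bug concrete, here is a minimal sketch in Python. It is not the actual training code: the `value_fn(board, on_roll)` signature, the 1.0/0.0 encoding of the "whose turn is it" input, and the `toy_value` stand-in for the network are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical signature: value_fn(board, on_roll) returns the estimated
# probability that the player wins. on_roll is the "whose turn is it"
# input: 1.0 if the player holds the dice, 0.0 if the opponent does.
ValueFn = Callable[[Sequence[float], float], float]


def td_target_buggy(value_fn: ValueFn, board_after_move: Sequence[float]) -> float:
    # Bug: the t+1 state is evaluated as if the player still held the dice.
    return value_fn(board_after_move, 1.0)


def td_target_fixed(value_fn: ValueFn, board_after_move: Sequence[float]) -> float:
    # Fix: once the player has moved, the opponent holds the dice.
    return value_fn(board_after_move, 0.0)


def toy_value(board: Sequence[float], on_roll: float) -> float:
    # Toy stand-in for the network (an assumption, not a real evaluator):
    # holding the dice is worth a fixed bonus on top of a board average.
    base = sum(board) / len(board)
    return min(1.0, base + 0.25 * on_roll)


board = [0.5, 0.5]
buggy = td_target_buggy(toy_value, board)
fixed = td_target_fixed(toy_value, board)
```

With the buggy target, every update pulls the estimate toward a world where the player never gives up the dice, which would explain the network concluding that the player on roll always wins.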

Phew - that one was bugging me for several days.

1 comment:

  1. Not quite: I still wasn't doing the right thing when evaluating the equity of possible moves to choose the best one. In that case I was assuming the player still holds the dice, which is wrong; after the move, the opponent holds the dice.
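
The same mistake in move selection can be sketched as follows. Again this is only an illustration under assumptions: the `value_fn(board, on_roll)` signature, the two-feature board tuples, and the `toy_value` function are hypothetical, not the real network or board encoding.

```python
from typing import Callable, Sequence, Tuple

Board = Tuple[float, ...]
# Hypothetical signature: value_fn(board, on_roll) returns the player's
# estimated equity; on_roll is 1.0 if the player holds the dice, else 0.0.
ValueFn = Callable[[Board, float], float]


def choose_move_buggy(value_fn: ValueFn, candidates: Sequence[Board]) -> Board:
    # Bug: candidate positions are scored as if the player were still on roll.
    return max(candidates, key=lambda b: value_fn(b, 1.0))


def choose_move_fixed(value_fn: ValueFn, candidates: Sequence[Board]) -> Board:
    # Fix: every candidate is a position after the player's move,
    # so it is scored with the opponent holding the dice.
    return max(candidates, key=lambda b: value_fn(b, 0.0))


def toy_value(board: Board, on_roll: float) -> float:
    # Toy stand-in (an assumption): board = (solid, risky). The risky part
    # helps whoever is on roll and hurts whoever is not.
    solid, risky = board
    sign = 1.0 if on_roll else -1.0
    return solid + sign * 0.3 * risky


safe_play = (0.5, 0.0)
loose_play = (0.4, 0.5)
candidates = [safe_play, loose_play]

picked_buggy = choose_move_buggy(toy_value, candidates)
picked_fixed = choose_move_fixed(toy_value, candidates)
```

In this toy setup the buggy selector prefers the loose play because it never accounts for the opponent rolling next, while the fixed selector picks the safe play.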