Once again I was burned by subtleties around who's holding the dice.
The rule the strategies use to determine the optimal move, given a board layout and dice throw, is to look at all the possible moves and choose the one with the highest equity. BUT: that equity needs to be calculated assuming the opponent holds the dice, not the player, since after the move the dice switch hands.
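In pseudocode-ish Python (the `equity` signature and type names here are illustrative, not my actual API), the corrected move selection looks like this:

```python
from typing import Callable, Iterable, TypeVar

Board = TypeVar("Board")  # stand-in for whatever the board representation is

def best_move(
    resulting_boards: Iterable[Board],
    equity: Callable[[Board, bool], float],
) -> Board:
    """Pick the post-move position with the highest equity for the player.

    equity(board, player_holds_dice) is assumed to return the equity of
    the board from the player's perspective, with the flag saying who is
    on roll next.
    """
    # After the player moves, the dice switch hands, so each candidate
    # position must be scored with the OPPONENT on roll. The bug was the
    # equivalent of passing True here, i.e. scoring the position as if
    # the player still held the dice.
    return max(resulting_boards, key=lambda board: equity(board, False))
```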
I'd been erroneously calculating the equity in those cases as if the player still held the dice, so it's no surprise the learning wasn't converging properly.
I've fixed that and am now training a player that uses one-sided bearoff databases, a race net, and a contact net. After 31k training runs it's already beating my best-performing previous network by 0.38ppg, so that definitely looks to have been a significant error!
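Roughly, the new player dispatches among those three evaluators by position type. The names below are illustrative, but the logic is the usual layering: exact lookups when both sides are bearing off, the race net once contact is broken, and the contact net otherwise.

```python
def equity(board, player_holds_dice):
    """Dispatch to the appropriate evaluator for the position type.

    bearoff_db, race_net, and contact_net are illustrative names; each
    returns an equity from the player's perspective, given who is on roll.
    """
    if board.is_bearoff():
        # Both sides have all checkers home and past contact: exact
        # equities from the one-sided bearoff databases.
        return bearoff_db.equity(board, player_holds_dice)
    if board.is_race():
        # Contact broken but not yet a database position: race net.
        return race_net.equity(board, player_holds_dice)
    # Positions where hitting and blocking are still possible: contact net.
    return contact_net.equity(board, player_holds_dice)
```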
Full results to come once a decent number of training runs are complete.