The results after fixing the bug are not terribly dissimilar to the ones before, just shifted downward a little in points per game. It is a little surprising how much it looks like the fix simply changed the gammon payout from two to three without affecting the training at all - I would have expected the training path to change a little as well.
Here are the results:
I also added a "human" strategy - i.e. one that lets me interactively play against the trained neural net strategies. I had it print out the network's estimates of the probability of win and gammon at each step of the games I played, to get an idea of how reasonable they are.
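To give a flavor of what that looks like, here is a minimal sketch of such an interactive strategy - not the actual code, and `possible_moves`, `Move.apply`, and `net.evaluate` are hypothetical stand-ins for whatever API the real engine exposes:

```python
class HumanStrategy:
    """Interactive strategy: prompts for a move, then prints the trained
    network's estimates of the win and gammon probabilities for the
    resulting position."""

    def __init__(self, net):
        self.net = net  # trained network, used here only to display its estimates

    def choose_move(self, board, dice):
        # Enumerate the legal moves for this roll (hypothetical helper).
        moves = possible_moves(board, dice)
        for i, move in enumerate(moves):
            print(f"{i}: {move}")

        choice = int(input("Your move: "))
        new_board = moves[choice].apply(board)

        # Show what the network thinks of the position I just created.
        p_win, p_gammon = self.net.evaluate(new_board)
        print(f"P(win) = {p_win:.3f}, P(gammon win) = {p_gammon:.3f}")
        return new_board
```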
I played a 10-game match against the strongest player - the 80-node network that during training performed best against pubEval (+0.885ppg in the 200-game benchmark match). It actually beat me! I played -0.2ppg against it.
Note: I discovered later on that my pubEval benchmark implementation was wrong; the results above are invalid. PubEval is a much stronger player than it would seem from the numbers above. In any case my network setup also had a significant bug, which I later discovered and fixed.
I'm not a great player. I play around -0.4ppg against the best backgammon bots, with an ER (error rating) around 9. So solidly intermediate according to the tables I've seen.
The 80-node bot isn't perfect - it left a few puzzling shots. For example, it played a 63 off the starting board as 13/4, leaving a shot in its home board right in front of my two back checkers. And its estimate of the probability of a gammon win doesn't swing to zero when the other player has borne in.
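Strictly speaking, a gammon becomes impossible only once the opponent bears off a checker, so one obvious post-processing guard is to clamp the network's gammon output at that point rather than trust the raw estimate. A sketch of the idea, again not from the original code - `net.evaluate` and `board.opponent_borne_off` are hypothetical names:

```python
def adjusted_outputs(net, board):
    """Clamp the network's gammon estimate when a gammon is impossible.

    A gammon win requires the opponent to have borne off zero checkers,
    so once they bear one off the true probability is exactly zero no
    matter what the network outputs.
    """
    p_win, p_gammon = net.evaluate(board)
    if board.opponent_borne_off > 0:
        p_gammon = 0.0
    return p_win, p_gammon
```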
But these are really promising results! I feel quite proud that I've built a bot that can beat me (even though I suspect it was just a little lucky in that match...).