Arg - another bug. The game framework was accidentally calling almost all gammons backgammons.
That's most likely why the pubEval comparison was so high compared to the published results, and why so many of the simulation runs ended in backgammons.
Again, this invalidates all the previous results, because the networks were basically trained to treat a gammon as three points instead of two. Perhaps that's why the trained networks left shots more often than they should have - they were too incentivized to try for a gammon, which they valued at three points instead of two.
So... bug fixed, and off we go again to re-train the networks. <Sigh>
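For reference, here is a minimal sketch of the distinction the framework got wrong, using the standard backgammon rules: a gammon (two points) is when the loser has not borne off any checkers, and it only becomes a backgammon (three points) if the loser also still has a checker on the bar or in the winner's home board. The function and parameter names are hypothetical, made up for illustration; this is not the framework's actual code.

```python
def win_multiplier(loser_borne_off, loser_on_bar, loser_in_winner_home):
    """Return 1 for a single win, 2 for a gammon, 3 for a backgammon.

    All three arguments are checker counts for the losing side at the end
    of the game. The names are hypothetical, not taken from the framework.
    """
    if loser_borne_off > 0:
        # The loser bore off at least one checker: ordinary single win.
        return 1
    if loser_on_bar > 0 or loser_in_winner_home > 0:
        # No checkers borne off AND a checker on the bar or in the
        # winner's home board: backgammon.
        return 3
    # No checkers borne off, but nothing on the bar or in the winner's
    # home board: plain gammon.
    return 2
```

A bug like the one described would amount to returning 3 in nearly every case where the loser had borne off nothing, skipping the extra check on the bar and the winner's home board.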