In my training I've been seeing very high win percentages against the pubeval benchmark - north of 80% for most of my benchmarks, and north of +0.9ppg in average score.
This struck me as somewhat suspicious, since from what I could find online, no strategy does that well.
Joseph Heled, one of the brains behind gnubg, was kind enough to run a benchmark on his end: 0-ply gnubg against gnubg's implementation of pubeval. Over 10k games it scored +0.6296ppg and won 70.87% of the games.
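To get a sense of how much noise these numbers carry, here is a minimal sketch of the binomial standard error of a win rate, using the usual normal approximation (`win_rate_ci` is a hypothetical helper name). For Joseph's run (70.87% of 10k games, i.e. 7,087 wins) the standard error is about 0.45 percentage points, so the 95% interval is roughly ±0.9 points. Assuming my own runs are of comparable size, a gap up to 80%+ is far too large to be sampling noise.

```python
import math

def win_rate_ci(wins: int, games: int, z: float = 1.96):
    """Normal-approximation confidence interval for a win fraction."""
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)  # binomial standard error
    return p, se, (p - z * se, p + z * se)

# Joseph's 0-ply gnubg vs pubeval run: 70.87% of 10,000 games won.
p, se, (lo, hi) = win_rate_ci(7087, 10_000)
print(f"win rate {p:.2%} +/- {se:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

This only covers the win rate; putting an error bar on the ppg figure would also need the per-game score variance, which depends on how often gammons and backgammons occur.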
So there's clearly something wrong with what I'm doing; there's no way my players are anywhere near as good as gnubg. Nikos Papahristou and Joseph both calculated the pubeval values of the possible moves after a 3-1 roll, and they agree with my pubeval implementation in that case, so something else must be wrong. Or perhaps something is awry with how I'm running games: I'm being a bit lax with how I re-seed the RNG for independent games. That may be affecting the stats, though I have a hard time seeing how it would add such a big positive bias.
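One way to be less lax about seeding is sketched below; this is an illustration under my own assumptions, not what my code currently does. Each game gets its own generator, seeded by hashing a master seed together with the game index, so nearby indices get unrelated seeds and any run can be replayed exactly. The names `game_rng` and `roll_dice` are hypothetical.

```python
import hashlib
import random

def game_rng(master_seed: int, game_index: int) -> random.Random:
    """Independent, reproducible RNG for one game (hypothetical helper)."""
    # Hash (master seed, game index) so consecutive games don't get
    # consecutive, possibly correlated, integer seeds.
    key = f"{master_seed}:{game_index}".encode()
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed)

def roll_dice(rng: random.Random) -> tuple[int, int]:
    """Roll two six-sided dice from the given generator."""
    return rng.randint(1, 6), rng.randint(1, 6)

# Usage: each game draws from its own generator; rerunning with the same
# master seed replays exactly the same dice for every game.
for g in range(3):
    rng = game_rng(master_seed=12345, game_index=g)
    print(g, [roll_dice(rng) for _ in range(5)])
```

Simpler still is to create a single generator once at the start of a benchmark and never re-seed it between games; per-game seeding mainly buys the ability to replay one specific game.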
Anyways: all my pubeval comparison results are probably wrong.
Sorted: I had two main bugs in my pubEval implementation. The first was the wrong sign on the input for the number of opponent checkers on the bar. The second was choosing between the race and contact weights based on the board being evaluated instead of the starting board (the position before the move).
The first one was the biggest problem, but the second one had a noticeable impact as well.
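Both fixes can be sketched in a few lines. This is a minimal Python sketch rather than my actual code, assuming the 28-element board convention from Tesauro's reference `pubeval.c` as I read it: points 1-24 from the evaluating player's view, own checkers positive and moving toward point 1, opponent checkers negative, `pos[0]` holding the opponent's bar checkers as a negative count and `pos[25]` my own bar checkers. The names `opponent_bar_input`, `is_race` and `pick_move` are hypothetical, and `evaluate` stands in for the full pubeval linear evaluation, whose weights aren't reproduced here.

```python
from typing import Callable, Sequence

# Assumed pubeval.c-style board: pos[0] = opponent bar (<= 0),
# pos[1..24] = points (own > 0, opponent < 0), pos[25] = own bar (>= 0),
# pos[26] = own checkers off, pos[27] = opponent checkers off (<= 0).
Board = Sequence[int]

def opponent_bar_input(pos: Board) -> float:
    # pos[0] is stored as a negative count, so negate it: more opponent
    # checkers on the bar must give a larger, positive input.
    # (The 1/2 scaling is assumed from the reference code.)
    return -pos[0] / 2.0

def is_race(pos: Board) -> bool:
    """True if the two sides have fully passed each other (no contact)."""
    if pos[25] > 0 or pos[0] < 0:  # a checker on either bar means contact
        return False
    own = [i for i in range(1, 25) if pos[i] > 0]
    opp = [i for i in range(1, 25) if pos[i] < 0]
    if not own or not opp:
        return True
    # Own checkers move toward point 1, the opponent's toward point 24:
    # it's a race once my rearmost checker is below their rearmost one.
    return max(own) < min(opp)

def pick_move(start: Board, candidates: Sequence[Board],
              evaluate: Callable[[bool, Board], float]) -> Board:
    # Decide race vs contact ONCE, from the position before the move,
    # and score every candidate with that same weight set.
    race = is_race(start)
    return max(candidates, key=lambda b: evaluate(race, b))
```

Computing `race` once, outside the scoring loop, is the point of the second fix: if it were recomputed per candidate, a move that breaks contact would be scored with the race weights while its sibling moves were scored with the contact weights, and the two evaluations aren't directly comparable.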