Thursday, January 19, 2012

PubEval Benchmark Fixed

Found the bug.

After the fix, the Benchmark 2 player (with 80 nodes) scored +0.459ppg and won 65.3% of the games in a 10k-game match against PubEval.

This result seems fairly plausible. It's significantly below what 0-ply gnubg scores, and roughly where I've read other networks end up after a similar amount of TD training.

One note: whenever I quote a score, it's always for cubeless money games. So no doubling, but gammons count as two points and backgammons as three. Win percentage is for those same games, but counting any win as 1.
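
As a minimal sketch of that scoring convention: the function name and the list of signed per-game points below are made up for illustration and are not taken from the actual benchmark code. Each game is recorded as +1/+2/+3 for a single win, gammon, or backgammon, and as the negative of those for a loss.

```python
def score_match(results):
    """Summarize a cubeless money match.

    results: hypothetical list of signed points per game, from the
    player's point of view (+1, +2, +3 for wins; -1, -2, -3 for losses).
    Returns (points per game, win percentage).
    """
    n = len(results)
    # ppg: gammons count as 2 points and backgammons as 3
    ppg = sum(results) / n
    # win percentage: any win counts as 1, whatever its size
    win_pct = 100.0 * sum(1 for r in results if r > 0) / n
    return ppg, win_pct


# Toy data, not real match results
ppg, win_pct = score_match([1, 2, -1, 3, -2, 1, -1, 1, -3, 1])
print(ppg, win_pct)  # 0.2 60.0
```

The two numbers can diverge quite a bit: a player that wins fewer games but more gammons can have a higher ppg with a lower win percentage.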

  1. The fix here was correcting the sign of the input corresponding to the number of opponent checkers on the bar. But there was still one bug remaining that had a smaller impact: I was choosing race or contact weights based on the board being evaluated rather than the starting board.
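
The second bug in that note can be sketched roughly as follows. This is only an illustration under assumptions: `pick_best`, `is_race`, `race_eval` and `contact_eval` are hypothetical names, and the real player's board representation and network interface are not shown in this post. The point is just that the race-or-contact choice is made once, from the starting board, so every candidate position is scored with the same set of weights.

```python
def pick_best(start_board, candidates, is_race, race_eval, contact_eval):
    """Return the candidate board with the highest evaluation.

    All names here are hypothetical. is_race is a predicate on a board;
    race_eval and contact_eval stand in for the two sets of weights.
    """
    # Choose the evaluator from the position before the move...
    evaluate = race_eval if is_race(start_board) else contact_eval
    # ...and use it for every candidate, even ones that have become races.
    return max(candidates, key=evaluate)


def pick_best_buggy(candidates, is_race, race_eval, contact_eval):
    """The behavior described as the bug: choose weights per candidate."""
    def evaluate(board):
        return race_eval(board) if is_race(board) else contact_eval(board)
    return max(candidates, key=evaluate)
```

With the per-candidate version, values produced by two different networks get compared directly against each other, which is one plausible reason the choice matters.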