Monday, August 20, 2012

Improved GNUbg benchmarks

The GNUbg team (in particular, Philippe Michel) has created new benchmark databases for Contact, Crashed, and Race layouts, using the same set of board positions but rolling out the equities with more accuracy. This corrects the significant errors found in the old Crashed benchmark, and improves the Contact and Race benchmarks.

They are available for download here, in the benchmarks subdirectory.

Philippe also did some work on improving the board positions included in the Crashed training database, which is available for download in the training_data subdirectory at that link.

I re-ran the statistics for several of my players, as well as for PubEval. Also Player 3.6 as the most comprehensive benchmark.

Player
GNUbg Contact ER
GNUbg Crashed ER
GNUgb Race ER
PubEval Avg Ppg
Player 3.6 Avg Ppg
GNUbg
10.5
5.89
0.643
0.63
N/A
12.7
9.17
0.817
0.601
0.0
13.1
9.46
0.817
0.597
-0.0027
13.4
9.63
0.817
0.596
-0.0119
13.4
9.89
0.985
0.595
-0.0127
14.1
10.7
2.14
0.577
-0.041
33.7
26.2
2.45
0.140
-0.466
18.2
21.7
2.05
0.484
-0.105
21.6
23.2
5.54
0.438
-0.173
41.7
50.5
2.12
0.048
-0.532
44.2
51.3
3.61
0
-0.589

For the games against PubEval I ran 40k cubeless money games; standard errors are +/- 0.006ppg. Down to Player 3.2, for the games against Player 3.6 I ran 400k cubeless money games to get more accuracy; standard errors are +/- 0.002ppg or better. For players worse than Player 3.2 I played 100k games against Player 3.6 as the average scores were larger; standard errors are +/- 0.004ppg.

Phillippe Michel was gracious enough to provide the GNUbg 0-ply scores against the newly-created benchmarks. Also it seems like I had the scores against the old benchmarks incorrect: they were Contact 10.4, Crashed 7.72, and Race 0.589. The Contact score was close, but the other two I had significantly worse.



No comments:

Post a Comment