The GNUbg team (in particular, Philippe Michel) has created new benchmark databases for Contact, Crashed, and Race layouts, using the same set of board positions but rolling out the equities with more accuracy. This corrects the significant errors found in the old Crashed benchmark, and improves the Contact and Race benchmarks.
They are available for download here, in the benchmarks subdirectory.
Philippe also did some work on improving the board positions included in the Crashed training database, which is available for download in the training_data subdirectory at that link.
I re-ran the statistics for several of my players, as well as for PubEval. I also included results against Player 3.6, as it is the most comprehensive benchmark.
For the games against PubEval I ran 40k cubeless money games; standard errors are +/- 0.006ppg. For the games against Player 3.6, I ran 400k cubeless money games for players down to Player 3.2 to get more accuracy; standard errors are +/- 0.002ppg or better. For players worse than Player 3.2 I played 100k games against Player 3.6, since the average scores were larger; standard errors are +/- 0.004ppg.
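The quoted standard errors are consistent with the usual 1/sqrt(N) scaling of the error on an average score. Here is a minimal sketch of that arithmetic. It assumes a single per-game standard deviation, backed out from the +/- 0.006ppg figure at 40k games, applies to every matchup. That is an illustrative assumption, not something measured here; the real per-game spread differs from one pair of players to the next. The names `standard_error` and `PER_GAME_STD` are hypothetical.

```python
import math


def standard_error(per_game_std: float, n_games: int) -> float:
    """Standard error of the average score over n_games independent games."""
    return per_game_std / math.sqrt(n_games)


# Assumption: infer the per-game standard deviation from the quoted
# +/- 0.006ppg at 40k games (0.006 * sqrt(40000) = 1.2 points per game).
PER_GAME_STD = 0.006 * math.sqrt(40_000)

for n_games in (40_000, 100_000, 400_000):
    se = standard_error(PER_GAME_STD, n_games)
    print(f"{n_games:>7} games: +/- {se:.4f} ppg")
```

Under that assumption, 100k games gives about 0.0038ppg and 400k games about 0.0019ppg, which round to the 0.004 and 0.002 figures quoted above. Cutting the standard error in half takes four times as many games.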
Philippe Michel was gracious enough to provide the GNUbg 0-ply scores against the newly created benchmarks. It also seems that I had the scores against the old benchmarks wrong: they were Contact 10.4, Crashed 7.72, and Race 0.589. My Contact score was close, but the figures I had for the other two were significantly worse.
| Player | GNUbg Contact ER | GNUbg Crashed ER | GNUbg Race ER | PubEval Avg Ppg | Player 3.6 Avg Ppg |
|---|---|---|---|---|---|
| GNUbg | 10.5 | 5.89 | 0.643 | 0.63 | N/A |
| | 12.7 | 9.17 | 0.817 | 0.601 | 0.0 |
| | 13.1 | 9.46 | 0.817 | 0.597 | -0.0027 |
| | 13.4 | 9.63 | 0.817 | 0.596 | -0.0119 |
| | 13.4 | 9.89 | 0.985 | 0.595 | -0.0127 |
| | 14.1 | 10.7 | 2.14 | 0.577 | -0.041 |
| | 33.7 | 26.2 | 2.45 | 0.140 | -0.466 |
| | 18.2 | 21.7 | 2.05 | 0.484 | -0.105 |
| | 21.6 | 23.2 | 5.54 | 0.438 | -0.173 |
| | 41.7 | 50.5 | 2.12 | 0.048 | -0.532 |
| | 44.2 | 51.3 | 3.61 | 0 | -0.589 |