Computational Backgammon: Improved GNUbg benchmarks

The GNUbg team (in particular, Philippe Michel) has created new benchmark databases for Contact, Crashed, and Race layouts, using the same set of board positions but rolling out the equities with more accuracy. This corrects the significant errors found in the old Crashed benchmark, and improves the Contact and Race benchmarks.

They are available for download here, in the benchmarks subdirectory.

Philippe also did some work on improving the board positions included in the Crashed training database, which is available for download in the training_data subdirectory at that link.

I re-ran the statistics for several of my players, as well as for PubEval. I also included each player's score against Player 3.6, which I treat as the most comprehensive benchmark.

Player	GNUbg Contact ER	GNUbg Crashed ER	GNUbg Race ER	PubEval Avg Ppg	Player 3.6 Avg Ppg
GNUbg	10.5	5.89	0.643	0.63	N/A
Player 3.6	12.7	9.17	0.817	0.601	0.0
Player 3.5	13.1	9.46	0.817	0.597	-0.0027
Player 3.4	13.4	9.63	0.817	0.596	-0.0119
Player 3.3	13.4	9.89	0.985	0.595	-0.0127
Player 3.2	14.1	10.7	2.14	0.577	-0.041
Player 3.2q	33.7	26.2	2.45	0.140	-0.466
Player 2.4	18.2	21.7	2.05	0.484	-0.105
Benchmark 2	21.6	23.2	5.54	0.438	-0.173
PubEval (ListNet)	41.7	50.5	2.12	0.048	-0.532
PubEval	44.2	51.3	3.61	0	-0.589

For the games against PubEval I ran 40k cubeless money games; standard errors are +/- 0.006ppg. For the games against Player 3.6, I ran 400k cubeless money games for players down to Player 3.2 to get more accuracy; standard errors are +/- 0.002ppg or better. For players worse than Player 3.2 I played 100k games against Player 3.6, since the average scores were larger; standard errors are +/- 0.004ppg.
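The three standard errors quoted above are consistent with the usual rule that the standard error of an average shrinks as one over the square root of the number of games. Here is a rough sketch of that arithmetic in Python. It assumes the games are independent and that the per-game standard deviation is the same in every match-up, which is only approximately true; the value of about 1.2ppg is backed out from the 40k-game figure rather than measured, and the function names are just for illustration.

```python
import math

def implied_sigma(se, n_games):
    """Per-game standard deviation implied by a quoted standard error."""
    return se * math.sqrt(n_games)

def standard_error(sigma, n_games):
    """Standard error of the average score over n_games independent games."""
    return sigma / math.sqrt(n_games)

# Assumption: back out sigma from the 40k-game PubEval runs (+/- 0.006ppg).
sigma = implied_sigma(0.006, 40_000)

for n in (40_000, 100_000, 400_000):
    print(f"{n:>7} games: +/- {standard_error(sigma, n):.4f} ppg")
```

Under those assumptions the 100k and 400k runs come out at roughly 0.0038ppg and 0.0019ppg, in line with the 0.004ppg and 0.002ppg figures above. It also shows why going from 40k to 400k games only cuts the error by a factor of about 3.2.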

Philippe Michel was gracious enough to provide the GNUbg 0-ply scores against the newly created benchmarks. It also seems I had GNUbg's scores against the old benchmarks wrong: they were Contact 10.4, Crashed 7.72, and Race 0.589. My Contact score was close, but my values for the other two were significantly worse.

Monday, August 20, 2012
