The multi-ply calculations are now a little faster than before, so I can get some preliminary stats on 2-ply performance.

I ran the 2-ply version of Player 2.4 for 150 cubeless money games against PubEval. It scored +0.45ppg +/- 0.11ppg. (+/- as usual is the standard error estimate.)

150 games leaves a large standard error, so to pin down its performance relative to 1-ply a bit better, I also ran the 1-ply version against PubEval over the same 150 games; that is, starting from the same random number generator seed.

This isn't perfect: the two versions make different moves, so the number of steps in a game differs between them and the rolls will not line up exactly. But it should reduce the variance substantially, so the difference in performance between 2-ply and 1-ply against the same PubEval player should have much less uncertainty than the absolute equity numbers.
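As a rough illustration of why sharing the seed helps, here is a minimal Python sketch of the paired comparison. It is not the code used for these runs: the function names and the per-game scores are hypothetical, and the data is synthetic, built so that both players share most of their luck in each game. The idea is that the standard error of the per-game differences shrinks when the two result lists are positively correlated, whereas treating the two runs as independent ignores that correlation.

```python
import math
import random


def mean_and_stderr(xs):
    """Return the sample mean and the standard error of the mean."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two games")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(var / n)


def paired_difference(a, b):
    """Mean and standard error of the per-game differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("both players must have played the same games")
    return mean_and_stderr([x - y for x, y in zip(a, b)])


def unpaired_difference(a, b):
    """Same mean difference, but with the error computed as if the
    two runs were independent (errors added in quadrature)."""
    mean_a, se_a = mean_and_stderr(a)
    mean_b, se_b = mean_and_stderr(b)
    return mean_a - mean_b, math.hypot(se_a, se_b)


# Synthetic, made-up per-game scores (NOT the results from this post):
# each game has a shared "luck" term plus a small player-specific term.
rng = random.Random(0)
luck = [rng.gauss(0.0, 1.3) for _ in range(150)]
player_a = [0.4 + l + rng.gauss(0.0, 0.3) for l in luck]
player_b = [0.3 + l + rng.gauss(0.0, 0.3) for l in luck]

diff_paired, se_paired = paired_difference(player_a, player_b)
diff_unpaired, se_unpaired = unpaired_difference(player_a, player_b)

print(f"paired:   {diff_paired:+.3f} +/- {se_paired:.3f}")
print(f"unpaired: {diff_unpaired:+.3f} +/- {se_unpaired:.3f}")
```

Both estimates give the same mean difference; only the error bar changes. How much it shrinks in practice depends on how closely the two runs really track each other, which is exactly the caveat above about the games drifting apart.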

The 1-ply player scored +0.38ppg in the same games.

So the 2-ply player is approximately 0.07ppg stronger than 1-ply.

That seems plausible: the 1-ply player is 0.11ppg stronger than 0-ply, so having a second ply increase performance by something like 0.1ppg doesn't strike me as unreasonable.

I suspect (with no solid argument) that this gives a lower-variance estimate of the relative performance than playing 2-ply directly against 1-ply, since this way both players get (largely) the same rolls.
