Friday, April 6, 2012

Self-play variance reduction

I've done a lot of bot self-play to gather statistics on relative strength and so on.

One obvious way to reduce variance is to run half the games with player 1 facing player 2, then the other half with the seats swapped, reusing the random seeds from the first half. That is, the dice sequences are identical in both halves, but the players alternate taking them. This significantly reduces the impact of dice luck. In the limit where the two players play identically, it returns exactly 0 points per game and a 50% win rate, so the variance reduction is perfect.
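To make the pairing concrete, here's a minimal sketch in Python. The "game" is a toy stand-in (dice totals plus a small skill edge), and `play_game`, `compare`, and the skill numbers are all invented for illustration; a real setup would replace `play_game` with an actual backgammon game driven by the seeded dice.

```python
import random
import statistics

def play_game(seed, skill_first, skill_second):
    """Toy stand-in for a game: each side's total is 20 dice plus a
    small skill edge; returns the score for the first player (+1/-1)."""
    rng = random.Random(seed)
    first = sum(rng.randint(1, 6) for _ in range(20)) + skill_first
    second = sum(rng.randint(1, 6) for _ in range(20)) + skill_second
    return 1 if first > second else -1

def compare(n_games, skill_a=0.5, skill_b=0.0):
    """Return (plain scores, seed-paired scores), both for player A."""
    plain, paired = [], []
    for seed in range(n_games):
        # Plain trial: A always moves first, fresh dice each game.
        plain.append(play_game(seed, skill_a, skill_b))
        # Paired trial: replay the SAME dice with the seats swapped,
        # negate so both scores are from A's point of view, and average.
        s_ab = play_game(seed, skill_a, skill_b)
        s_ba = -play_game(seed, skill_b, skill_a)
        paired.append(0.5 * (s_ab + s_ba))
    return plain, paired

plain, paired = compare(5000)
# Dice luck mostly cancels in the paired scores, so their spread
# (and hence the standard error) is much smaller.
print(statistics.pstdev(plain), statistics.pstdev(paired))
```

With equal skills the paired score is exactly 0 every game, matching the perfect-cancellation limit described above.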

Note that pairing leaves only half as many independent trials, which on its own would increase the standard error by a factor of sqrt(2). But the variance reduction should more than compensate.

I haven't done this before, which is a little embarrassing, honestly.

But, now that I've thought of it, I figured I'd do a little statistical analysis to investigate how effective the variance reduction actually is.

The first example: playing 10k self-play money games where both players use Player 3.3 for checker play, the first player uses a Janowski doubling strategy with x=0.7, and the second player uses Janowski with x=0.6. In my previous experiments, where I ran 1M money games with no variance reduction, I got an average score of +0.018ppg with a standard error of 0.003ppg.

No variance reduction: +0.017ppg +/- 0.032ppg
Variance reduction: +0.026ppg +/- 0.014ppg

Both are consistent with the more accurate estimate, but the standard error with variance reduction was 2.3x smaller. That matches the standard error you'd get from running 2.3^2=5.3x as many simulated games with no variance reduction, so it saves more than a factor of 5 in run time.
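The run-time arithmetic is quick to check (the slight difference from the 5.3x above comes from rounding the ratio to 2.3 before squaring):

```python
se_plain = 0.032   # standard error without variance reduction (ppg)
se_paired = 0.014  # standard error with variance reduction (ppg)

ratio = se_plain / se_paired  # how much smaller the paired error is
speedup = ratio ** 2          # equivalent factor of extra plain games
print(round(ratio, 1), round(speedup, 1))
```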

The second example: playing 10k self-play cubeless money games, Player 3.3 against PubEval.

No variance reduction: +0.610ppg +/- 0.014ppg
Variance reduction: +0.583ppg +/- 0.012ppg

In this case the variance reduction doesn't help much. That's likely because the two strategies are different enough that they choose different moves, and the same dice throws applied to different positions carry different luck, so the cancellation is incomplete.

So when the two players are comparable, the variance reduction is quite powerful. When their checker play differs significantly the technique is less powerful, but it's still worth doing.

1 comment:

  1. Hi, Mark!

    I think your basic assumption is incorrect. The luck of a dice roll depends on the position of the board... and the position of the board depends on which moves were played earlier in the game... and which moves were picked depends on the checker-play ability of the bot. This type of variance reduction is therefore unusable if the move-selection algorithm differs between the two players. Comparing cube strategies, assuming the cube level and ownership are independent of the move selection, can be done this way.

    However: for real variance reduction you can estimate the luck of each roll through the game, add up the total luck, and then subtract that from the actual outcome of the game. This is the variance-reduced result, and it can be used in rollouts to get real variance reduction. (Self-play and rollouts are really no different.)

    See also the excellent article by David Montgomery in GammOnLine.
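For readers who want to try the commenter's luck-based approach, here's a minimal sketch of the per-roll luck calculation. `equity_after_best_play` is a hypothetical stand-in for the bot's evaluator applied after its best play of a given roll; any real implementation would supply its own.

```python
# All 21 distinct dice rolls; non-doubles are twice as likely as doubles.
ROLLS = [(a, b) for a in range(1, 7) for b in range(a, 7)]

def roll_weight(a, b):
    """Weight out of 36 equally likely ordered rolls."""
    return 1 if a == b else 2

def roll_luck(equity_after_best_play, position, roll):
    """Luck of one roll: the equity actually reached with this roll,
    minus the probability-weighted average over every possible roll."""
    expected = sum(roll_weight(a, b) * equity_after_best_play(position, (a, b))
                   for a, b in ROLLS) / 36.0
    return equity_after_best_play(position, roll) - expected

def luck_adjusted_score(outcome, luck_per_roll):
    """Subtract the game's accumulated luck from the raw outcome."""
    return outcome - sum(luck_per_roll)
```

By construction the luck averages to zero over the roll distribution, so the adjustment is unbiased: it changes only the variance of the score, not its expectation.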