The two-sided bearoff database I put together before is beautiful because it is exact; but the size of the database quickly becomes cumbersome. In practice I've been using a database for up to six points and nine checkers, which already needs 580MB of disk space to store and around 2GB of RAM to pull in-process. That limit on the number of points and checkers means it does not get used for much of each game and does not help performance significantly.
An alternative, which is an approximation but uses a lot less memory, is to use a one-sided bearoff database. The number of positions in a two-sided database is the square of the number in a one-sided database, so a one-sided database is significantly smaller.
To give a sense of scale: a two-sided database for 6 points and 9 checkers has 25M positions. A one-sided database for 6 points and 9 checkers has 5k positions. A one-sided database for 10 points and 15 checkers has 3.3M positions; assuming roughly 10 probabilities to define the distribution for each position (I limit the distribution to a maximum of 10 points), that's around 30M numbers, so roughly the same amount of memory as the two-sided 6x9 database. That larger parameter set should cover a significant chunk of race positions, and also gives us a benchmark to directly train a race network against using supervised learning instead of TD learning (for some significant fraction of race positions).
The idea here: we look at just one player's position on the board and ignore the other player, and build up the full probability distribution of number of steps to take off all the checkers.
The approximation: when calculating the probability distribution we need to make some assumption about how to make the optimal move. In reality the optimal move will depend on the other player's layout as well, but we approximate the optimal move as the one that minimizes the expected number of steps to finish. That approximation makes the optimal move independent of the opponent's position, which lets us have a much smaller database.
Once we have the probability distribution of both players we can calculate the probability that one of the players wins the game: if that player is on move, it's the probability that their number of steps to finish is less than or equal to the opponent's number of steps. Those two probability distributions are independent because they're generated from independent dice throws, so we can write the probability of player winning the game, as seen before their roll, as:
Prob of Player Win = Sum[ PO(i) Sum[ PP(j), { j, 1, i } ], { i, 1, Infinity } ]
where PP(i) is the probability of the player taking off the last checker in i steps, and PO(i) is the probability of the opponent taking off their last checker in i steps.
Again, this is an approximation based on the assumption that, when generating the distribution of number of steps, we can choose the optimal move based on minimizing the expected number of steps to finish. We can test that approximation against the two-sided database for that smaller parameter space.
I compared the one-sided equity approximation against the exact two-sided equity for up to six points and nine checkers in all of the 9M two-sided positions. The average equity difference was +0.00019; the standard deviation of the difference was 0.0024; and the maximum and minimum differences were +0.023 and -0.026 respectively. So most of the time the approximation is accurate to within 0.003ppg, which is quite good. The average difference is statistically significant, so in principle there is a bias but in practice it is so small as to be irrelevant.
This all assumes that the probability of gammon and backgammon are zero, so that the equity is the usual 2 * prob of win - 1.Ignoring the probability of backgammon is fine since we will not build even a one-sided database that includes points out to the opponent's home board; but the probability of gammon is an important one to track.
We can generalize by building another one-sided database that tracks the distribution of the number of steps before the first checker is taken in. The size of that database will be significantly smaller than the regular one-sided database since we care only about positions that have checkers outside the player's home board and where there are exactly 15 player checkers on the board.
I haven't checked the literature yet on this, so maybe there's a more elegant way to construct a one-sided database with a superior approximation to the one I described above.
An alternative, which is an approximation but uses a lot less memory, is to use a one-sided bearoff database. The number of positions in a two-sided database is the square of the number in a one-sided database, so a one-sided database is significantly smaller.
To give a sense of scale: a two-sided database for 6 points and 9 checkers has 25M positions. A one-sided database for 6 points and 9 checkers has 5k positions. A one-sided database for 10 points and 15 checkers has 3.3M positions; assuming roughly 10 probabilities to define the distribution for each position (I limit the distribution to a maximum of 10 points), that's around 30M numbers, so roughly the same amount of memory as the two-sided 6x9 database. That larger parameter set should cover a significant chunk of race positions, and also gives us a benchmark to directly train a race network against using supervised learning instead of TD learning (for some significant fraction of race positions).
The idea here: we look at just one player's position on the board and ignore the other player, and build up the full probability distribution of number of steps to take off all the checkers.
The approximation: when calculating the probability distribution we need to make some assumption about how to make the optimal move. In reality the optimal move will depend on the other player's layout as well, but we approximate the optimal move as the one that minimizes the expected number of steps to finish. That approximation makes the optimal move independent of the opponent's position, which lets us have a much smaller database.
Once we have the probability distribution of both players we can calculate the probability that one of the players wins the game: if that player is on move, it's the probability that their number of steps to finish is less than or equal to the opponent's number of steps. Those two probability distributions are independent because they're generated from independent dice throws, so we can write the probability of player winning the game, as seen before their roll, as:
Prob of Player Win = Sum[ PO(i) Sum[ PP(j), { j, 1, i } ], { i, 1, Infinity } ]
where PP(i) is the probability of the player taking off the last checker in i steps, and PO(i) is the probability of the opponent taking off their last checker in i steps.
Again, this is an approximation based on the assumption that, when generating the distribution of number of steps, we can choose the optimal move based on minimizing the expected number of steps to finish. We can test that approximation against the two-sided database for that smaller parameter space.
I compared the one-sided equity approximation against the exact two-sided equity for up to six points and nine checkers in all of the 9M two-sided positions. The average equity difference was +0.00019; the standard deviation of the difference was 0.0024; and the maximum and minimum differences were +0.023 and -0.026 respectively. So most of the time the approximation is accurate to within 0.003ppg, which is quite good. The average difference is statistically significant, so in principle there is a bias but in practice it is so small as to be irrelevant.
This all assumes that the probability of gammon and backgammon are zero, so that the equity is the usual 2 * prob of win - 1.Ignoring the probability of backgammon is fine since we will not build even a one-sided database that includes points out to the opponent's home board; but the probability of gammon is an important one to track.
We can generalize by building another one-sided database that tracks the distribution of the number of steps before the first checker is taken in. The size of that database will be significantly smaller than the regular one-sided database since we care only about positions that have checkers outside the player's home board and where there are exactly 15 player checkers on the board.
I haven't checked the literature yet on this, so maybe there's a more elegant way to construct a one-sided database with a superior approximation to the one I described above.
No comments:
Post a Comment