Another standard technique for improving backgammon bot performance is using "multiple plies". This technique assumes that as you get closer to the end of the game, the accuracy of the net improves. Of course this is true in the limit at the very end of the game, but many games also see one player or the other reach a dominant position well before the end.

A regular calculation of board value is called "0-ply" - zero extra steps beyond the current one. Note that the notation is a bit confusing here: GnuBG uses 0-ply for this initial calculation, but other backgammon bots describe it as "1-ply" instead. I will continue to follow GnuBG's notation.

1-ply looks forward one step of the game: it considers each of the 21 possible rolls the opponent might make, finds the opponent's best move for each, and weights the resulting board's (0-ply) value by the probability of getting that roll (1/36 for a double, 1/18 for a mixed roll).

2-ply does the same as the 1-ply calculation, but evaluates each opponent move at 1-ply instead of 0-ply, so it recursively looks at all the possible rolls the player might make after each of the opponent's roll scenarios.

"n-ply" does this recursion n times, of course.

This quickly becomes an expensive calculation. There are 21 possible rolls, and on average for each roll there are roughly 20 possible moves that need to be evaluated. So a 1-ply board evaluation costs about 400x as much as 0-ply evaluation. A 2-ply evaluation costs 160,000x as much as 0-ply.

One technique that GnuBG implemented to reduce calculation time is to use a quick & coarse strategy to trim down the number of board evaluations to do at each step. That is, for the (average) 20 possible moves after a roll, it uses a coarse strategy to order their board values, and then does the expensive calculation only on the top few.

This assumes that many of those 20 moves are clearly poor, so even a coarse strategy can effectively discard them and leave the expensive calculation for the moves we expect really are worth being accurate when evaluating.
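That trimming step is simple to express; a minimal sketch, assuming the coarse strategy provides a cheap board-value function (all names here are illustrative):

```python
def trim_candidates(candidate_boards, coarse_value, max_moves=8):
    """Rank the candidate boards with the cheap coarse evaluation and keep
    only the best few; the expensive recursive evaluation runs on these."""
    ranked = sorted(candidate_boards, key=coarse_value, reverse=True)
    return ranked[:max_moves]
```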

I built a multiple-ply strategy wrapper that takes a base strategy (used for the 0-ply calculations) and a filter strategy (the coarse strategy used to trim out the clear losers among the possible moves). It takes a parameter defining the maximum number of moves for which it will calculate the expensive recursive board value, and uses the filter strategy to choose that subset of moves.
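A rough sketch of what that wrapper looks like, under assumed interfaces: both strategies expose a `value(board)` method returning equity from the perspective of the player who just moved, and a `legal_boards(board, roll)` callable enumerates the boards the player on roll can reach. None of these names are my actual code, just illustration:

```python
# The 21 distinct rolls: doubles at 1/36 each, mixed rolls at 1/18 each.
ALL_ROLLS = [((d1, d2), (1 if d1 == d2 else 2) / 36)
             for d1 in range(1, 7) for d2 in range(d1, 7)]

class MultiPlyStrategy:
    def __init__(self, base, filter_strategy, legal_boards, n_plies=1, max_moves=8):
        self.base = base               # used for the 0-ply leaf evaluations
        self.filter = filter_strategy  # coarse strategy used to trim moves
        self.legal_boards = legal_boards
        self.n_plies = n_plies
        self.max_moves = max_moves     # expensive evaluations per roll

    def value(self, board, ply=None):
        if ply is None:
            ply = self.n_plies
        if ply == 0:
            return self.base.value(board)
        total = 0.0
        for roll, prob in ALL_ROLLS:
            children = self.legal_boards(board, roll)
            # The coarse strategy ranks the opponent's candidate boards (from
            # the opponent's perspective); only the best max_moves of them get
            # the expensive recursive evaluation.
            top = sorted(children, key=self.filter.value,
                         reverse=True)[:self.max_moves]
            # The opponent plays to maximize their equity, i.e. minimize ours.
            total += prob * min(-self.value(child, ply - 1) for child in top)
        return total
```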

When I try this with Player 2.1 as the base strategy, Player 2.3q as the coarse strategy, and eight as the maximum number of expensive moves, the 1-ply version scores +0.107ppg in a 5,000-game money match against the 0-ply version (Player 2.1), winning 53.6% of the games. It runs 125x slower than the 0-ply version - much better than the 400x we'd see without move filtering, but still quite slow (which is why I ran only 5k games in the match; that took 8h on my multi-core desktop).

I played 20 games by hand against the 1-ply extension of Player 2.1 and it beat me handily: I lost 0.65ppg. Again, I'm not the greatest player in the world - solidly intermediate - but still that's a pretty good result. I lose 0.4ppg against the iPhone Backgammon NJ expert player (which they claim is on par with 0-ply GnuBG), but I use hints there occasionally so my real performance is a little worse than that. There's a fair bit of statistical noise on the score on a 20-game match - a standard error of 0.25ppg or so - but pretty clearly this player is way better than I am.

A more interesting benchmark would be how it plays against humans of various skill levels. There's an online backgammon server called FIBS (the First Internet Backgammon Server) which people have connected bots to before, so I think I'll try that once I'm comfortable that my player is good enough.

I can probably also speed it up by filtering not just on a maximum number of moves but also on an equity cutoff. If my move cutoff is eight but four of those moves look terrible under the coarse filter strategy, I should be able to ignore them too.
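That combined filter might look something like this - a sketch only, with a hypothetical 0.16 equity threshold and illustrative names:

```python
def filter_moves(candidate_boards, coarse_value, max_moves=8, equity_cutoff=0.16):
    """Keep at most max_moves candidates, then also drop any whose coarse
    equity trails the coarse best by more than equity_cutoff."""
    ranked = sorted(candidate_boards, key=coarse_value, reverse=True)[:max_moves]
    if not ranked:
        return ranked
    best = coarse_value(ranked[0])
    # A move that made the top-max_moves list but still looks terrible under
    # the coarse evaluation gets no expensive evaluation either.
    return [b for b in ranked if best - coarse_value(b) <= equity_cutoff]
```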
