Wednesday, January 11, 2012

More on One-Sided Bearoff Databases

In my last post I mentioned that, when the bearoff database runs out to the full fifteen checkers, you need to build a second database to track the distribution of number of steps to take off the first checker, which is used in approximating the probability of a gammon win or loss.

A few more words on this.

First, my initial guess was that this database would be much smaller than the regular one-sided bearoff database (which tracks the distribution of the number of steps to take off the final checker), because it seemed like there should far fewer positions that are relevant. In fact, that's not the case: the number of possible board layouts of fifteen checkers is large and contains many layouts where there is more than one step to taking off the first checker.

For example, for a database covering nine points and fifteen checkers, there are 1.3M board layouts in the regular one-sided database. After trimming it down to just the ones relevant for the gammon bearoff database, there are 490k layouts. Not as big, but more than 1/3 the size of the full set of layouts. The ratio of data in the two databases is actually smaller than the ratio of board layouts, however, because many of the gammon db entries have only a few entries in their probability distribution, whereas the bearoff database is close to the maximum 10 probability entries for most of the layout entries. (10 is the maximum number of points I chose to track in the probability distributions, which I estimated based on impact on the approximate equity calculations vs the exact equity calculations in the two-sided database. Anything over 8 entries did not affect the approximation for that smaller parameter space.) The file containing the gammon bearoff data is 18% of the size of the regular one.

The other point worth mentioning is that a lot of the layouts tracked in the gammon database are not ones that would reasonably come up in a game - for example, a few checkers in the home board and the rest piled up on the points just outside the home board.

That's true of many of the entries in the regular one-sided database as well. I suspect I could trim down the size of both database significantly by running many test games using a reasonable player and including only database entries that are accessed more than 1 in every 1,000 games or something like that. Occasionally in a real game we'd hit a layout that's not cached in the database, and in those rare cases we'd just calculate it on the fly.

Also, currently I store the data in a hash of string representation of board->probabilities. I suspect a fair amount of space is taken up by the string names and the hash overhead, and I'm wondering whether there's a nice way to map the board layout to an integer, where the integers run from 1->max number of board layouts without any gaps or overlaps. Then I could store the data as a standard vector and hopefully reduce memory usage.

1 comment:

  1. Hi,

    "and I'm wondering whether there's a nice way to map the board layout to an integer, "

    Yes, it is! Contact me, Øystein, by email if you want to know how.

    -Øystein

    ReplyDelete