Sunday, July 11, 2021

The latest GnuBG benchmark files

I found the latest versions of the GnuBG benchmark databases here. They've moved around in the 10y since I last looked for them! The file format remains the same, though I had to rebuild the parser for the 20-character string board descriptor in here.

Remember: the benchmark databases given a bunch of starting boards, rolls, and for each, the top five boards with post-roll equity for each.

There's also a set of training databases that gives a list of boards with pre-roll probabilities for the different game outcomes; that's what we do supervised learning against. Those are here. A description of the contents is here.

These training databases have grown substantially in the intervening years: there are now 1,225,297 entries for contact, 600,895 for crashed; and 516,818 for race.

Rebuilding TD-lambda learning

I tried to get TD-lambda learning to work with scikit-learn's MLPClassifier tools, but couldn't get it to accept probabilities as inputs rather than a small set of categories (1 or 0 values). Then I tried MLPRegressor, but that doesn't seem to have a nice way of making the outputs bounded in (0,1).

So rather than bang my head against that, I just rolled my own neural network again - this time in Python, but using numpy vectorized calculations to speed things up.

It's still pretty slow in execution - I can train a network with 80 hidden nodes at the pace of 20,000 games per hour on my desktop machine. But, it let me get back into the weeds with how this all works.

This time I followed Tesauro's setup a bit more closely, in that the inputs are explicitly white and black checker positions rather than "whichever player's on move", and I kept the two "whose move is it" inputs too. The outputs are: probability of white single win, probability of white gammon, probability of black gammon, probability of white backgammon, and probability of black backgammon. The probability of black single win was equal to one minus the sum of the other probabilities.

I'm able to reproduce most of the initial cubeless play results from my earlier work, though I've yet to add the inputs for the number of checks available to hit, or the Berliner primes. It takes around 200,000 game iterations to train up to something like the Benchmark 2 level. This was using alpha=0.1, lambda=0, and no "alpha splitting" (using a different learning rate for the input->hidden weights and the hidden->output weights).

So now I've convinced myself that I remember how to build one of these players from scratch. For the next step I'm going to download the latest GnuBG benchmark databases and do supervised learning on those - it should be much easier to plug that into an external package like scikit-learn.

Thursday, July 1, 2021

Gerald Tesauro original paper

 For reference: here is a link to the original Tesauro paper on TD-Gammon.

Rebuilding the backgammon framework and pubEval

I need a working bot to tell me the best move once I get the "parse the board state from a photo of a board" bit sorted.

I'm still following the same approach as originally, but this time I'm reimplementing it in Python, not C++. Two reasons for that: first, I enjoy Python coding a whole lot more because it's easier; and second, there are some truly excellent open source machine learning libraries in Python that I'll leverage. Execution speed of course is much slower, which might cause some angst when training, so I might need to do some optimization. Perhaps use Cython or numba or something to speed that up - we'll see.

And a third reason: rebuilding it from scratch will help me remember how all the details work!

In any case I've rebuilt the structure now, and have got a pubEval player to play simple games against. I had to google around a fair bit to find a nice description of pubEval, and finally found one here, where Gerry Tesauro published a post with a C implementation. Basically pubEval is just a linear regression (actually two regressions: one for a race, one for contact) that takes a 122-element representation of the board state and returns a score, with a higher score being better. All you really need then is the board state encoding, plus the linear regression weights for the two regressions, which you can find at that link.

After playing some games manually against pubEval, I can honestly say that it's not very good. It's not terrible, but I now remember why it's pretty easy to beat.

And Happy Canada Day! :)