Computational Backgammon

Sunday, December 10, 2023

Using PyTorch for the neural networks

I've been thinking a bit about whether GPUs are useful for neural network calculations with backgammon bots. It's not totally clear: the overhead of shipping memory to and from the GPU has to be weighed against the benefits of parallelization.

That said, it is not very hard to check: there are nice open source packages available for neural networks that already support GPU calculations.

A simple one that seems decent for our purposes is PyTorch. This is a Python package for representing neural networks, and, as desired, it supports GPUs through CUDA (which works if you've got an NVIDIA GPU, as I do on my desktop machine).

There are two possible benefits of the GPU. First, evaluation of the network might be faster - so that it's quicker to do bot calculations during a game. Second, the training might be faster - that doesn't show up in the post-training game play, but could let you experiment more easily during training with different network sizes and configurations.

For the training calculations, the GPU really only helps with supervised learning - for example, training against the GNUBG training examples - but not with TD learning. That's because TD learning requires you to simulate a game to get the target to train against - you don't know the set of inputs and targets as training data before the TD training starts.
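To make the supervised case concrete, here is a minimal sketch of a mini-batch training loop in PyTorch. The inputs and targets are random stand-ins, not the GNUBG training data, and the Adam optimizer, mean-squared-error loss, batch size, and the name train_supervised are my assumptions for illustration. The point is that the whole data set is known up front, so it can be moved to the GPU once and processed in batches.

```python
import torch
from torch import nn

def train_supervised(inputs, targets, hidden=120, epochs=20, batch_size=64, lr=1e-3, seed=0):
    """Fit a one-hidden-layer net to a fixed set of (input, target) pairs."""
    torch.manual_seed(seed)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = nn.Sequential(
        nn.Linear(inputs.shape[1], hidden), nn.Sigmoid(),
        nn.Linear(hidden, targets.shape[1]), nn.Sigmoid(),
    ).to(device)
    # The full training set is known in advance, so copy it to the device once.
    x, y = inputs.to(device), targets.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    loss_fn = nn.MSELoss()                                   # assumed loss
    history = []
    for _ in range(epochs):
        perm = torch.randperm(x.shape[0], device=device)  # reshuffle each epoch
        total = 0.0
        for start in range(0, x.shape[0], batch_size):
            idx = perm[start:start + batch_size]
            optimizer.zero_grad()
            loss = loss_fn(model(x[idx]), y[idx])
            loss.backward()
            optimizer.step()
            total += loss.item() * idx.shape[0]
        history.append(total / x.shape[0])  # mean loss over the epoch
    return model, history

# Random stand-in data: 256 fake positions with 196 inputs and 5 targets each.
torch.manual_seed(0)
fake_inputs = torch.rand(256, 196)
fake_targets = torch.rand(256, 5) / 5  # scaled so each row sums to less than 1
model, history = train_supervised(fake_inputs, fake_targets)
```

With TD learning there is no fixed tensor like fake_inputs to batch over, since each target comes from the next position in a game that is still being simulated.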

I constructed a simple feed forward neural network with the same shape as the ones I've discussed before: 196 inputs (the standard Tesauro inputs with no extensions), 120 hidden nodes (as a representative example), and 5 output nodes.
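For reference, a network of that shape can be sketched in PyTorch as below. The class name BackgammonNet and the use of sigmoid activations on both layers are my assumptions here (sigmoids keep the outputs in (0,1), as you'd want for probabilities); the all-zeros input is just a placeholder, not a real board encoding.

```python
import torch
from torch import nn

class BackgammonNet(nn.Module):
    """Feed-forward net: 196 inputs -> 120 hidden nodes -> 5 outputs."""

    def __init__(self, n_inputs=196, n_hidden=120, n_outputs=5):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)
        self.output = nn.Linear(n_hidden, n_outputs)

    def forward(self, x):
        # Sigmoid on both layers (an assumption), so outputs lie in (0, 1).
        h = torch.sigmoid(self.hidden(x))
        return torch.sigmoid(self.output(h))

net = BackgammonNet()
board = torch.zeros(1, 196)  # placeholder input, not a real position
with torch.no_grad():        # no gradients needed for plain evaluation
    probs = net(board)
n_params = sum(p.numel() for p in net.parameters())
```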

Then I just timed the calculation of network outputs given the inputs. The PyTorch version was noticeably slower even than my hand-rolled Python neural network class (which uses numpy vectorization to speed it up - not as fast as native C++ code, but within a factor of 2-3). This is true even though the PyTorch networks do their calculations in C++ themselves. These calculations didn't use the GPU, just the CPU, in serial.

Then I tried the PyTorch calculations parallelizing on the GPU. I've got an NVIDIA GeForce GTX 1060 3GB GPU in my desktop machine, with 1,152 CUDA cores. It was slower, with this number of hidden nodes, than the CPU version of the PyTorch calculation. So the memory transfer overhead outweighed the parallelization benefits in this case.
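A rough sketch of how this kind of comparison can be timed is below. The helper name time_forward, the batch size of 32, and the repeat count are assumptions for illustration. Each timed call copies the inputs from the CPU to the device and copies the result back, so the memory transfer overhead is included in the measurement; copying the result back with .cpu() also waits for the GPU to finish.

```python
import time
import torch
from torch import nn

def time_forward(model, cpu_batch, device, repeats=100):
    """Average seconds per evaluation, including host<->device copies."""
    model = model.to(device).eval()
    with torch.no_grad():
        model(cpu_batch.to(device)).cpu()  # warm-up call, not timed
        start = time.perf_counter()
        for _ in range(repeats):
            # .cpu() blocks until the result is ready, so the timing is honest.
            model(cpu_batch.to(device)).cpu()
        elapsed = time.perf_counter() - start
    return elapsed / repeats

net = nn.Sequential(nn.Linear(196, 120), nn.Sigmoid(), nn.Linear(120, 5), nn.Sigmoid())
boards = torch.rand(32, 196)  # random stand-in inputs
cpu_time = time_forward(net, boards, torch.device("cpu"))
if torch.cuda.is_available():
    gpu_time = time_forward(net, boards, torch.device("cuda"))
```

Changing the 120 in the network definition is an easy way to see where the GPU starts to pay off on a given machine.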

I tried it with a larger network to see what happened - as the number of hidden nodes goes up, the PyTorch evaluations start to outperform my numpy-based evaluations, especially when using the GPU.

So for post-training bot evaluations, it doesn't seem like using the GPU will give much improvement, if we're using nets of the size we're used to.

The GPU does, however, make a massive difference when training! In practice it's running about 10-20x faster than the serial version of the supervised training against the GNUBG training data. I'm excited about this one. And I can always take the trained weights off the PyTorch-based network and paste them onto the numpy-based one, which executes faster post training.
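Moving the weights across is straightforward. Here's a minimal sketch, assuming a two-layer sigmoid network built with nn.Sequential; the helper names export_weights and numpy_forward are made up for this example and are not my actual numpy class. The one thing to watch is that nn.Linear stores its weight matrix as (outputs, inputs).

```python
import numpy as np
import torch
from torch import nn

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def export_weights(model):
    """Copy each Linear layer's weight and bias out as numpy arrays."""
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    return [
        (layer.weight.detach().cpu().numpy().copy(),
         layer.bias.detach().cpu().numpy().copy())
        for layer in layers
    ]

def numpy_forward(weights, x):
    """Evaluate the network in numpy; x has shape (n_boards, n_inputs)."""
    a = x
    for w, b in weights:
        a = sigmoid(a @ w.T + b)  # w is (out, in), hence the transpose
    return a

torch.manual_seed(0)
torch_net = nn.Sequential(nn.Linear(196, 120), nn.Sigmoid(), nn.Linear(120, 5), nn.Sigmoid())
weights = export_weights(torch_net)

# Check the two implementations agree on some random stand-in inputs.
x = np.random.default_rng(0).random((3, 196)).astype(np.float32)
with torch.no_grad():
    expected = torch_net(torch.from_numpy(x)).numpy()
matches = np.allclose(numpy_forward(weights, x), expected, atol=1e-5)
```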

Monday, December 4, 2023

Custom GPTs can run only Python code

I just discovered that OpenAI's custom GPTs cannot run compiled code like C++ - just Python.

That's an interesting limitation of these things, and one that I suspect will disappear in the not-so-distant future, probably to do with how simple the temporary environments need to be that execute the code.

In any case, it means my backgammon bots - for now! - will need to be pure Python. That said, the Python environment does have most of the standard numerical Python packages like numpy and pandas, and more advanced packages like scipy and sklearn that themselves contain some machine learning functionality. So maybe I'll try to build an sklearn-based neural net and see if that works better than my hand-rolled one that uses numpy vectorization.

Sunday, December 3, 2023

The tutorbot begins to take shape

I've named my custom GPT the "Backgammon Tutorbot".

It's now getting a bit better. You can tell it a position in text, like "show me the starting position", and it knows what that means, shows you a proper image of the position, and internally knows what checker layout it corresponds to.

You can then (slowly!) step through a game. If you ask it, for example, "show me the position after a 5-1", it'll call the backgammon bot Python code to figure out the best move for a 5-1, then show you an image of the resulting position, with a bit of commentary. And it remembers the new position as the current one.

Next, if you say "show me the position after the opponent rolls a 6-2", it'll figure out the opponent's best 6-2, then show you an image of that position, with some commentary.

And so on, until the game is over (or you hit the GPT-4 time cap, which I did a bunch).

So it's getting better at the mechanics of representing and advancing a real game. It's quite interesting to use the chat interface as the UI rather than a traditional application. In some ways it's much slower - but it's also chatting in regular English, with a lot of flexibility about how you actually ask your questions. The chatbot is very good at dealing with that kind of flexibility.

But it's still not very good at commentary yet. I've tried uploading one doc with human discussions of opening moves, but that's very early still, and no results to report.

Thursday, November 30, 2023

A backgammon tutor chatbot

I'm trying a new project to get myself familiar with LLMs and the growing infrastructure around them.

In particular, I want to try to solve one of the biggest practical problems that people have when they learn the game through bot analysis (like XG): the bot tells you which move has the highest equity, but it doesn't tell you why that move is the best in terms of more qualitative strategic and tactical decisions.

For example, why is making the 5-point with an opening 3-1 the right move? XG will tell you it's because it's the move that has the highest equity. If you asked an expert human player why that is the best move, though, they would talk about making home board points in order, stopping the opponent from making the Golden Point, and so on.

What I want to create is a chatbot that can give those more qualitative explanations about why a move is best.

The end state: a chatbot where you can enter a position (in a bunch of ways, including pasting a photo of a board) and ask it what the top moves are for a given dice roll. It'll tell you the standard probability and equity information that bots currently show, but it'll also give you the qualitative explanation of why the best move is the best. And similar functionality for cube decisions.

My implementation of this uses OpenAI's custom GPT framework. (To use this, I think you need to have a paid OpenAI account.) This lets you create a custom chatbot that is fine-tuned on data you give it, and also has access to whatever code and data files you upload to it.

Then, with regular English instructions, you can tell it to, for example, call certain Python functions in response to certain types of request, or load information from a file, and things like that.

I managed to create a really simple first version that knows how to calculate the top moves, using a simple bot about the level of Benchmark 1. It also ignores the cube (for now). You can ask the chatbot questions like:

Show me an example of a backgammon board

I added a file with a list of example positions, and it'll load a random example when you ask this. It pulls it in and shows you the board.

What are the best moves for a 3-1?

It'll call the backgammon bot and get the regular bot information: moves, game probabilities, equity, and so on. It then summarizes the top three moves, describing the move and showing the probabilities. Then it tries to explain why the best move is the best, but it doesn't do a very good job, because I haven't trained that part yet. :)

Here's a link to the chatbot. Note that it might randomly not work as I play with it; and I suspect that you need a paid OpenAI account to use it.

Some thoughts after doing this really quite simple experiment:

Developing a chatbot is something you can do with English instructions. And when something goes wrong, you can ask the chatbot itself for help, and often it really does help.
It feels like the craft of building these things is just in its infancy, and I (at least!) don't know what kinds of development standards are best. For example, if you want a set of instructions for the chatbot to get set up, should they all go in one file, or is it better to use multiple files for different categories of instructions?
With OpenAI's custom GPTs, if you do a bunch of setup and successfully Update the chatbot, and then do another bit of setup and Update again, you generally lose all the information for the first setup. You need to give the custom GPT all the instructions each time you update it.
Everything runs really slowly! The chatbot often runs Python code, which takes ~30s to get set up and execute. Feels pretty sluggish. Okay for a proof of concept, but needs to be way faster for anyone to really use it.

Some stuff yet to do:

Make it easier to tell it what backgammon position you care about. Right now you need to tell it a gnubg-nn position string, which no one but three or four people in the world knows about. It'd be nice to be able to paste in a photo of a board and have it parse the position out of that, but that's a pretty difficult computer vision problem. (Apparently someone has recently built an Android phone app that can do this, but I've never seen it in action.) Absent the photo parsing, I need to figure out some simple-ish way of describing a board to the chatbot.
Train it to describe why the move is best. This is the real meat of this project - will this work? I'm putting together some training data where each element is a position and dice roll, plus the best move, the checker layout after the best move, the game state probabilities after the move, and a description (created by me, to start) of why the best move is best.
Improve the bot. I was thinking of having it call gnubg, but I don't really know how to upload all the binaries and data that needs, or how to have it run as a server. So maybe I'll just use one of my old bots from earlier in this project. If this gets any traction I can figure out how to integrate with a proper bot.
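For the training data in the second item, one possible shape for each element is sketched below. The class name MoveExplanation, the field names, and the JSON Lines output are all assumptions for illustration; the position strings and probabilities are left as placeholders rather than real values.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class MoveExplanation:
    """One training element: a position and roll, the best move, and why."""
    position: str         # position before the roll, in some text encoding
    dice: tuple           # the roll, e.g. (3, 1)
    best_move: str        # the best move in standard notation
    position_after: str   # checker layout after the best move
    probabilities: dict   # game state probabilities after the move
    explanation: str      # human-written reason the move is best

def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)

example = MoveExplanation(
    position="<starting position string>",        # placeholder
    dice=(3, 1),
    best_move="8/5 6/5",
    position_after="<position string after the move>",  # placeholder
    probabilities={"win": None, "gammon": None, "backgammon": None},  # placeholders
    explanation="Makes the 5-point, a key home board point, and stops the "
                "opponent from making the Golden Point.",
)
line = to_jsonl([example])
```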

Sunday, July 11, 2021

The latest GnuBG benchmark files

I found the latest versions of the GnuBG benchmark databases here. They've moved around in the 10 years since I last looked for them! The file format remains the same, though I had to rebuild the parser for the 20-character string board descriptor in here.

Remember: the benchmark databases give a bunch of starting boards and rolls and, for each, the top five resulting boards with their post-roll equities.
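As a sketch of how data in that shape can be used to score a player: for each entry, look up the equity of the board the player chose and compare it with the best listed equity. The data layout, the function names, and the rule that an unlisted choice is charged the worst listed equity are all assumptions for this example, and the boards and equities below are made-up toy values, not real benchmark entries.

```python
def equity_loss(candidates, chosen_board):
    """Equity given up relative to the best listed move.

    candidates is a list of (board, equity) pairs for one start board and roll.
    """
    equities = [eq for _, eq in candidates]
    best = max(equities)
    for board, eq in candidates:
        if board == chosen_board:
            return best - eq
    # Assumption: a choice outside the list is charged the worst listed equity.
    return best - min(equities)

def benchmark_score(entries, choose_board):
    """Average equity loss of a player over all benchmark entries."""
    losses = [
        equity_loss(e["candidates"], choose_board(e["start"], e["roll"]))
        for e in entries
    ]
    return sum(losses) / len(losses)

# Toy entries with made-up board labels and equities.
entries = [
    {"start": "S1", "roll": (3, 1), "candidates": [("A", 0.10), ("B", 0.08), ("C", 0.02)]},
    {"start": "S2", "roll": (6, 2), "candidates": [("D", -0.05), ("E", -0.20)]},
]

def always_b(start, roll):
    """A deliberately silly player that always answers board 'B'."""
    return "B"

score = benchmark_score(entries, always_b)
```

A lower score is better: a player that always finds the top listed board scores zero.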

There's also a set of training databases that gives a list of boards with pre-roll probabilities for the different game outcomes; that's what we do supervised learning against. Those are here. A description of the contents is here.

These training databases have grown substantially in the intervening years: there are now 1,225,297 entries for contact, 600,895 for crashed, and 516,818 for race.

Rebuilding TD-lambda learning

I tried to get TD-lambda learning to work with scikit-learn's MLPClassifier tools, but couldn't get it to accept probabilities as training targets rather than a small set of categories (1 or 0 values). Then I tried MLPRegressor, but that doesn't seem to have a nice way of making the outputs bounded in (0,1).

So rather than bang my head against that, I just rolled my own neural network again - this time in Python, but using numpy vectorized calculations to speed things up.

It's still pretty slow in execution - I can train a network with 80 hidden nodes at the pace of 20,000 games per hour on my desktop machine. But, it let me get back into the weeds with how this all works.

This time I followed Tesauro's setup a bit more closely, in that the inputs are explicitly white and black checker positions rather than "whichever player's on move", and I kept the two "whose move is it" inputs too. The outputs are: probability of white single win, probability of white gammon, probability of black gammon, probability of white backgammon, and probability of black backgammon. The probability of black single win is one minus the sum of the other probabilities.
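To spell out the arithmetic: given the five outputs, the sixth probability and the cubeless equity follow directly. This sketch assumes the six outcomes are mutually exclusive (which is what "one minus the sum" implies) and uses the usual 1/2/3 points for single win, gammon, and backgammon; the function names and the example numbers are made up for illustration.

```python
def complete_outcomes(outputs):
    """Fill in black's single-win probability from the five network outputs.

    outputs is ordered as: white single win, white gammon, black gammon,
    white backgammon, black backgammon.
    """
    white_single, white_gammon, black_gammon, white_bg, black_bg = outputs
    black_single = 1.0 - (white_single + white_gammon + black_gammon + white_bg + black_bg)
    return {
        "white_single": white_single, "black_single": black_single,
        "white_gammon": white_gammon, "black_gammon": black_gammon,
        "white_backgammon": white_bg, "black_backgammon": black_bg,
    }

def cubeless_equity(outputs):
    """Expected points for white, ignoring the cube."""
    p = complete_outcomes(outputs)
    return (
        1.0 * (p["white_single"] - p["black_single"])
        + 2.0 * (p["white_gammon"] - p["black_gammon"])
        + 3.0 * (p["white_backgammon"] - p["black_backgammon"])
    )

# Made-up outputs, purely to show the calculation.
example_outputs = (0.5, 0.125, 0.0625, 0.0, 0.0)
outcomes = complete_outcomes(example_outputs)
equity = cubeless_equity(example_outputs)
```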

I'm able to reproduce most of the initial cubeless play results from my earlier work, though I've yet to add the inputs for the number of checks available to hit, or the Berliner primes. It takes around 200,000 game iterations to train up to something like the Benchmark 2 level. This was using alpha=0.1, lambda=0, and no "alpha splitting" (using a different learning rate for the input->hidden weights and the hidden->output weights).
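With lambda=0 there are no eligibility traces to carry around, so each update just nudges the network's output for the current position toward its own output for the next position (or toward the actual result at the end of the game). Here's a minimal numpy sketch of that update, not my actual class: the name TDNet, the weight initialization, the 198 inputs (Tesauro's 196 plus the two "whose move" inputs), and the random stand-in positions are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TDNet:
    """One-hidden-layer sigmoid network trained with TD(0) updates."""

    def __init__(self, n_inputs=198, n_hidden=80, n_outputs=5, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights (an assumed initialization).
        self.w1 = rng.uniform(-0.1, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.uniform(-0.1, 0.1, (n_outputs, n_hidden))
        self.b2 = np.zeros(n_outputs)

    def forward(self, x):
        h = sigmoid(self.w1 @ x + self.b1)
        y = sigmoid(self.w2 @ h + self.b2)
        return h, y

    def td0_update(self, x, target, alpha=0.1):
        """Move the outputs for x toward target, which is held constant."""
        h, y = self.forward(x)
        # (target - y) times the sigmoid derivative at each output node.
        d_out = (target - y) * y * (1.0 - y)
        # Propagate back to the hidden layer before w2 is changed.
        d_hid = (self.w2.T @ d_out) * h * (1.0 - h)
        # Same learning rate for both layers, i.e. no "alpha splitting".
        self.w2 += alpha * np.outer(d_out, h)
        self.b2 += alpha * d_out
        self.w1 += alpha * np.outer(d_hid, x)
        self.b1 += alpha * d_hid

net = TDNet()
rng = np.random.default_rng(1)
x_now = rng.integers(0, 2, 198).astype(float)   # random stand-in position
x_next = rng.integers(0, 2, 198).astype(float)  # stand-in for the next position

_, target = net.forward(x_next)  # the TD(0) target is the next position's value
_, before = net.forward(x_now)
net.td0_update(x_now, target, alpha=0.1)
_, after = net.forward(x_now)
```

At the end of a game the target would instead be the actual outcome, for example np.array([1.0, 0.0, 0.0, 0.0, 0.0]) for a white single win with the output ordering above.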

So now I've convinced myself that I remember how to build one of these players from scratch. For the next step I'm going to download the latest GnuBG benchmark databases and do supervised learning on those - it should be much easier to plug that into an external package like scikit-learn.

Thursday, July 1, 2021

Gerald Tesauro's original paper

For reference: here is a link to the original Tesauro paper on TD-Gammon.