Sunday, February 12, 2012

Supervised learning using GNUbg training databases

In addition to the benchmark databases for evaluating a bot, the GNUbg team constructed training databases for supervised learning.

There are three databases, as with the benchmarks: Contact, Crashed, and Race. Each has a list of boards and rolled-out probabilities, generated from a large sample of self-play games and real-world games sourced from FIBS.

The databases are available through FTP, and I have mirrored them on Dropbox: contact, crashed, and race.

An example line from the contact file is

NIKOIJADCANIGOAHADBA 0.61959 0.24388 0.00981 0.08774 0.00183

The first element is a string representation of the board - the same one used in the benchmark databases and described in my post about those. The numbers following are probabilities: probability of any win; probability of any gammon win; probability of backgammon win; probability of any gammon loss; and probability of backgammon loss.

The probabilities assume that the player holds the dice, and were calculated from rollouts using 0-ply GNUbg.
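To make the format concrete, here is a minimal Python sketch that parses one line into named fields. The names `TrainingPosition`, `parse_line` and `cubeless_equity` are my own for illustration, not anything from GNUbg. The board string is left undecoded. The equity helper uses the usual cubeless money formula and assumes that "any gammon" includes backgammons, as the wording above suggests.

```python
from typing import NamedTuple


class TrainingPosition(NamedTuple):
    """One training-database line (field names are illustrative)."""
    board: str              # encoded board string, left undecoded here
    win: float              # probability of any win
    gammon_win: float       # probability of any gammon win
    backgammon_win: float   # probability of backgammon win
    gammon_loss: float      # probability of any gammon loss
    backgammon_loss: float  # probability of backgammon loss


def parse_line(line: str) -> TrainingPosition:
    """Split a whitespace-separated line into a board and five probabilities."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected a board and 5 probabilities, got {len(fields)} fields")
    probs = [float(x) for x in fields[1:]]
    return TrainingPosition(fields[0], *probs)


def cubeless_equity(p: TrainingPosition) -> float:
    """Cubeless money equity, assuming gammon probabilities include backgammons."""
    loss = 1.0 - p.win
    return ((p.win - loss)
            + (p.gammon_win - p.gammon_loss)
            + (p.backgammon_win - p.backgammon_loss))


if __name__ == "__main__":
    example = "NIKOIJADCANIGOAHADBA 0.61959 0.24388 0.00981 0.08774 0.00183"
    position = parse_line(example)
    print(position.board, position.win, round(cubeless_equity(position), 5))
```

Reading a whole database is then just a matter of calling `parse_line` on each non-empty line of the file.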

This format is different to the benchmark databases: there, the data included the starting board and a roll, plus the equities of the top five best moves as determined by rollout. Here the data are just the board and the rolled-out probabilities - no dice roll involved.

The purpose of the training databases is to train neural network players using supervised learning instead of the traditional TD learning through self-play. Supervised learning should need a smaller training set than TD learning, and generating the positions from real play instead of self-play should give a (somewhat) less biased training set.
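As a rough illustration of how this differs from TD learning, here is a minimal sketch of one supervised update: the network output is nudged directly toward the rolled-out probabilities rather than toward its own estimate of the next position. Everything here is an assumption for illustration. `TinyNet` is a hypothetical one-hidden-layer network with no bias terms, the input vector is a placeholder rather than a real board encoding, and plain squared-error gradient descent is just one reasonable choice, not the GNUbg team's training code.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Hypothetical one-hidden-layer net with five sigmoid outputs."""

    def __init__(self, n_inputs, n_hidden, n_outputs=5, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.w_output = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                         for _ in range(n_outputs)]

    def forward(self, inputs):
        hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
                  for row in self.w_hidden]
        outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                   for row in self.w_output]
        return hidden, outputs

    def train_step(self, inputs, targets, alpha=0.1):
        """One gradient step on squared error; returns the error before the step."""
        hidden, outputs = self.forward(inputs)
        # Error signal at each output: (output - target) times the sigmoid slope.
        out_delta = [(o - t) * o * (1.0 - o) for o, t in zip(outputs, targets)]
        # Backpropagate to the hidden layer using the pre-update output weights.
        hid_delta = [h * (1.0 - h) * sum(out_delta[k] * self.w_output[k][j]
                                         for k in range(len(out_delta)))
                     for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w_output):
            for j in range(len(row)):
                row[j] -= alpha * out_delta[k] * hidden[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= alpha * hid_delta[j] * inputs[i]
        return sum((o - t) ** 2 for o, t in zip(outputs, targets))


if __name__ == "__main__":
    net = TinyNet(n_inputs=4, n_hidden=8)
    features = [1.0, 0.0, 0.5, 1.0]  # placeholder inputs, not a real board encoding
    targets = [0.61959, 0.24388, 0.00981, 0.08774, 0.00183]  # rolled-out probabilities
    for _ in range(500):
        error = net.train_step(features, targets, alpha=0.5)
    print(error)
```

In real training the loop would run over many positions from the databases in shuffled order, with the inputs produced by whatever board encoding the player uses.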

There are 605,054 positions in the Contact database; 305,146 in Crashed; and 251,862 in Race. None of the boards in the training databases appear in the benchmark databases - the benchmarks are out of sample vs the training, as required for a proper benchmark.

The GNUbg team (in particular Joseph Heled, who was a key developer of GNUbg's neural networks) notes that TD learning alone produced only an intermediate-strength player, and that advancing beyond that required supervised learning on these training databases. So hopefully my player will also respond well!


