A neural net player needs a backgammon framework to play in, so the first step is to build that framework. I chose C++ because a) I know it, and b) it executes faster than Python, which is my usual choice for coding. Execution speed matters here because training a neural net can take many millions of iterations, where each iteration is a full game.
The basic setup:
A board class
This tracks where each player's 15 checkers are on the board. It does not know any rules or how to play a step in the game. It just stores, for each side, a vector of checker counts by position, how many checkers have been hit and sent to the bar, and how many have been borne off.
Importantly, it has a flag called perspective, which records whether the board is viewed from the perspective of player 0 or player 1. All the methods that return checker counts and the like then answer from that player's perspective.
For example, the method hit() returns the number of the player's own checkers that have been hit. If perspective is 0, that is player 0's count; if perspective is 1, it is player 1's. Its counterpart otherHit() returns the number of opponent checkers that have been hit: player 1's count if perspective is 0, and player 0's if perspective is 1.
Similarly, checker( int i ) returns the number of the player's checkers at position i (between 0 and 23), and otherChecker( int i ) returns the number of the opponent's checkers at position i. Which player counts as the opponent is determined by perspective, and the indexing is reversed for perspective = 1, so checker(0) is always the number of the player's checkers at their own home position 1.
This matters because earlier iterations of this effort ended up with a favored perspective, and I think that may have caused subtle bugs in the code. It also makes it easier to lay out the network inputs.
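To make the perspective idea concrete, here is a minimal sketch of how such a board could be laid out. It is not the actual class: the storage layout (each side's checkers indexed from that side's own home) and the setters setChecker() and setHit() are assumptions made for this example, and the borne-off counts are left out for brevity.

```cpp
#include <array>
#include <cassert>

// Illustrative sketch only: the storage layout and the setters are assumptions.
class board
{
public:
    board() : perspective_( 0 )
    {
        for( int p = 0; p < 2; ++p )
        {
            checkers_[p].fill( 0 );
            hit_[p] = 0;
        }
    }

    void setPerspective( int p )
    {
        assert( p == 0 || p == 1 );
        perspective_ = p;
    }

    // Checkers of the perspective player at position i, counted from that
    // player's own home, so checker(0) is their home position 1.
    int checker( int i ) const
    {
        assert( i >= 0 && i < 24 );
        return checkers_[perspective_][i];
    }

    // Opponent checkers on the same point. Each side is stored counting from
    // its own home, so the opponent's index is mirrored.
    int otherChecker( int i ) const
    {
        assert( i >= 0 && i < 24 );
        return checkers_[1 - perspective_][23 - i];
    }

    int hit() const { return hit_[perspective_]; }
    int otherHit() const { return hit_[1 - perspective_]; }

    // Hypothetical setters; i is counted from the given player's own home.
    void setChecker( int player, int i, int n ) { checkers_[player][i] = n; }
    void setHit( int player, int n ) { hit_[player] = n; }

private:
    int perspective_;
    std::array<int, 24> checkers_[2];
    int hit_[2];
};
```

With this layout, nothing outside the class needs to know which player it is looking at: flipping perspective swaps what checker() and otherChecker() return, and the same physical point is index i for one side and 23 - i for the other.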
Strategy classes
A strategy is something that determines which of the possible moves is the best move. The base strategy class defines a pure virtual method boardValue, which takes in a board and returns a value: the higher the value, the better the board is for the player moving. The board value can represent whatever is appropriate for the strategy: the expected number of points, the probability of a win, or something more arbitrary.
Ultimately all the neural network intelligence is in a derived strategy class. But there are also simpler strategy classes - for example, a random strategy that returns a random value from boardValue, and a strategy that wraps up the standard pubEval player (which we end up using for comparisons).
When a strategy calculates its boardValue, it always uses the methods on the board that show the board from the appropriate perspective. For example, it always calls hit() and otherHit(), and never asks explicitly for the number of checkers hit for player 0 or player 1. So the board must always be passed in from the perspective of the player that is moving.
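The strategy hierarchy described above might look roughly like the sketch below. The exact signature of boardValue (a const method taking a const reference), the stand-in board struct, and the strategyHitCount class are assumptions for illustration; only the idea of a base class with a pure virtual boardValue and a random strategy comes from the description.

```cpp
#include <cassert>
#include <random>

// Stand-in for the real board class, reduced to the two methods used here.
struct board
{
    int hitCount = 0;
    int otherHitCount = 0;
    int hit() const { return hitCount; }
    int otherHit() const { return otherHitCount; }
};

// Base class: a strategy only has to say how good a board is for the mover.
class strategy
{
public:
    virtual ~strategy() {}
    virtual double boardValue( const board& b ) const = 0;
};

// Random strategy: ignores the board and returns a random value.
class strategyRandom : public strategy
{
public:
    explicit strategyRandom( unsigned int seed ) : rng_( seed ), dist_( 0.0, 1.0 ) {}
    double boardValue( const board& ) const override { return dist_( rng_ ); }

private:
    mutable std::mt19937 rng_;
    mutable std::uniform_real_distribution<double> dist_;
};

// Hypothetical toy strategy: prefers boards where more opponent checkers,
// and fewer of the mover's own, are on the bar. Note that it only uses the
// perspective-relative methods hit() and otherHit().
class strategyHitCount : public strategy
{
public:
    double boardValue( const board& b ) const override
    {
        return static_cast<double>( b.otherHit() - b.hit() );
    }
};
```

Because every strategy reads the board only through perspective-relative methods, the same strategy object can play either side, which is what lets one instance be used for both players.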
A game class
An instance of a game contains a board plus a strategy for each player. It knows how many steps a game has progressed, whether a game is over, which side won, how many points they won, and so on.
Its step() method steps one ply (one side's roll) through the game. A step rolls the dice (the random number generator is the Mersenne Twister), generates all possible boards for the roll, and uses the appropriate side's strategy to determine the board value of each possible position. It then chooses the position with the maximum board value.
Before sending the board to the strategy, it sets the board's perspective to that of the player making the move.
Another method is stepToEnd(), which keeps stepping until the game is over (one side or the other has borne off all 15 checkers).
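The two mechanical pieces of a step, rolling seeded dice and picking the highest-valued candidate, can be sketched as follows. The dice class and the bestBoardIndex helper are hypothetical names for this example, not part of the actual game class, and the sketch assumes at least one candidate board is available.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Rolls two six-sided dice from a seeded Mersenne Twister.
class dice
{
public:
    explicit dice( unsigned int seed ) : rng_( seed ), die_( 1, 6 ) {}

    std::pair<int, int> roll()
    {
        int first = die_( rng_ );
        int second = die_( rng_ );
        return std::make_pair( first, second );
    }

private:
    std::mt19937 rng_;
    std::uniform_int_distribution<int> die_;
};

// Returns the index of the candidate with the highest value. The value
// function plays the role of the moving side's boardValue. On ties the
// earliest candidate wins.
template <class Board, class ValueFn>
std::size_t bestBoardIndex( const std::vector<Board>& candidates, ValueFn value )
{
    assert( !candidates.empty() );
    std::size_t best = 0;
    double bestValue = value( candidates[0] );
    for( std::size_t i = 1; i < candidates.size(); ++i )
    {
        double v = value( candidates[i] );
        if( v > bestValue )
        {
            bestValue = v;
            best = i;
        }
    }
    return best;
}
```

Seeding the generator explicitly makes a game reproducible: two dice objects built with the same seed produce the same sequence of rolls, which is handy when debugging a single game.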
The algorithm to determine the various possible positions given a two-die roll is pretty brute-force right now. I'm not sure if there's a nicer way to do it. If there is, it would help performance, since this is called for every step in every game.
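For a sense of what a brute-force enumeration involves, here is a simplified sketch, not the actual algorithm. It tries both orders of the two dice (or four plays of the same number for a double), moves every checker that can legally move, keeps only the results that use the most dice, and removes duplicate boards. It deliberately ignores entering from the bar, bearing off, and the rule that the larger die must be played when only one die can be used. The position struct and all function names are assumptions for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified position, seen from the moving player's perspective.
struct position
{
    std::array<int, 24> mine;    // mover's checkers; index 0 is their home position 1
    std::array<int, 24> theirs;  // opponent's checkers, same indexing
    int theirsHit;               // opponent checkers sent to the bar

    position() : theirsHit( 0 ) { mine.fill( 0 ); theirs.fill( 0 ); }

    // Ordering so positions can be stored in a std::set to remove duplicates.
    bool operator<( const position& o ) const
    {
        if( mine != o.mine ) return mine < o.mine;
        if( theirs != o.theirs ) return theirs < o.theirs;
        return theirsHit < o.theirsHit;
    }
};

struct search
{
    std::set<position> boards;
    int mostDicePlayed = 0;
};

// Keeps only the boards reached using the largest number of dice.
void record( search& s, const position& p, int played )
{
    if( played > s.mostDicePlayed ) { s.boards.clear(); s.mostDicePlayed = played; }
    if( played == s.mostDicePlayed ) s.boards.insert( p );
}

// Plays dice[next], dice[next+1], ... in that order, trying every checker.
void playDice( const position& p, const std::vector<int>& dice, std::size_t next, search& s )
{
    bool moved = false;
    if( next < dice.size() )
    {
        int d = dice[next];
        for( int from = d; from < 24; ++from )  // from - d >= 0: no bearing off here
        {
            int to = from - d;
            if( p.mine[from] == 0 || p.theirs[to] > 1 ) continue;  // nothing to move, or blocked
            position q = p;
            --q.mine[from];
            ++q.mine[to];
            if( q.theirs[to] == 1 ) { q.theirs[to] = 0; ++q.theirsHit; }  // hit a blot
            moved = true;
            playDice( q, dice, next + 1, s );
        }
    }
    if( !moved ) record( s, p, static_cast<int>( next ) );  // out of dice, or stuck
}

std::set<position> possibleBoards( const position& start, int die1, int die2 )
{
    search s;
    if( die1 == die2 )
    {
        playDice( start, std::vector<int>( 4, die1 ), 0, s );
    }
    else
    {
        const std::vector<int> firstOrder = { die1, die2 };
        const std::vector<int> secondOrder = { die2, die1 };
        playDice( start, firstOrder, 0, s );
        playDice( start, secondOrder, 0, s );
    }
    return s.boards;
}
```

If no die can be played at all, the result contains just the starting position, which corresponds to the player passing. Deduplicating matters because different move orders often reach the same board, and each distinct board only needs to be evaluated by the strategy once.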
The game has a boolean verbose flag. If true, it prints out the board at each step and notes the dice rolls. If false, it prints nothing. This is useful for testing whether games are behaving sensibly; you can watch a full game in all its detail with verbose=true, but then turn verbose=false (the default) when running many games.
Running a game
Sample code for running a game looks like this:
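int main( int argc, char * argv [] )
{
    strategyPubEval s1;       // pubEval strategy, shared by both sides
    game g( &s1, &s1, 1 );    // a strategy for each player, then the dice RNG seed
    g.verbose=true;           // print the board and dice rolls at each step
    g.stepToEnd();            // keep stepping until the game is over
    return 0;
}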
This constructs a pubEval strategy that is used for both sides, then plays one game to the end, printing out the board and game information at each step. The strategy is passed (as a pointer) to the game for each side, along with an integer that seeds the random number generator used for dice rolls.