In the past I tried a couple extra inputs for the contact network; now I'm trying a new set of inputs for the race network.
This follows what Joseph Heled did for GNUbg's race network: instead of a single input representing the number of checkers borne off, it uses 14 separate inputs. The i'th input is 1 if the number of borne-off checkers is greater than or equal to i, and 0 otherwise. Splitting the count into separate inputs like this is what the regular Tesauro inputs do for the number of checkers on a point, and it lets the neural network discover more complex nonlinear dependencies.
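To make the encoding concrete, here's a minimal sketch in Python. The function name `bearoff_inputs` and its parameters are my own placeholders for illustration, not anything from GNUbg or the actual player code; it just implements the "input i is 1 if borne-off count >= i" rule for one player.

```python
def bearoff_inputs(n_borne_off, n_inputs=14):
    """Encode the borne-off checker count as n_inputs binary inputs.

    Input i (counting from 1) is 1 if n_borne_off >= i, else 0.
    Hypothetical helper for illustration only.
    """
    if not 0 <= n_borne_off <= 15:
        raise ValueError("a player has between 0 and 15 checkers borne off")
    return [1 if n_borne_off >= i else 0 for i in range(1, n_inputs + 1)]


# Three checkers borne off: the first three inputs are on, the rest are off.
print(bearoff_inputs(3))
```

Each additional checker borne off switches on one more input, so the network gets a separate weight for every threshold instead of one weight scaling a single number.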
Player 3.1 implements this expanded set of race inputs. Its contact network is the same as Player 3, so its Contact and Crashed ER are the same as that player (14.9 and 14.2). Its Race ER is improved: 2.08, vs 2.67 for Player 3. (Benchmark scores updated after fix to benchmark calculation.)
So it's a relatively small improvement, but a noticeable one, worth around 0.004ppg in score.
Player 3.1 summary:
- 120 hidden nodes.
- Trained from scratch (uniform random initial weights in [-0.1,0.1]) using supervised learning on the Race GNUbg training database, not TD learning.
- Contact and race networks. No crashed network.
- One-sided bearoff database used when both players have all checkers in their home boards.
- Contact inputs as per Player 2.4, with Berliner prime and hitting shot inputs in addition to the original Tesauro inputs.
- Race inputs are the original Tesauro inputs plus the 14 extra inputs per player as described above.
I trained it for 110 epochs, with an alpha schedule of 1 until the 8th iteration, then 0.32 until the 20th iteration, then 0.1 until the 60th iteration, then 0.032 until the 100th iteration, then 0.01 afterward. Really it converged pretty quickly - after about 20 iterations. I let it run longer to see if it would improve incrementally with smaller alpha, but it did not.
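The alpha schedule above can be written as a small lookup. This is only a sketch: the names `SCHEDULE`, `FINAL_ALPHA` and `alpha_for_iteration` are made up for illustration, and I'm assuming iterations are counted from 1 and that "until the 8th iteration" means the 8th iteration is the first one to use the next alpha. The exact boundary handling in the real training run may differ by one.

```python
# (first iteration at which this alpha no longer applies, alpha)
# Boundary convention is an assumption; see the note above.
SCHEDULE = [(8, 1.0), (20, 0.32), (60, 0.1), (100, 0.032)]
FINAL_ALPHA = 0.01


def alpha_for_iteration(iteration):
    """Return the learning rate for a 1-based training iteration."""
    for boundary, alpha in SCHEDULE:
        if iteration < boundary:
            return alpha
    return FINAL_ALPHA


# Walk through the 110 iterations and show where alpha changes.
previous = None
for it in range(1, 111):
    alpha = alpha_for_iteration(it)
    if alpha != previous:
        print(f"iteration {it}: alpha = {alpha}")
        previous = alpha
```

Each step drops alpha by roughly a factor of three, so the five values span two orders of magnitude over the run.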