Wednesday, February 8, 2012

Evaluating a "crashed" network

Player 2.4, my best player so far, has two different neural networks for two phases of the game: race and contact.

The idea behind using multiple networks is that each one can specialize in the strategy for its own phase of the game.

Other neural network players, for example GnuBG, or the reference network in this academic study, include more networks than just contact and race. A common one is for the "crashed" phase of the game, when the player is bearing in checkers against an opponent's anchor or blot and has to "crash" his points as the checkers come in.

UPDATE: as Ian Shaw pointed out to me, this is not actually what a crashed network is about. It's more about having most of your checkers on your 1 and 2 points, which means you have a lot less flexibility when trying to race any remaining checkers back home.

I tried training a player which is like Player 2.4 but with a separate network for crashed boards. I tried this for a few different definitions of "crashed":

  • Contact, plus at least one player has all their pieces on their nine point or closer. So getting close to bearing in, but still with a risk of getting hit. This covers quite a lot of game states, probably too many, since there is no noticeable improvement against Player 2.4. That is, adding the crashed net does not improve performance, to within a standard error of about 0.007 ppg.
  • Contact, plus at least one player has borne off at least six checkers. So further along in the bearing off process where you are already being forced to significantly dismantle any barricade. Also: no evidence for any improvement in performance.
  • GnuBG's crashed definition. This is a bit more complex, designed around making sure that any layout that starts in crashed always remains in crashed. But basically it's one where at least nine checkers have been taken off. Also: no evidence for improvement in performance.
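
To make the three definitions concrete, here is a minimal Python sketch of how each one could pick the network for a position. Everything in it is an assumption made for illustration: the `Side` class, the board layout (each player's points numbered from their own home board) and the function names are hypothetical and are not taken from Player 2.4 or GnuBG. The third test only follows the simplified "at least nine checkers off" description above, not GnuBG's actual rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

NUM_CHECKERS = 15


@dataclass
class Side:
    """One player's checkers, numbered from that player's own home board.

    points[0] is the player's 1-point and points[23] is their 24-point.
    (Hypothetical layout, chosen only for this sketch.)
    """
    points: List[int]
    bar: int = 0

    @classmethod
    def from_counts(cls, counts: Dict[int, int], bar: int = 0) -> "Side":
        points = [0] * 24
        for point, n in counts.items():
            points[point - 1] = n
        return cls(points, bar)

    def borne_off(self) -> int:
        return NUM_CHECKERS - sum(self.points) - self.bar

    def farthest_point(self) -> int:
        """Highest occupied point; 25 for the bar, 0 if nothing is left."""
        if self.bar:
            return 25
        return max((i + 1 for i, n in enumerate(self.points) if n), default=0)


def has_contact(a: Side, b: Side) -> bool:
    # a's point p is b's point 25 - p, so the rear checkers have not yet
    # passed each other exactly when the farthest points sum to more than 25.
    return a.farthest_point() + b.farthest_point() > 25


def crashed_nine_point(a: Side, b: Side) -> bool:
    # Definition 1: contact, and one player has everything on the 9-point or closer.
    return has_contact(a, b) and min(a.farthest_point(), b.farthest_point()) <= 9


def crashed_six_off(a: Side, b: Side) -> bool:
    # Definition 2: contact, and one player has borne off at least six checkers.
    return has_contact(a, b) and max(a.borne_off(), b.borne_off()) >= 6


def crashed_nine_off(a: Side, b: Side) -> bool:
    # Definition 3, simplified: contact, and one player has at least nine off.
    return has_contact(a, b) and max(a.borne_off(), b.borne_off()) >= 9


def select_network(a: Side, b: Side, is_crashed: Callable[[Side, Side], bool]) -> str:
    """Name of the network that would evaluate this position."""
    if not has_contact(a, b):
        return "race"
    return "crashed" if is_crashed(a, b) else "contact"


if __name__ == "__main__":
    start = Side.from_counts({24: 2, 13: 5, 8: 3, 6: 5})
    print(select_network(start, start, crashed_nine_point))
```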

For all three cases, training started from Player 2.4, using its contact network as the starting point for the crashed network. Alpha started at 0.02 and dropped to 0.0063 at 100k training runs. The benchmark was 40k cubeless money games against Player 2.4. If that showed no improvement by 250k training runs, I counted it as a failure.
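
As a rough illustration of the schedule and the benchmark arithmetic, here is a small Python sketch. The function names are hypothetical, and the per-game standard deviation of about 1.4 points is not a measured figure: it is only what a 0.007 ppg standard error over 40k games implies, since 0.007 × sqrt(40,000) = 0.007 × 200 = 1.4. The two-standard-error threshold is also just an assumed rule of thumb.

```python
import math


def alpha_for(run: int) -> float:
    """Learning rate at a given training run: 0.02, then 0.0063 from 100k on."""
    return 0.02 if run < 100_000 else 0.0063


def standard_error(per_game_std: float, n_games: int) -> float:
    """Standard error of the mean score per game over n_games independent games."""
    return per_game_std / math.sqrt(n_games)


def looks_like_improvement(mean_ppg: float, std_err: float, z: float = 2.0) -> bool:
    """Assumed rule of thumb: the edge must exceed z standard errors."""
    return mean_ppg > z * std_err


if __name__ == "__main__":
    se = standard_error(1.4, 40_000)  # 1.4 is the implied, not measured, std dev
    print(f"standard error: {se:.4f} ppg")
    print(looks_like_improvement(0.005, se))
```

Under these assumptions, an edge smaller than roughly 0.014 ppg would be hard to tell apart from noise in a 40k-game benchmark.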

So despite trying a few different approaches, I see no benefit of adding a crashed network on top of the race and contact networks.
