Monday, May 14, 2012

Player 3.5: new input, escapes from the bar

I tried another new input for the contact and crashed networks: this time, the expected number of escapes if you have a single checker on the bar. That is, I look at the available (open) points in the opponent's home board and weight the probability of landing on each point by the standard escape count from the Berliner primes calculation. It is meant to give some indication of how good or bad it would be to get hit. I'm focusing on inputs along these lines because, when I look at which positions are evaluated most poorly in the benchmarks, they tend to be boards where there is a significant chance of being hit and landing behind a prime.
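To make the input a little more concrete, here is a minimal Python sketch. The details are my assumptions, not necessarily how Player 3.5 computes it. Points are numbered from the mover's side (24 is farthest from home, so entering with a die d lands on point 25 - d). A "block" is any point the opponent holds with two or more checkers. The escape count is a simplified Berliner-style count of the 36 rolls that carry a checker past every opponent block within the next 12 pips. When both dice can enter, the checker is assumed equally likely to land on either entry point. The names `escape_count` and `expected_escapes_from_bar` are hypothetical.

```python
from itertools import product

ALL_ROLLS = list(product(range(1, 7), repeat=2))  # the 36 ordered dice rolls


def _passes(point, steps, blocked, target):
    """True if moving die by die from `point` gets past `target` without
    landing on a blocked point along the way."""
    pos = point
    for d in steps:
        pos -= d
        if pos in blocked:
            return False
        if pos < target:
            return True
    return False


def escape_count(point, blocked):
    """Simplified Berliner-style escape count: how many of the 36 rolls let a
    checker on `point` get past every opponent block in the next 12 pips."""
    ahead = [q for q in blocked if point - 12 <= q < point]
    if not ahead:
        return 36
    target = min(ahead)  # the block farthest along the direction of travel
    count = 0
    for d1, d2 in ALL_ROLLS:
        orders = [[d1] * 4] if d1 == d2 else [[d1, d2], [d2, d1]]
        if any(_passes(point, order, blocked, target) for order in orders):
            count += 1
    return count


def expected_escapes_from_bar(blocked):
    """Average escape count for a single checker entering from the bar.

    Assumption: if both dice enter, each entry point is equally likely;
    rolls that fail to enter contribute zero escapes."""
    total = 0.0
    for d1, d2 in ALL_ROLLS:
        entries = {25 - d for d in (d1, d2) if 25 - d not in blocked}
        if entries:
            total += sum(escape_count(p, blocked) for p in entries) / len(entries)
    return total / 36
```

The result is on a 0 to 36 scale, so a network input would presumably be rescaled (for example, divided by 36). Computing it once from each player's point of view gives a pair of inputs, one per player.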

This one had some success: the improvement is still incremental, but it produced my best player to date, Player 3.5. It is identical to Player 3.4 except for two new inputs: the input described above, computed once for each player.

Its GNUbg benchmark scores are Contact 13.0, Crashed 11.5, and Race 0.766 (lower is better). Player 3.4's scores are 13.3, 11.7, and 0.766, so Player 3.5 is noticeably better but still nothing dramatic (though notably the improvement includes Contact, the most important benchmark). It seems that to get a significantly stronger player I'll have to add a bunch of inputs, each of which offers only incremental benefits.

In cubeless money play against Player 3.4, it scores an average of +0.0033ppg +/- 0.0021ppg in 400k games. Against PubEval it scores an average of +0.592ppg +/- 0.005ppg in 100k games and wins 69.5% of the games.
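The "+/-" figures read like standard errors of the mean. Assuming that is what they are, the standard error is the sample standard deviation of the per-game results divided by the square root of the number of games. Working backwards, 0.0021 × √400,000 ≈ 1.33 points per game of spread, and 0.005 × √100,000 ≈ 1.58. By the same reading, the +0.0033 edge over Player 3.4 is about 1.6 standard errors from zero. The helper names below are hypothetical.

```python
import math
import statistics


def mean_and_stderr(results):
    """Mean points per game and its standard error.

    `results` holds the points won (+) or lost (-) in each game, e.g. +1, -2, +3.
    The standard error is the sample standard deviation divided by sqrt(n)."""
    mean = statistics.fmean(results)
    stderr = statistics.stdev(results) / math.sqrt(len(results))
    return mean, stderr


def implied_stddev(stderr, n_games):
    """Per-game standard deviation implied by a reported standard error."""
    return stderr * math.sqrt(n_games)


print(round(implied_stddev(0.0021, 400_000), 2))  # 1.33
print(round(implied_stddev(0.005, 100_000), 2))   # 1.58
```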

Still not nearly as good as GNUbg 0-ply! But creeping closer.

To be honest, I'm not really sure whether the improved performance came from the new input or from a slight change I made to the training algorithm. In this case I started with random weights for the new inputs and ran supervised learning against the GNUbg training databases (contact and crashed). And instead of bouncing back and forth between a large alpha (1) and smaller alphas, I just used a small, constant alpha of 0.03. The benchmark score improved slowly over 1,100 iterations, which took several days to run.
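Here is a minimal sketch of that training setup, under assumptions of my own. The network is a single sigmoid hidden layer trained by plain backpropagation on squared error toward the database's target outputs. New inputs are added by appending small random weights to each hidden unit while leaving the already-trained weights alone. One "iteration" is treated as a shuffled pass over the training positions. The class and function names (`Net`, `add_inputs`, `train`) are hypothetical, not GNUbg's or the player's actual code.

```python
import math
import random

ALPHA = 0.03  # small, constant learning rate (the value quoted in the post)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Net:
    """One sigmoid hidden layer and sigmoid outputs (hypothetical stand-in;
    sizes and initialisation range are placeholders)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def add_inputs(self, n_new, rng):
        """Append small random weights for new inputs; trained weights are kept."""
        for row in self.w1:
            row.extend(rng.uniform(-0.1, 0.1) for _ in range(n_new))

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def loss(self, x, target):
        _, y = self.forward(x)
        return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

    def train_step(self, x, target, alpha=ALPHA):
        """One backprop step on squared error toward the target outputs."""
        h, y = self.forward(x)
        d_out = [(yi - ti) * yi * (1.0 - yi) for yi, ti in zip(y, target)]
        # Hidden deltas use the output weights *before* they are updated.
        d_hid = [hj * (1.0 - hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= alpha * dk * hj
            self.b2[k] -= alpha * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= alpha * dj * xi
            self.b1[j] -= alpha * dj


def train(net, database, iterations, alpha=ALPHA, seed=0):
    """Each 'iteration' is one shuffled pass over (inputs, targets) pairs,
    always with the same fixed alpha."""
    rng = random.Random(seed)
    data = list(database)
    for _ in range(iterations):
        rng.shuffle(data)
        for x, target in data:
            net.train_step(x, target, alpha)
```

With a fixed alpha there is no schedule to tune. The trade-off is that a small constant step makes progress slow, which is consistent with needing many passes over the databases.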