Sunday, July 31, 2011

Second training approach

The next step in improving the learning was to include backtracking and alpha/beta decay.

The idea: if the network drifts into a suboptimal part of parameter space because the learning pace (via alpha and beta) is too great, step back to where it was good and reduce alpha and beta.

The algorithm in practice is:
  • Every 1,000 learning iterations, run the same 200 test games against the pubEval benchmark.
  • Keep track of the best performance (i.e. the highest win fraction, since that's what the net is optimizing).
  • If a later set of 200 test games gives a win fraction that is more than a threshold below the running maximum, and alpha and beta are both above their specified minimum levels, step back to the weights & eligibility traces saved at that maximum point, and reduce alpha and beta by a fixed fraction.
  • If alpha and beta are less than the minimum after being reduced, set them to the minimum.
In principle this should give better learning behavior than without the backtracking.
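
The steps above can be sketched roughly as follows. This is only an illustration, not the actual code used here: the names (BacktrackState, checkpoint_step), the threshold, the decay fraction and the minimum values are all made up for the example, and the "snapshot" stands in for whatever holds the weights and eligibility traces.

```python
import copy
from dataclasses import dataclass
from typing import Any


@dataclass
class BacktrackState:
    """Bookkeeping for backtracking; all names here are hypothetical."""
    alpha: float
    beta: float
    best_score: float = float("-inf")
    best_snapshot: Any = None


def checkpoint_step(state, score, snapshot, threshold=0.05,
                    decay_fraction=0.5, alpha_min=0.01, beta_min=0.01):
    """Run once per benchmark checkpoint (e.g. every 1,000 iterations).

    score is the win fraction from the fixed set of test games and
    snapshot holds the current weights and eligibility traces.
    Returns the snapshot that training should continue from.
    """
    # New best: remember a copy of the weights and traces.
    if score > state.best_score:
        state.best_score = score
        state.best_snapshot = copy.deepcopy(snapshot)
        return snapshot

    fell_too_far = score < state.best_score - threshold
    can_reduce = state.alpha > alpha_min and state.beta > beta_min
    if fell_too_far and can_reduce:
        # Reduce alpha and beta by a fixed fraction, clamped at the minimum.
        state.alpha = max(state.alpha * (1.0 - decay_fraction), alpha_min)
        state.beta = max(state.beta * (1.0 - decay_fraction), beta_min)
        # Step back to the best point seen so far.
        return copy.deepcopy(state.best_snapshot)

    # Otherwise carry on from where we are.
    return snapshot
```

The training loop would call checkpoint_step after each benchmark run and continue learning from whatever snapshot it returns, using state.alpha and state.beta for the next block of iterations.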

The results for 40 hidden nodes:

and for 20 hidden nodes:

There's not a whole lot of difference between 20 and 40 nodes here, and training seems pretty well converged. Compared to the previous results without backtracking, there's less scatter, but backtracking doesn't seem to improve outright performance appreciably.

Note: the results above are invalid because I had a bug in my pubEval implementation. Fixing that made the pubEval player much stronger. Check out the Jan 2012 posts for believable results.
