Thursday, August 4, 2011

An interesting perspective from the academy

An interesting article I found comparing different ways of training backgammon neural nets - directly relevant to what I'm trying here.

A few observations from this:
  • Using a single neural net isn't optimal - you should use different nets for different parts of the game. For example, one net for bearing in, one for a race, and one for the rest. I've seen this discussed elsewhere as well. The article goes a bit nuts and has nine separate networks for different varieties of game position. I'll get to multiple nets eventually.
  • The optimal value for lambda they found was around 0.6. Not sure how this squares with comments from Tesauro in his later papers that lambda=0 is the best to train with. I've been using lambda=0.
  • They didn't evaluate against pubEval. Instead they created their "expert" network (actually a group of nine networks), trained it for a while, and then used it as the benchmark. While training the expert they benchmarked each new version against the rolling best-performing version so far, and stopped training once performance had converged on that measure.
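To make the lambda discussion above concrete, here is a minimal sketch of a TD(lambda) update with eligibility traces for a linear evaluator V(s) = w·x(s). The function name, the linear evaluator, and the toy episode are my own illustration, not anything from the article; lam=0.6 mirrors the article's best value, and lam=0 reduces it to the TD(0) style I've been using.

```python
import numpy as np

def td_lambda_episode(w, states, rewards, lam=0.6, alpha=0.1, gamma=1.0):
    """One episode of TD(lambda) updates over a fixed state sequence.

    states:  list of feature vectors x_t (the last one is terminal)
    rewards: reward r_{t+1} observed after each transition
    """
    e = np.zeros_like(w)  # eligibility trace: decaying memory of visited states
    for t in range(len(states) - 1):
        x_t, x_next = states[t], states[t + 1]
        delta = rewards[t] + gamma * (w @ x_next) - (w @ x_t)  # TD error
        e = gamma * lam * e + x_t   # decay old credit, add the current state
        w = w + alpha * delta * e   # spread the error back along the trace
    return w
```

With lam=0 the trace is just the current state, so only the most recent position gets credit for each TD error; larger lambda spreads credit further back, which is the knob the paper tuned to 0.6.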
Pretty interesting. The meat of the paper, though, is the comparison of training methods: self-learning, using the expert at each step to specify the best move, and training by watching expert v expert matches. The conclusion was that training against the expert's evaluation at each step was the most efficient, followed closely by self-learning, and that training by watching expert v expert games was not very efficient.
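A rough sketch of how I read the three regimes differing - who picks the move, and whose evaluation supplies the learning target. Everything here is my own toy illustration (a table-based learner and a stand-in expert_value function), not the paper's actual setup:

```python
def choose(values):
    # pick the successor with the highest evaluation
    return max(values, key=values.get)

def train_step(learner, position, successors, expert_value, mode, alpha=0.1):
    """One hypothetical training step under each of the three regimes."""
    if mode == "self-play":
        # learner picks its own move and bootstraps from its own evaluation
        nxt = choose({s: learner[s] for s in successors})
        target = learner[nxt]
    elif mode == "expert-scored":
        # expert specifies the best move; learner trains toward the
        # expert's evaluation of it (the most efficient regime in the paper)
        nxt = choose({s: expert_value(s) for s in successors})
        target = expert_value(nxt)
    elif mode == "watch-expert":
        # learner only observes expert v expert play, still bootstrapping
        # from its own evaluation (the least efficient regime)
        nxt = choose({s: expert_value(s) for s in successors})
        target = learner[nxt]
    learner[position] += alpha * (target - learner[position])
    return nxt
```

The intuition for the paper's ranking drops out of the sketch: "expert-scored" gets both good moves and good targets, self-play gets self-consistent targets on positions the learner actually reaches, while "watch-expert" sees expert positions but learns from its own (initially poor) evaluations of them.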
