Monday, August 1, 2011

The next set of results

I turned off backtracking and alpha/beta decay, since those seemed to have minimal effect in my previous experiments. I also dropped alpha and beta to 0.1 to keep the weights from jumping around too much.
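For concreteness, the kind of update this refers to is sketched below: a one-hidden-layer sigmoid net trained with TD(lambda), where alpha is the learning rate on the hidden-to-output weights and beta the rate on the input-to-hidden weights, both now pinned at 0.1 with no decay schedule. The structure and names here are illustrative, not my actual code.

```python
import numpy as np

class TDNet:
    """One-hidden-layer sigmoid net trained with TD(lambda).
    alpha/beta are the Tesauro-style per-layer learning rates,
    held fixed at 0.1 here: no decay, no backtracking."""

    def __init__(self, n_inputs, n_hidden, alpha=0.1, beta=0.1, lam=0.0):
        rng = np.random.default_rng(0)
        self.w_hid = rng.uniform(-0.1, 0.1, (n_hidden, n_inputs))
        self.w_out = rng.uniform(-0.1, 0.1, n_hidden)
        self.alpha, self.beta, self.lam = alpha, beta, lam
        self.e_hid = np.zeros_like(self.w_hid)  # eligibility traces
        self.e_out = np.zeros_like(self.w_out)

    def value(self, x):
        """Estimated win probability for board encoding x."""
        self.x = np.asarray(x, dtype=float)
        self.h = 1.0 / (1.0 + np.exp(-self.w_hid @ self.x))
        self.v = 1.0 / (1.0 + np.exp(-self.w_out @ self.h))
        return self.v

    def td_step(self, v_next):
        """Nudge the previous estimate toward the next one
        (v_next is the next position's value, or the final reward)."""
        delta = v_next - self.v
        dv = self.v * (1.0 - self.v)  # output sigmoid derivative
        g_out = dv * self.h
        g_hid = np.outer(dv * self.w_out * self.h * (1.0 - self.h), self.x)
        # with lam=0 the traces are just the current gradients
        self.e_out = self.lam * self.e_out + g_out
        self.e_hid = self.lam * self.e_hid + g_hid
        self.w_out += self.alpha * delta * self.e_out  # fixed alpha
        self.w_hid += self.beta * delta * self.e_hid   # fixed beta
```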

Then I let the training run for 2M iterations to see if I could get something that has plausibly converged. Honestly, I don't even know whether a realistic neural net converges at all, but looking at other articles on TD-Gammon training, it seems they do.
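Schematically the whole run is just self-play with periodic benchmarking, something like the sketch below; play_game and benchmark are stand-in hooks for my engine, and the benchmark spacing is illustrative:

```python
def train(net, play_game, benchmark,
          n_iterations=2_000_000, bench_every=100_000):
    """Shape of a training run: self-play plus periodic benchmarking.
    play_game(net) plays one self-play game, applying TD updates as it
    goes; benchmark(net) returns average points per game vs pubEval.
    Both are hypothetical hooks; only the overall structure is real."""
    history = []
    for i in range(1, n_iterations + 1):
        play_game(net)
        if i % bench_every == 0:
            history.append((i, benchmark(net)))  # one point on the chart
    return history
```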

Here are the results:
[Chart: average points per game vs. pubEval over 2M training iterations, for networks with 10, 20, and 40 hidden nodes]
The blue line and blue points are for a network with 10 hidden nodes; green is 20; and orange is 40. Each point is the network's average points per game over 200 games against the pubEval benchmark. The lines are rolling averages over the last ten points.
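For what it's worth, the points and lines come from something like this sketch; play_one_game is a hypothetical hook that returns the network's score for a single game against pubEval:

```python
import numpy as np

def benchmark_ppg(play_one_game, n_games=200):
    """Average points per game over a 200-game match vs pubEval.
    play_one_game() returns the net's score for one game: +/-1 for a
    single win/loss, +/-2 for a gammon, +/-3 for a backgammon."""
    return float(np.mean([play_one_game() for _ in range(n_games)]))

def rolling_average(points, window=10):
    """The chart's lines: a rolling mean over the last ten benchmark
    points (the output is shorter, since the first window-1 points
    have no full window behind them)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(points, dtype=float), kernel, mode="valid")
```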


Some interesting points:

  • All three networks improve to about the same point, about as quickly, by around 500k iterations.
  • The 10-hidden-node network doesn't get any better, and ends up performing at around +0.3ppg vs pubEval.
  • The 20-node network starts to improve again around the 1.2M iteration point, and may still be getting slowly better at the end of the 2M runs, where it plays around +0.38ppg against pubEval.
  • The 40-node network also picks up at around 1.2M, and gets considerably better. It still seems to be improving substantially at the end of 2M, where it plays around +0.5ppg against pubEval. Its best benchmark points are north of +0.7ppg.


This is meant to be something like the chart from the TD-Gammon Scholarpedia page:

[Scholarpedia chart: TD-Gammon benchmark results out to 20M training iterations]
That chart goes out to 20M iterations, so it covers much longer runs than my 2M. But curiously, my networks all do much better than the Scholarpedia results. Even my 10-hidden-node network gets up to +0.3ppg on average.

There are two possibilities here, I think: my pubEval player is implemented incorrectly, and so is weaker than the player in the Scholarpedia chart; or my trick of reducing the number of weights by (roughly) half genuinely makes convergence easier.

I hope it's the second, but really it's probably the first. So I'll go back and check out the pubEval code.
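Even before line-by-line debugging, a crude aggregate check should show whether the benchmark is badly broken: a correct pubEval should crush a random mover, for example. A sketch, with play_game as a hypothetical hook:

```python
def check_pubeval(play_game, n_games=500):
    """Crude sanity check on a pubEval implementation. play_game() is
    a hypothetical hook that pits pubEval against a random-move player
    and returns pubEval's score for one game. A correct pubEval should
    win by a wide margin; anything near zero points to a bug."""
    ppg = sum(play_game() for _ in range(n_games)) / n_games
    print(f"pubEval vs random mover: {ppg:+.2f} ppg over {n_games} games")
    return ppg
```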

Regardless of what I find there, though, it is really satisfying to see my neural network actually learning and getting better! I'll have to play a few games against it to get an independent feel for how well it is playing.

Note: it was the first, of course - my pubEval implementation was buggy. After fixing it, pubEval is a much stronger player. I also had some significant problems with my network setup. Check out the Jan 2012 posts for believable results.


