The first training session started with lambda=0, alpha=beta=0.5, and ran for 200,000 iterations.
I ran it for two nets: one with 20 hidden nodes, and the other with 40.
This is the simplest approach: it does not decay alpha or beta, and it does not backtrack when performance suffers relative to the rolling optimum. But it is an interesting benchmark.
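For reference, the flavor of update this corresponds to can be sketched as follows. This is not my actual training code: the network class, shapes, and names are illustrative, assuming a one-hidden-layer net with a single sigmoid output whose target on each step is the evaluation of the next position (or the game result at the end). With lambda=0 there is no eligibility trace to maintain.

```python
import numpy as np

class TinyNet:
    """One-hidden-layer value network with a sigmoid output in [0, 1]."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w_hidden = rng.uniform(-0.1, 0.1, (n_hidden, n_inputs))
        self.w_output = rng.uniform(-0.1, 0.1, n_hidden)

    def evaluate(self, x):
        # Cache activations so the update step can reuse them.
        self.hidden = 1.0 / (1.0 + np.exp(-self.w_hidden @ x))
        self.value = 1.0 / (1.0 + np.exp(-self.w_output @ self.hidden))
        return self.value

    def td_update(self, x, target, alpha, beta):
        # With lambda=0 the eligibility trace collapses to the current
        # gradient, so this is one-step gradient descent on the TD error.
        # alpha scales the output-layer step, beta the hidden-layer step;
        # in the simple scheme described above, neither decays.
        value = self.evaluate(x)
        err = target - value
        d_out = value * (1.0 - value)                 # sigmoid derivative at output
        grad_output = d_out * self.hidden             # dV/dw_output
        d_hid = d_out * self.w_output * self.hidden * (1.0 - self.hidden)
        grad_hidden = np.outer(d_hid, x)              # dV/dw_hidden
        self.w_output += alpha * err * grad_output
        self.w_hidden += beta * err * grad_hidden
```

During self-play, `target` would be the net's evaluation of the position after the chosen move, so each update drags the current position's value toward the next one's.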
200,000 iterations is also not that much - it's a good start and comparable to some of the original TD training that Tesauro did, but really we want to train the nets for millions of iterations. But at least we can get some idea of how it's doing.
Every 1,000 training iterations the simulation runs a 200-game benchmark against pubEval. The chart below shows the results for 40 hidden nodes: the x-axis is the iteration number and the y-axis is the average points per game in the 200-game benchmark matches.
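The training-plus-benchmark loop itself is simple. A sketch, where `train_step` and `play_game` are hypothetical stand-ins (not my actual functions): `play_game` would play one full game between the trained net and pubEval and return the net's score (+/-1 for a plain win/loss, +/-2 for a gammon, +/-3 for a backgammon).

```python
def benchmark_ppg(play_game, n_games=200):
    """Average points per game over an n_games benchmark match."""
    total = sum(play_game() for _ in range(n_games))
    return total / n_games

def run_training(train_step, play_game, n_iterations=200_000):
    """Interleave self-play training with periodic pubEval benchmarks."""
    history = []
    for i in range(1, n_iterations + 1):
        train_step()                  # one self-play training game
        if i % 1_000 == 0:            # every 1,000 iterations: 200-game benchmark
            history.append((i, benchmark_ppg(play_game)))
    return history
```

The `history` list of (iteration, points-per-game) pairs is what gets plotted in the charts here.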
Note that before training the program does terribly - losing 1.075 points per game against pubEval. Very quickly (in just a few thousand iterations) it performs roughly at pubEval level and seems to converge to around +0.25ppg, with some scatter. The best benchmark performance was +0.615ppg.
Here are the results for 20 hidden nodes:
The results are similar to those for 40 hidden nodes, though with more scatter toward the end of the run.
The frustrating thing here: while I was putting this together and getting the machinery ready to save results, I'm pretty sure I had the 40-node network playing at around a 75% win rate and +0.6ppg. But I can't reproduce that now! I thought I got there starting with alpha=beta=0.5, but I don't see it now. Maybe I ran more steps. That said, the results above are pretty consistent with other results against pubEval that I've seen published in various articles on TD-gammon. I was also fiddling with the pubEval strategy, so maybe I fixed a bug there that had been making it artificially weak.
Note: the results above against pubEval are invalid - I had a bug in my pubEval implementation. Fixing that bug makes it a much stronger player. Check out the Jan 2012 posts for believable results.