I finished 2M training runs (again with alpha=beta=0.1, no alpha/beta damping or backtracking) on the new network, which has two output nodes, after correcting the bug. Here are the results:
The blue dots are the benchmark runs for the 10-hidden-node network, and the blue line is the rolling 10-point average. Red is the same for the 20-hidden-node network; green is 40 nodes; and purple is 80 nodes.
The bug ended up not making that much difference; the trained networks still perform extremely well against the pubEval benchmark.
So much so that pubEval probably isn't a great benchmark anymore. The trained networks win a backgammon against pubEval almost a quarter of the time, which is not realistic.
Note: I discovered (much) later that I had a significant bug in my pubEval implementation. After fixing that, it's a much stronger player. I also had some significant bugs in my network design that held the networks back. Check out posts from Jan 2012 to see proper results.
Now to look at some example games that the trained networks play against themselves to get a feel for how good they are...
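For anyone who wants to see the shape of this setup concretely, here's a minimal sketch of a two-output network trained with separate alpha and beta learning rates. This is not my actual code: the 198-unit TD-Gammon-style input encoding, the interpretation of the two outputs as P(win) and P(gammon win), and lambda=0 (no eligibility traces) are all assumptions for illustration.

```python
import numpy as np

N_INPUTS = 198   # assumed TD-Gammon-style board encoding size
N_HIDDEN = 40    # one of the hidden-layer sizes benchmarked above

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.1, 0.1, (N_HIDDEN, N_INPUTS))  # input -> hidden weights
W2 = rng.uniform(-0.1, 0.1, (2, N_HIDDEN))         # hidden -> two outputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def evaluate(inputs):
    """Forward pass: returns (outputs, hidden activations). The two
    outputs are assumed to be P(win) and P(gammon win)."""
    hidden = sigmoid(W1 @ inputs)
    outputs = sigmoid(W2 @ hidden)
    return outputs, hidden

def train_step(inputs, target, alpha=0.1, beta=0.1):
    """One gradient step toward `target` (e.g. the TD target taken from
    the evaluation of the next position). alpha scales the
    hidden->output update and beta the input->hidden update, matching
    the alpha/beta naming above; lambda=0 is assumed for simplicity."""
    global W1, W2
    outputs, hidden = evaluate(inputs)
    err = target - outputs                          # per-output error
    delta_out = err * outputs * (1.0 - outputs)     # sigmoid derivative
    delta_hid = (W2.T @ delta_out) * hidden * (1.0 - hidden)
    W2 += alpha * np.outer(delta_out, hidden)       # update after backprop
    W1 += beta * np.outer(delta_hid, inputs)
```

A real self-play run would call evaluate on the position resulting from each legal move, pick the best one, and then call train_step on the previous position with the chosen position's evaluation as the target.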