Since my last checkpoint around two months ago I've made quite a lot of progress.
Main highlights:
- Networks: I added a crashed network following the GNUbg definition of "crashed".
- Network inputs: I added a new input to the contact and crashed networks that measures the strength of prime blocking (a rough sketch of that kind of input follows this list). I also extended the race inputs to track the checkers borne in more granularly.
- Training: I grabbed the GNUbg training databases for the contact, crashed, and race networks and ran supervised learning against them. The best results come from training first with TD learning and then using those network weights as the starting point for the supervised learning (a simplified sketch of that step also follows the list).
- Benchmarking: I moved from bot self-play performance to benchmarking against the GNUbg benchmark databases for cubeless play, since that gives a more accurate and robust set of benchmarks for the different game phases. I ran some stats to look at the relationship between self-play results and GNUbg benchmark scores.
- Cube handling: this was a new area for me, since my personal game is weak on cube play. I learned about Janowski's classic model for money games, as well as how to construct match equity tables for tournament play (Janowski's take point formula is sketched after this list).
- New model for money game cube action: I developed a new model for cube decisions in money games, moving from the "live cube" limit, where the probability of winning is assumed to diffuse continuously, to a jump model where the probability of winning jumps by a discrete amount each turn.
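Roughly speaking, a prime-blocking input can be as simple as the length of the longest block of consecutive made points. Here's a simplified sketch; the board encoding (a 24-element list of my checker counts) and the scaling are just for illustration, not necessarily the exact input I use:

```python
def prime_strength(board):
    """Longest block of consecutive points holding two or more of my checkers.

    `board` is assumed to be a list of 24 ints: the number of my checkers on
    each point, indexed from my 1-point (index 0) to my 24-point (index 23).
    Returns the longest prime length, scaled so a full 6-prime maps to 1.
    """
    longest = current = 0
    for checkers in board:
        if checkers >= 2:          # point is "made" and blocks the opponent
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return min(longest, 6) / 6.0   # cap at 6: longer blocks are no stronger
```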
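As a rough illustration of the supervised step: start from the TD-trained weights and run gradient descent toward the rolled-out targets in the training database. The sketch below uses a toy one-hidden-layer sigmoid net, squared-error loss, and a made-up database format; the real training details differ, but the shape of the loop is the same:

```python
import numpy as np

def train_supervised(weights, training_db, epochs=5, alpha=0.1):
    """Supervised pass over a GNUbg-style training database (illustrative).

    `weights` = (W1, b1, W2, b2) for a one-hidden-layer sigmoid net, assumed to
    come from a prior TD-learning run.  `training_db` is a list of
    (inputs, target) pairs, where `target` is the rolled-out probability vector
    for the position.
    """
    W1, b1, W2, b2 = (np.array(w, dtype=float) for w in weights)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        for x, target in training_db:
            x, target = np.asarray(x, float), np.asarray(target, float)
            h = sigmoid(W1 @ x + b1)              # hidden layer
            y = sigmoid(W2 @ h + b2)              # output probabilities
            # squared-error gradient, backpropagated through both layers
            dy = (y - target) * y * (1 - y)
            dh = (W2.T @ dy) * h * (1 - h)
            W2 -= alpha * np.outer(dy, h)
            b2 -= alpha * dy
            W1 -= alpha * np.outer(dh, x)
            b1 -= alpha * dh
    return W1, b1, W2, b2
```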
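On the cube side, the heart of Janowski's model is a single interpolation between the "dead cube" and "live cube" limits, controlled by a cube-life index x. The take point formula is compact enough to show directly (W and L are the average points won per win and lost per loss):

```python
def janowski_take_point(x, W=1.0, L=1.0):
    """Minimum win probability at which taking a double beats passing (money game).

    x : Janowski's cube-life index, 0 = dead cube, 1 = fully live cube
    W : average points won when you win (1.0 if there are no gammons)
    L : average points lost when you lose

    Janowski interpolates between the two limits:
        TP(x) = (L - 0.5) / (W + L + 0.5 * x)
    For a gammonless game (W = L = 1) this gives the familiar 25% take point
    with a dead cube (x = 0) and 20% in the fully live limit (x = 1).
    """
    return (L - 0.5) / (W + L + 0.5 * x)
```

The jump model goes after the "live" end of that interpolation: in the live-cube limit the win probability diffuses continuously, so a player can always double at exactly the optimal moment. If the probability instead moves in finite jumps each turn, that is no longer true, which is the effect the jump model tries to capture.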
Along the way I developed several new players incorporating the new ideas. As of now my best checker player is Player 3.3, which scores 13.3 (Contact), 12.0 (Crashed), and 0.93 (Race) on the GNUbg benchmarks, compared to GNUbg 0-ply's scores of 10.5, 11.0, and 1.01. (A rough sketch of how a benchmark score gets calculated is below.)
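For context on those numbers: a benchmark score is, roughly, the average equity given up per move relative to the rolled-out best move, quoted in thousandths of a point, so lower is better. Something along these lines, where the record layout and player interface are placeholders rather than my actual code:

```python
def benchmark_score(player, benchmark_db):
    """Average equity error per move against a benchmark database, in milli-equity.

    Each record is assumed to hold a position, the dice roll, and a dict mapping
    the candidate moves to their rolled-out equities.  The score is the equity
    the player's chosen move gives up versus the best move, averaged over all
    records and multiplied by 1000.  Lower is better.
    """
    total_error = 0.0
    for position, roll, move_equities in benchmark_db:
        chosen = player.choose_move(position, roll)           # the move the bot would make
        best_equity = max(move_equities.values())
        total_error += best_equity - move_equities[chosen]    # equity lost on this move
    return 1000.0 * total_error / len(benchmark_db)
```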
Stuff still to do:
- Money play: implement my new model properly and test out its performance. Hopefully it will perform better than the standard approaches.
- Match play: implement matches properly in my backgammon framework and see if I can extend my new model approach to match play as well (coming up with better match equity tables along the way).
- Checker play: my players' Contact benchmark scores are still pretty weak compared to GNUbg. I suspect I need to add more custom inputs. I also want to investigate some more advanced supervised learning techniques applied to training against the GNUbg training databases.
- Rollouts: rollouts are still pretty basic in my framework; I need to properly implement variance reduction (sketched after this list) and truncating rollouts before the end of the game.
- FIBS hookup: I still want to hook up my bot to FIBS and see how it performs there in real-world games.
- Networks: I've been talking with Øystein Johansen about maybe breaking up the "contact" specification into separate nets for different game phases, perhaps using k-means clustering to define the phases.
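On the variance reduction point: the standard trick is the luck adjustment, where at each turn you subtract out how lucky the actual dice roll was relative to the pre-roll expectation. A rough sketch, with a made-up game/evaluator interface standing in for my framework (the helper names are placeholders):

```python
import random

ROLLS = [(d1, d2, (1 if d1 == d2 else 2) / 36.0)
         for d1 in range(1, 7) for d2 in range(d1, 7)]  # 21 distinct rolls with probabilities

def luck_adjusted_result(game, evaluator):
    """One variance-reduced rollout game using the standard luck adjustment.

    Assumed (placeholder) helpers: game.over(), game.after_best_play(roll) ->
    position after the best play of that roll, game.play_best(roll) advances
    the game, game.position() is the current position, game.result() is the
    final outcome, and evaluator.equity(position) is the cubeless equity
    estimate.  All equities are taken from player 0's point of view.
    """
    luck = 0.0
    while not game.over():
        # equity expected before the dice are thrown: average over all 21 rolls
        expected = sum(p * evaluator.equity(game.after_best_play((d1, d2)))
                       for d1, d2, p in ROLLS)
        roll = (random.randint(1, 6), random.randint(1, 6))
        game.play_best(roll)
        # luck of this roll = equity we ended up with minus what we expected
        luck += evaluator.equity(game.position()) - expected
    # same expected value as the raw result, but with the dice luck removed
    return game.result() - luck
```

Truncation is the easier half: stop the rollout after a fixed number of turns and use the evaluator's equity estimate at that point in place of the played-out result.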
Fun stuff!