Wednesday, January 25, 2012

Checkpoint: recent progress and next steps

I've been tearing around with ideas for improving my player, but I think it's worth taking a break and summarizing recent progress and thoughts on next steps.

The big change in the last month was figuring out why my networks weren't properly converging: when calculating the equity of different possible moves, I needed to assume the opponent holds the dice (since the player's move is over), but I was assuming the player holds the dice.

Fixing that immediately improved the performance of my players, even the simple ones.
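
To make the fix concrete, here is a minimal sketch of move selection with the dice assigned correctly. It assumes a hypothetical evaluator, `equity_for_side_on_roll`, that returns the cubeless equity of a position from the point of view of whoever is about to roll. The function names and the toy equities are illustrative only, not taken from the player's actual code. Since the mover's turn is over, the side on roll in every candidate position is the opponent, so the mover's equity is the negative of what the evaluator returns.

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")  # whatever type represents a board position


def choose_move(after_move_positions: Iterable[P],
                equity_for_side_on_roll: Callable[[P], float]) -> P:
    """Return the candidate position that is best for the player who just moved.

    Each candidate is a position *after* the mover has played, so the
    opponent holds the dice. The (hypothetical) evaluator reports equity
    for the side on roll, i.e. the opponent; cubeless equity is zero-sum,
    so the mover's equity is the negation.
    """
    return max(after_move_positions,
               key=lambda pos: -equity_for_side_on_roll(pos))


# Toy stand-in for a network: made-up equities for the side on roll.
toy_equity = {"A": 0.30, "B": -0.10, "C": 0.05}
picked = choose_move(toy_equity, toy_equity.__getitem__)
```

The bug amounts to dropping the negation (or evaluating as if the mover were still on roll), which in this toy example would pick "A", the position that is best for the opponent.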

The second change worth noting is discovering that my pubEval implementation was incorrect, so the numbers I was seeing when benchmarking against it were way too good. After fixing that, my network performance dropped back down to more believable levels: comparable to the performance of players I see in some of the literature, but worse than the best bots like GnuBG.

I defined a few named players: Benchmark 1, which is a single-net player that tracks wins and gammons; Benchmark 2, which adds backgammons; and Players 2 and 2.1, which have separate nets for the race and contact game phases, track wins, gammons, and backgammons, and differ only in their list of inputs.

I tried out a couple of new inputs. An input tracking hitting shots seemed to add some value. I'm still testing an input for the maximum number of primes.

Directions for the future:

  • Finish sorting out contact inputs. The primes input seems not to work very well so far, but I suspect there's a better way to define this.
  • Improve lookahead calculations so 2-ply calculations aren't so slow.
  • Add rollouts. I've got some basic rollout functionality now, but it doesn't include standard features like variance reduction or cutting off the rollout before the end of the game.
  • Add custom inputs for the race network, especially 14 separate inputs, one for each checker borne off (to give the network a chance to find more complex nonlinear dependencies).
  • Extend the list of networks to include something like GnuBG's crashed network, i.e. one for the crashed phase of the game. I looked at GnuBG's definition of "crashed" and didn't really understand it, so there's a bit more to do there before I turn anything on.
  • Build a position benchmark database and roll out the probabilities for each entry. That'll let me do supervised learning to train the network. GnuBG did this, and those folks note that with just standard TD learning they couldn't get very far; they needed the benchmark database and supervised learning, with a focus on positions where the 0-ply and 2-ply equities differed.
  • Hook up my bot to FIBS so I can get another benchmark on its performance.
  • Add support for cube decisions. I've entirely ignored this so far while I get the cubeless play working properly, but eventually I'll need to figure this part out. It's especially challenging for me since I don't know much about cube strategy myself.
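
For the race-network item above, one possible reading of "separate inputs for each checker borne off" is a thermometer-style encoding, where input i is switched on once at least i checkers have been borne off. This is only a sketch under that assumption; the function name `borne_off_inputs` and the choice of thermometer over one-hot encoding are illustrative, not a settled design.

```python
from typing import List


def borne_off_inputs(n_borne_off: int, n_inputs: int = 14) -> List[float]:
    """Encode a borne-off checker count as n_inputs binary network inputs.

    Assumed thermometer encoding: input i (1-based) is 1.0 when at least
    i checkers have been borne off, otherwise 0.0. With 14 inputs, counts
    of 14 and 15 give the same vector; 15 means the game is already over.
    """
    if not 0 <= n_borne_off <= 15:
        raise ValueError("a player has between 0 and 15 checkers borne off")
    return [1.0 if n_borne_off >= i else 0.0 for i in range(1, n_inputs + 1)]


# Three checkers borne off: the first three inputs are on, the rest off.
example = borne_off_inputs(3)
```

Compared with a single scaled input (checkers borne off divided by 15), separate inputs let the network attach a different weight to each additional checker instead of being forced into a linear dependence.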
