I tried to get TD-lambda learning to work with scikit-learn's MLPClassifier tools, but couldn't get it to accept probabilities as training targets rather than a small set of class labels (1 or 0 values). Then I tried MLPRegressor, but that doesn't seem to have a nice way of bounding the outputs in (0,1).
So rather than bang my head against that, I just rolled my own neural network again - this time in Python, but using numpy vectorized calculations to speed things up.
It's still pretty slow in execution - I can train a network with 80 hidden nodes at about 20,000 games per hour on my desktop machine. But it let me get back into the weeds with how this all works.
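As a rough sketch of what "numpy vectorized" means here (not the actual code from this project), the forward pass below evaluates a whole batch of candidate positions with two matrix multiplications instead of looping over nodes. The 198-input encoding, the five outputs, the bias handling, the weight range, and every function name are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function; keeps every node's output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def init_network(n_inputs, n_hidden, n_outputs, rng):
    """Small random weights; the last row of each matrix is a bias weight."""
    return {
        "w_hidden": rng.uniform(-0.1, 0.1, size=(n_inputs + 1, n_hidden)),
        "w_output": rng.uniform(-0.1, 0.1, size=(n_hidden + 1, n_outputs)),
    }

def forward(net, positions):
    """Evaluate a batch of encoded positions, shape (n_positions, n_inputs)."""
    ones = np.ones((positions.shape[0], 1))  # constant input for the biases
    hidden = sigmoid(np.hstack([positions, ones]) @ net["w_hidden"])
    return sigmoid(np.hstack([hidden, ones]) @ net["w_output"])

rng = np.random.default_rng(0)
# 198 inputs and 5 outputs are assumed sizes; 80 hidden nodes is from the text.
net = init_network(n_inputs=198, n_hidden=80, n_outputs=5, rng=rng)

# Stand-in for the encoded positions reachable from one dice roll.
candidates = rng.integers(0, 2, size=(20, 198)).astype(float)
outputs = forward(net, candidates)  # one row of 5 probabilities per position
```

Scoring every legal move for a roll in one call like this is where most of the speedup over per-node Python loops would come from.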
This time I followed Tesauro's setup a bit more closely, in that the inputs are explicitly white and black checker positions rather than "whichever player's on move", and I kept the two "whose move is it" inputs too. There are five outputs: probability of white single win, probability of white gammon, probability of black gammon, probability of white backgammon, and probability of black backgammon. The probability of a black single win isn't a separate output; it is one minus the sum of the other five.
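To make the output layout concrete, here is a small sketch that recovers the sixth probability and turns the full distribution into a cubeless equity for white. The output ordering follows the list above; the function names, the dictionary keys, and the example numbers are made up for illustration, and the equity formula just applies the usual 1/2/3 points for single win, gammon, and backgammon.

```python
import numpy as np

def full_distribution(outputs):
    """Expand the five network outputs into all six game outcomes.

    Assumed order: white single, white gammon, black gammon,
    white backgammon, black backgammon.
    """
    w_single, w_gammon, b_gammon, w_backgammon, b_backgammon = outputs
    return {
        "white_single": w_single,
        "white_gammon": w_gammon,
        "white_backgammon": w_backgammon,
        # Not a network output: whatever probability is left over.
        "black_single": 1.0 - np.sum(outputs),
        "black_gammon": b_gammon,
        "black_backgammon": b_backgammon,
    }

def white_equity(dist):
    """Cubeless equity from white's side: 1, 2, 3 points per outcome type."""
    return (
        dist["white_single"] + 2 * dist["white_gammon"] + 3 * dist["white_backgammon"]
        - dist["black_single"] - 2 * dist["black_gammon"] - 3 * dist["black_backgammon"]
    )

# Made-up example outputs, not results from a trained network.
example = np.array([0.40, 0.12, 0.08, 0.01, 0.01])
dist = full_distribution(example)
equity = white_equity(dist)
```

One caveat with this layout: if the five outputs are independent sigmoids, nothing forces them to sum to at most one, so an undertrained network can imply a slightly negative black single win probability.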
I'm able to reproduce most of the initial cubeless play results from my earlier work, though I've yet to add the inputs for the number of checkers available to hit, or the Berliner primes. It takes around 200,000 game iterations to train up to something like the Benchmark 2 level. This was using alpha=0.1, lambda=0, and no "alpha splitting" (using a different learning rate for the input->hidden weights and the hidden->output weights).
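For reference, here is a minimal sketch of what a single training step looks like with lambda=0, where the eligibility trace collapses to the current gradient and each weight moves by alpha times the TD error times the gradient of the output. It is an illustration under assumptions rather than the code behind the numbers above: biases are left out for brevity, the 198-input size is assumed, and all names are hypothetical. The two learning-rate arguments show where alpha splitting would plug in; both default to 0.1, which corresponds to no splitting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(w_hidden, w_output, x):
    """Return hidden activations and outputs for one encoded position."""
    hidden = sigmoid(x @ w_hidden)       # shape (n_hidden,)
    output = sigmoid(hidden @ w_output)  # shape (n_outputs,)
    return hidden, output

def td0_update(w_hidden, w_output, x, target, alpha_hidden=0.1, alpha_output=0.1):
    """One TD(0) step, updating both weight matrices in place.

    target is the network's output at the next position, or the actual
    result vector once the game is over. Returns the TD error.
    """
    hidden, output = forward(w_hidden, w_output, x)
    error = target - output
    # Sigmoid derivatives, chained back from the outputs to the hidden layer.
    delta_out = error * output * (1.0 - output)
    delta_hid = (w_output @ delta_out) * hidden * (1.0 - hidden)
    w_output += alpha_output * np.outer(hidden, delta_out)
    w_hidden += alpha_hidden * np.outer(x, delta_hid)
    return error

rng = np.random.default_rng(0)
w_hidden = rng.uniform(-0.1, 0.1, size=(198, 80))  # 198 inputs is an assumption
w_output = rng.uniform(-0.1, 0.1, size=(80, 5))
x = rng.integers(0, 2, size=198).astype(float)     # stand-in for a position
terminal = np.array([1.0, 0.0, 0.0, 0.0, 0.0])     # white wins a single game

first_error = np.abs(td0_update(w_hidden, w_output, x, terminal)).sum()
for _ in range(50):
    last_error = np.abs(td0_update(w_hidden, w_output, x, terminal)).sum()
```

Repeating the step on one position just pulls its outputs toward the target, so the error shrinks; in real training the target for every non-terminal position comes from the network's own estimate one move later.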
So now I've convinced myself that I remember how to build one of these players from scratch. For the next step I'm going to download the latest GnuBG benchmark databases and do supervised learning on those - it should be much easier to plug that into an external package like scikit-learn.