Back
Close

Best-First Minimax Search with UCT

jacek
4,596 views

Prerequisites

I am assuming you are familiar with minimax (negamax) algorithm, preferably with alpha-beta pruning and optionally with iterative deepening, transposition tables etc. Also, you should know how Monte Carlo Tree Seach works, because the algorithm I will present mostly resembles MCTS with early playout termination.

Introduction

Recently I've been playing with neural networks to improve the evaluation function of the minimax bot. Other top bots have shown that Oware is quite NN-friendly. I've been succesful training good evaluation function (at least much better than my handcrafted one). Having a good evaluation function, one can use it in:

  • minimax, to replace the handcrafted eval
  • mcts, to guide the selection phase, or to guide the simulation phase
  • mcts, for early playout termination and return the score

For Oware, mcts with early playout termination was used by at least 2 good bots that I'm aware of. And the termination was at 0 depth, so they effectively used eval as playout's score. I too had some success with it. But by experimentation and partly by accident, I created algorithm that was better. Then of course I discovered that it was nothing new. In literature it was called Best-First Minimax Search, though my approach is a bit different.

Monte Carlo Tree Search

Let's recall 4 steps of MCTS:

  • selection: select child according to UCT: wins/visits + C * sqrt(log(parent visits) / visits), until you reach node leaf
  • expansion: expand the node leaf (either by 1 child or every possible children) and choose one of its child
  • simulation: from that child simulate the game randomly until the end of game; get the result [win, draw, lose] (1, 0, -1) or some implementations (1, 0.5, 0)
  • backpropagation: propagate the result to parents accordingly, i.e. if win was for black player, then +1 for black player's nodes and -1 for white player's nodes, up until root

And final selection: select move with highest visits; sometimes best score or lower confidence bound is used

In early playout termination, we play (semi)randomly for N moves, then use eval to propagate score. Some implementations use eval directly, other impementations may use eval > 0 = win, eval < 0 = lose.

When I used 0 depth playouts, in other words used the eval, I found that using eval directly is better than scaling it to win/lose. I've been experimenting with it and I discovered that overwriting scores instead of adding scores got me better results. I came up with "new" algorithm, but somewhat I incorporated UCT in it.

Best-First Minimax Search with UCT

It's practically the same as MCTS, with few differences:

  • selection: select child according to eval value + C * sqrt(log(parent visits) / visits), or if child is not visited: eval value + FPU, until you reach node leaf
  • expansion: expand the node leaf by every possible children (every possible moves), add eval to every children
  • "simulation": no simulation, we added eval to every children
  • backpropagation: overwrite each parent's eval in negamax style: parent's eval value = - max of children's value

Final selection: select move according to formula: move's eval value + log(move's visits)

C is exploration constant to be tuned, FPU is First Play Urgency also to be tuned. I.e. for oware i have C = 1.5, FPU = 0.5.

The final selection will mostly select move with highest visits, unless the eval score of less visited move is much higher.

I should remark I use NN for eval, which is in the range [-1,1] and the win is 10000 - 10 * game's rounds, and lose is -10000 + 10 * game's rounds (so the shortest wins and latest loses are better).

Results

With this approach, I have had great success in Oware, Breakthrough, Onitama and Othello (I use N-tuple instead of NN in the last two). Generally the winrate against my minimax bot is at least 70% using the same eval. Though I should remark, the minimax bot is rather generic alpha-beta with move ordering using transposition tables and killer moves. I didn't use any fancy pruning, quiescence search or search extensions. Still with low effort it was able to beat minimax bot convincingly. On the other hand best-first was sometimes very blind and have very misleading eval's result for move, i.e. it could give 0.9 score, just to prove the loss 1 ply later. I believe this was somewhat mitigated by using neural network.

My conclusion is: just as vanilla MCTS allows creating not-so-crappy bot without eval, the best-first minimax (with good eval) allows having better-than-alpha-beta search tree.

Future experiments

  • try shallow alpha-beta instead of eval
  • try some sort of softmax (probability distribution) instead of the UCT
  • incorporate transposition tables (or at least use them as NN cache)
Create your playground on Tech.io
This playground was created on Tech.io, our hands-on, knowledge-sharing platform for developers.
Go to tech.io