
Fresh max_lcb_root experiments #2282

@roy7

Description

Some of you old-timers will remember my experiments in #883, where I simply changed move selection at the end of search to pick the move with the highest lower confidence bound, treating each move's playout results as a binomial distribution. It worked well in a wide variety of tests, but once --timemanage was added it no longer did much, presumably because of the lower number of playouts being done per move. I set the experiments aside at that point.
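For anyone who didn't follow #883, here is a minimal sketch of the idea, assuming a one-sided normal approximation to the binomial; the actual max_lcb_root branch may compute the bound differently, and the move names and counts below are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def binomial_lcb(wins: float, visits: int, ci_alpha: float = 0.05) -> float:
    """Lower confidence bound on a move's win rate.

    Treats the move's visits as Bernoulli trials with `wins` successes
    (the binomial view described above). Illustrative sketch only.
    """
    if visits == 0:
        return 0.0
    p = wins / visits
    # One-sided z multiplier for the chosen ci_alpha.
    z = NormalDist().inv_cdf(1.0 - ci_alpha)
    return p - z * sqrt(p * (1.0 - p) / visits)

# Move selection: pick the highest LCB instead of the most visits.
# Hypothetical root statistics: move -> (wins, visits).
moves = {"D4": (600, 1000), "Q16": (330, 500)}
best = max(moves, key=lambda m: binomial_lcb(*moves[m]))
```

Note the contrast with the default rule: plain most-visits selection would pick D4 here, while the LCB rule can prefer a less-visited move whose win rate is high enough that even its pessimistic bound beats the alternatives.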

Out of curiosity, after the Exact-MCTS discussion about increasing win rate purely through search changes, I wanted to see whether this would still hold up after 10 months of code changes and with today's much stronger 40-block network (with timemanage off).

I'm currently running a test of https://github.com/roy7/leela-zero/tree/max_lcb_root in gomill with this configuration:

competition_type = 'playoff'

players = {
    'leelaz-max_lcb_root' : Player("./leelaz-max_lcb_root-4c7e08c --logfile log.lcb -r 10 --timemanage off --visits 3200 -w 1e6c.gz --noponder --gtp "),
    'leelaz-next' : Player("./leelaz-next --logfile log.next -r 10 --timemanage off --visits 3200 -w 1e6c.gz --noponder --gtp "),
    }

board_size = 19
komi = 7.5

matchups = [
    Matchup('leelaz-max_lcb_root', 'leelaz-next', alternating=True, board_size=19, scorer='players', number_of_games=400),
    ]

My results so far are:

leelaz-max_lcb_root v leelaz-next (400/400 games)
board size: 19   komi: 7.5
                      wins              black          white        avg cpu
leelaz-max_lcb_root    247 61.75%       118 59.00%     129 64.50%    425.93
leelaz-next            153 38.25%       71  35.50%     82  41.00%    427.05
                                        189 47.25%     211 52.75%

Which is looking quite good. It would already have been an SPRT pass partway through, but I let it finish; all 400 games are now completed.
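To see why this counts as an SPRT pass, here is a sketch of the log-likelihood-ratio check for a simple win/loss test (no draws). The hypothesis bounds elo0/elo1 and the error rates are my own illustrative choices; the post doesn't state which SPRT parameters were assumed:

```python
from math import log

def sprt_llr(wins: int, losses: int, elo0: float = 0.0, elo1: float = 35.0) -> float:
    """Log-likelihood ratio of H1 (elo1) vs H0 (elo0) for win/loss results.

    elo0/elo1 are hypothetical bounds chosen for illustration.
    """
    p0 = 1.0 / (1.0 + 10.0 ** (-elo0 / 400.0))  # expected score under H0
    p1 = 1.0 / (1.0 + 10.0 ** (-elo1 / 400.0))  # expected score under H1
    return wins * log(p1 / p0) + losses * log((1.0 - p1) / (1.0 - p0))

alpha = beta = 0.05                    # type I / type II error rates
lower = log(beta / (1.0 - alpha))      # stop and accept H0 below this
upper = log((1.0 - beta) / alpha)      # stop and accept H1 above this
llr = sprt_llr(247, 153)               # the 247-153 result from the table
```

With these numbers the LLR is well above the upper bound, i.e. the test would have terminated early in favor of the patch.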

I wonder a few things... is it working better because the latest network is so much stronger? Or maybe because batching is turned on? Or maybe because of code changes in the last 10 months?

After this ends I may try some other tests, with fewer playouts or an old network, to compare against the results in my original PR. I could also run a test with --threads 1, or at least with no batching, to see if that's a factor.

EatNow from the Leela Zero Discord has compiled my branch as a Windows binary for those interested in testing it who can't compile it themselves.

If anyone else out there wants to try some scenarios, feel free to post your gomill results. Some tests with fewer playouts would also be useful. Too few and there may not be enough visits for the LCB logic to ever kick in (maybe? maybe not?), but 1600 or fewer might still be enough at times. (It might also work better with a playout cap instead of a visit cap, since that'd let the tree get much larger in forced-move situations.)

@Ttl ran a CLOP tune suggesting that ci_alpha is fine for any value of .005 or lower.
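For intuition about what ci_alpha controls, here is the z multiplier it implies under a one-sided normal bound (my assumption; the branch's exact formula may differ), computed with the standard library:

```python
from statistics import NormalDist

def z_for_alpha(ci_alpha: float) -> float:
    """One-sided normal quantile for a given ci_alpha (illustrative)."""
    return NormalDist().inv_cdf(1.0 - ci_alpha)

# Smaller ci_alpha -> larger z -> wider, more conservative bound,
# which penalizes low-visit moves more heavily.
for a in (0.05, 0.01, 0.005, 0.001):
    print(f"ci_alpha={a}: z={z_for_alpha(a):.3f}")
```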
