Update default network to a7c8f545.nn #517

Merged
KierenP merged 17 commits into master from net_a7c8f545
Jul 15, 2024

@KierenP KierenP commented Jul 15, 2024

a7c8f545.nn is a (768->512)x2->1 network using SCReLU activation. It has been trained from scratch, with a completely separate lineage to the master network.
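To make the architecture string concrete, here is a minimal floating-point sketch of how a (768->512)x2->1 network with SCReLU could be evaluated. It assumes a feature transformer shared by both perspectives and a single output layer over the concatenated accumulators. The function names, the weight layout and the absence of quantisation are illustrative assumptions, not Halogen's actual inference code.

```python
def screlu(x):
    """Squared clipped ReLU: clamp to [0, 1], then square."""
    clipped = min(max(x, 0.0), 1.0)
    return clipped * clipped


def accumulate(active_features, ft_weights, ft_bias):
    """Sum the feature-transformer rows of the active input features.

    ft_weights[f] is the hidden-sized weight row for input feature f
    (768 rows of 512 values for the network described above).
    """
    acc = list(ft_bias)
    for f in active_features:
        row = ft_weights[f]
        for i in range(len(acc)):
            acc[i] += row[i]
    return acc


def evaluate(stm_features, nstm_features, ft_weights, ft_bias,
             out_weights, out_bias):
    """Evaluate one position from the side-to-move perspective.

    Assumption: the two accumulators (side to move, then opponent) are
    activated with SCReLU and fed to a single output neuron whose weights
    are laid out as [stm half, opponent half].
    """
    hidden = len(ft_bias)
    stm = accumulate(stm_features, ft_weights, ft_bias)
    nstm = accumulate(nstm_features, ft_weights, ft_bias)
    total = out_bias
    for i in range(hidden):
        total += screlu(stm[i]) * out_weights[i]
        total += screlu(nstm[i]) * out_weights[hidden + i]
    return total
```

The sizes are taken from the lengths of the arguments, so the same sketch can be exercised with a tiny toy network before plugging in 768 inputs and 512 hidden units.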

Before I begin, I'd like to thank the contributors who have at some point in the past been responsible for training the best Halogen network to date:

Training a network from zero for Halogen purely through self-play reinforcement and supervised learning has always been a long-term goal of mine. The combined effort to achieve this began in March 2024, and represents the single greatest development effort in Halogen's history.


Training began with a novel implementation of TDLeaf(λ) reinforcement learning [1]. The exact lineage of best networks was:

Temporal Coherence [2] was then used to further the training process:

By playing out Syzygy endgames and allowing the network to learn from those positions, endgame play was improved:

  • 11.7.0_td_leaf_learn_3.3.0 starting from 768-512x2-1_r12_g2297813.nn trained for 16 hours to produce 768-512x2-1_r15_g2720489.nn

Adding 5% DFRC greatly improved FRC performance:

  • 11.7.0_td_leaf_learn_3.5.0 starting from 768-512x2-1_r15_g2720489.nn trained for 16 hours to produce 768-512x2-1_r17_g2894950.nn

By filtering out openings that are wildly unbalanced (beyond +/- 500cp), the final TDLeaf(λ) network was trained:

  • 11.7.0_td_leaf_learn_3.6.0 starting from 768-512x2-1_r17_g2894950.nn trained for 16 hours to produce 768-512x2-1_r18_g2771316.nn
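The opening filter mentioned above can be sketched as a simple threshold on a centipawn score. How the score for each opening is obtained is not stated in this PR, so the `(fen, score_cp)` pairs and both function names below are hypothetical; only the 500cp limit comes from the description.

```python
def is_balanced(score_cp, limit_cp=500):
    """True if the opening's score lies within +/- limit_cp centipawns."""
    return abs(score_cp) <= limit_cp


def filter_openings(openings, limit_cp=500):
    """Keep only the openings whose score is not wildly unbalanced.

    `openings` is an iterable of (fen, score_cp) pairs; the score source
    (for example a shallow search with the current network) is an assumption.
    """
    return [fen for fen, score_cp in openings if is_balanced(score_cp, limit_cp)]
```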

The final TDLeaf(λ) network scored Elo | -182.26 +- 8.45 (95%) compared to master.
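As background for the TDLeaf(λ) stage described above, the core quantity in the published algorithm [1] is a λ-discounted sum of temporal differences between the evaluations of successive principal-variation leaves. The sketch below computes that sum for one game. It follows the textbook formulation rather than the "novel implementation" used here, whose details are not given in this PR; the function name and the assumption that leaf values are already on a bounded scale (such as win probability) are illustrative.

```python
def td_lambda_errors(leaf_values, lam):
    """Return e[t] = sum over j >= t of lam**(j - t) * d[j],
    where d[j] = leaf_values[j + 1] - leaf_values[j].

    leaf_values[t] is the evaluation of the principal-variation leaf found
    by the search at move t, with the game outcome as the last element.
    """
    n = len(leaf_values)
    diffs = [leaf_values[t + 1] - leaf_values[t] for t in range(n - 1)]
    errors = [0.0] * (n - 1)
    running = 0.0
    # Walk backwards so each step reuses the discounted tail sum.
    for t in range(n - 2, -1, -1):
        running = diffs[t] + lam * running
        errors[t] = running
    return errors
```

In the textbook update, each weight is then moved along the gradient of the leaf evaluation at move t, scaled by `errors[t]` and a learning rate. With `lam = 0` only the next position matters; with `lam = 1` every position is pulled towards the final outcome.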


At this point I switched to supervised learning, using the bullet trainer [3].

Bullet parameters
File Path      : ...
Threads        : 20
WDL Proportion : start 0.3 end 0.3
Max Epochs     : 100
Save Rate      : 10
Batch Size     : 16384
Net Name       : bullet_r10_768-512x2-1
LR Scheduler   : start 0.001 gamma 0.1 drop every 40 epochs
Scale          : 160
Positions      : 505179452

Bullet parameters
File Path      : ...
Threads        : 20
WDL Proportion : start 0.3 end 0.3
Max Epochs     : 50
Save Rate      : 10
Batch Size     : 16384
Net Name       : bullet_r17_768-512x2-1
LR Scheduler   : start 0.001 gamma 0.95 drop every 1 epochs
Scale          : 160
Positions      : 1116214793
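To help read the two parameter dumps above, the sketch below shows one plausible interpretation of the `LR Scheduler`, `WDL Proportion` and `Scale` lines: a step schedule that multiplies the learning rate by gamma every N epochs, and a training target that blends the game result with a sigmoid of the search score divided by the scale. The 0-indexed epochs and the exact blending formula are assumptions for illustration; bullet's own source is the authority on what it actually computes.

```python
import math


def step_lr(epoch, start, gamma, step):
    """Learning rate at a 0-indexed epoch for a 'drop every `step` epochs' schedule."""
    return start * gamma ** (epoch // step)


def blended_target(score_cp, result, wdl, scale):
    """Blend game result and search score into one target in [0, 1].

    result is 1.0 / 0.5 / 0.0 for win / draw / loss from the side to move;
    wdl is the WDL proportion (0.3 in both runs above);
    scale maps centipawns into the sigmoid's input range (160 above).
    """
    expected = 1.0 / (1.0 + math.exp(-score_cp / scale))
    return wdl * result + (1.0 - wdl) * expected
```

Under these assumptions the first run spends epochs 0 to 39 at 0.001, then drops by a factor of ten at epochs 40 and 80, while the second run decays by 5% every epoch.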

Elo   | 31.56 +- 5.70 (95%)
Conf  | 40.0+0.40s Threads=1 Hash=64MB
Games | N: 5000 W: 1530 L: 1077 D: 2393
Penta | [29, 473, 1126, 760, 112]
http://chess.grantnet.us/test/37549/

Elo   | 30.92 +- 6.37 (95%)
Conf  | 8.0+0.08s Threads=1 Hash=8MB
Games | N: 5002 W: 1626 L: 1182 D: 2194
Penta | [73, 504, 997, 760, 167]
http://chess.grantnet.us/test/37548/
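The Elo figures in the two result blocks above can be cross-checked from the game counts with the standard logistic Elo formula. The sketch below does this from both the W/L/D line and the pentanomial line; the bucket order assumed for `Penta` (0, 0.5, 1, 1.5 and 2 points per game pair) is an assumption, and the error bars are not reproduced here.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def score_from_wdl(wins, losses, draws):
    """Average points per game from win/loss/draw counts."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def score_from_penta(penta):
    """Average points per game from game-pair counts.

    penta[i] is the number of pairs that scored i / 2 points in total,
    so the buckets run from 0 points up to 2 points per pair.
    """
    pairs = sum(penta)
    points = sum(0.5 * i * count for i, count in enumerate(penta))
    return points / (2 * pairs)


ltc = elo_from_score(score_from_wdl(1530, 1077, 2393))
stc = elo_from_score(score_from_wdl(1626, 1182, 2194))
print(f"40.0+0.40s: {ltc:.2f}  8.0+0.08s: {stc:.2f}")
```

Both counts give values within a few hundredths of the reported 31.56 and 30.92, and the pentanomial lines yield the same average score as the W/L/D lines.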

Footnotes

  1. TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search.

  2. Temporal Coherence and Prediction Decay in TD Learning

  3. https://github.com/jw1912/bullet

@KierenP KierenP merged commit 2f8e9d3 into master Jul 15, 2024
@KierenP KierenP deleted the net_a7c8f545 branch July 15, 2024 22:36