Bench: 2109681
`a7c8f545.nn` is a `(768->512)x2->1` network using SCReLU activation. It has been trained from scratch, with a completely separate lineage to the master network.

Before I begin, I'd like to thank the contributors who have at some point in the past been responsible for training the best Halogen network to date.
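As a sketch of what this architecture notation means, here is a minimal NumPy forward pass of a `(768->512)x2->1` perspective network with SCReLU. This is an illustration of the general NNUE-style layout, not Halogen's actual code; all names are hypothetical.

```python
import numpy as np

def screlu(x):
    # SCReLU: squared clipped ReLU, i.e. clamp(x, 0, 1) ** 2
    return np.clip(x, 0.0, 1.0) ** 2

def forward(stm_features, nstm_features, W, b, out_w, out_b):
    # (768->512)x2->1: the same 768->512 layer is applied to the board
    # features from each side's perspective, the two 512-wide activated
    # accumulators are concatenated, and one linear output neuron
    # produces the evaluation.
    stm_acc = screlu(W @ stm_features + b)
    nstm_acc = screlu(W @ nstm_features + b)
    hidden = np.concatenate([stm_acc, nstm_acc])  # 1024 values
    return float(out_w @ hidden + out_b)
```

The squaring in SCReLU keeps the activation smooth near zero while the clamp bounds it, which plays well with quantized inference.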
Training a network from zero for Halogen, purely through self-play reinforcement plus supervised learning, has always been a long-term goal of mine. The combined effort to achieve this began in March 2024 and represents the single greatest development effort in Halogen's history.
Training began with a novel implementation of TDLeaf(λ) reinforcement learning[^1]. The exact lineage of best networks was:

- `768-512x2-1_g917495.nn`
- `768-512x2-1_g917495.nn` trained for 8 hours to produce `768-512x2-1_e2_g1768709.nn`
- `768-512x2-1_e2_g1768709.nn` trained for 13 hours to produce `768-512x2-1_e3_g2065630.nn`

Temporal Coherence[^2] was then used to further the training process:
- `768-512x2-1_e3_g2065630.nn` trained for 16 hours to produce `768-512x2-1_e8_g1726579.nn`
- `768-512x2-1_e8_g1726579.nn` trained for 16 hours to produce `768-512x2-1_r12_g2297813.nn`

By playing out Syzygy endgames and allowing the network to learn from those positions, endgame play was improved:
- `768-512x2-1_r12_g2297813.nn` trained for 16 hours to produce `768-512x2-1_r15_g2720489.nn`

By adding 5% DFRC, the FRC performance was greatly improved:
- `768-512x2-1_r15_g2720489.nn` trained for 16 hours to produce `768-512x2-1_r17_g2894950.nn`

By filtering out openings that are wildly unbalanced (+/- 500cp), the final TDLeaf(λ) network was trained:

- `768-512x2-1_r17_g2894950.nn` trained for 16 hours to produce `768-512x2-1_r18_g2771316.nn`

The final TDLeaf(λ) network scored `Elo | -182.26 +- 8.45 (95%)` compared to master. At this point I switched to supervised learning, using the bullet trainer[^3].
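For reference, the TDLeaf(λ) update driving the reinforcement phase above can be sketched as follows. This is a generic illustration of the update rule from the original paper, applied to principal-variation leaf evaluations; it is not Halogen's implementation, and all names are hypothetical.

```python
import numpy as np

def tdleaf_update(leaf_values, leaf_grads, alpha=0.01, lam=0.7):
    # TDLeaf(lambda): for each position t in a self-play game, evaluate
    # the *leaf* of the principal variation (not the root), form the
    # temporal differences d_j = V(leaf_{j+1}) - V(leaf_j), and weight
    # each future difference by lambda^(j - t).
    # leaf_values: V(leaf_t) for t = 0..N      -> shape (N+1,)
    # leaf_grads:  dV/dw at each PV leaf       -> shape (N+1, n_weights)
    d = np.diff(leaf_values)  # d_j for j = 0..N-1
    n = len(d)
    delta_w = np.zeros(leaf_grads.shape[1])
    for t in range(n):
        discounted = sum(lam ** (j - t) * d[j] for j in range(t, n))
        delta_w += leaf_grads[t] * discounted
    return alpha * delta_w
```

With `lam=0` this reduces to one-step TD on leaf values; with `lam=1` every later evaluation swing (and ultimately the game result) is credited back to each position equally.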
`bullet_r10_768-512x2-1-epoch100.bin`: Bullet parameters

`bullet_r17_768-512x2-1_e50.nn`: Bullet parameters
[^1]: TDLeaf(λ): Combining Temporal Difference Learning with Game-Tree Search.
[^2]: Temporal Coherence and Prediction Decay in TD Learning.
[^3]: https://github.com/jw1912/bullet