Official implementation of "Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search"
This repository contains the open-source, reproducible code for the research paper published on arXiv:2508.03661. Our Evo-MCTS framework demonstrates substantial performance improvements: a 20.2% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1% improvement over other LLM-based algorithm optimization frameworks.
- π Paper Information
- π Quick Start
- ποΈ Project Structure
- βοΈ Configuration
- π¬ Algorithm Overview
- π Results
- π οΈ Development Guide
- π Usage Examples
- π Portability and Deployment
- π Important Notes
- π€ Contributing
- π License
- π Acknowledgments
- π Contact
Title: Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
Authors: He Wang1,2, Liang Zeng3
1. International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190, Beijing, China
2. Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049, Beijing, China
3. Tsinghua University, 100084, Beijing, China
Abstract: From fundamental physics to gravitational-wave astronomy, computational scientific discovery increasingly relies on sophisticated algorithms to analyze complex datasets, yet reliable identification of gravitational-wave signals with unknown source parameters buried in dynamic detector noise remains a formidable challenge. While existing algorithmic approaches have achieved partial success, their core limitations arise from restrictive prior assumptions: traditional methods suffer from reliance on predefined theoretical priors, while neural network approaches introduce hidden biases and lack interpretability. We propose Evolutionary Monte Carlo Tree Search (Evo-MCTS), the first integration of large language model (LLM) guidance with domain-aware physical constraints to generate interpretable solutions for automated gravitational wave detection. This framework systematically explores algorithmic solution spaces through tree-structured search enhanced by evolutionary optimization. Experimental validation demonstrates substantial performance improvements, achieving a 20.2% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1% improvement over other LLM-based algorithm optimization frameworks. More fundamentally, our framework establishes a transferable methodology for automated algorithmic discovery across computational science domains through systematic exploration of novel algorithmic combinations.
Citation:
@article{wang2025automated,
title={Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search},
author={He Wang and Liang Zeng},
year={2025},
eprint={2508.03661},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.03661},
}Create a reproducible conda environment with the following commands:
# 1. Create conda environment
conda create -n env_mcts -c conda-forge python=3.12
# 2. Activate environment
conda activate env_mcts
# 3. Install dependencies
pip install hydra-core tqdm numpy joblib openai h5py gwpy pandas lalsuite
pip install "scipy==1.12.0"The run_template.py script provides a programmatic and flexible way to execute Evo-MCTS with customizable parameters:
# Basic execution with default settings
python run_template.py
# Custom model and API configuration
python run_template.py --model gpt-4 --api-key your-openai-api-key --temperature 0.8
# Using environment variables
export MODEL=gpt-4
export API_KEY=your-openai-api-key
export TEMPERATURE=0.8
python run_template.py
# Dry run to validate configuration
python run_template.py --dry-run --model gpt-4 --api-key your-keypython main.py \
problem=gw_mlgwsc1 \
llm_client.model=gpt-4 \
llm_client.api_key=your-api-key \
debug_mode=FalseEvo-MCTS/
βββ main.py # Main program entry point
βββ run_template.py # Template script for programmatic execution
βββ ahd_adapter.py # Algorithm adapter
βββ problem_adapter.py # Problem adapter
βββ .env # Environment variables (see .env.template)
βββ source/ # Core algorithm implementation
β βββ evo_mcts.py # Evolutionary Monte Carlo Tree Search
β βββ evolution.py # Evolution operations
β βββ mcts.py # Monte Carlo Tree Search
β βββ getParas.py # Parameter management
β βββ interface_LLM.py # LLM interface
βββ utils/ # Utility functions
β βββ llm_client/ # LLM client implementations
βββ cfg/ # Hydra configuration files
β βββ config.yaml # Main configuration
β βββ problem/ # Problem-specific configs
β βββ llm_client/ # LLM client configs
βββ problems/ # Problem definitions
β βββ gw_mlgwsc1/ # Gravitational wave detection problem
βββ prompts/ # LLM prompt templates
βββ results/ # Research results and paper data
βββ paper_data/ # Published paper experimental data
βββ mcts_tree_nodes_pt5_algorithm.jsonl # MCTS tree nodes for PT5 algorithm
The project supports flexible configuration through environment variables. For easy deployment and portability, we provide template files:
.env.template: Template for environment variables (replace sensitive information)run_template.py: Programmatic execution script with flexible configuration options
# LLM Configuration
MODEL=gpt-4 # Model to use
API_KEY=your-api-key-here # Your API key
BASE_URL=https://api.openai.com/v1 # API endpoint
TEMPERATURE=1.0 # Generation temperature
# Path Configuration
ML_CHALLENGE_PATH=/path/to/ml-mock-data-challenge-1 # MLGWSC-1 repository path
DATA_DIR=/path/to/generated/datasets # Generated HDF5 datasets directory
# Performance Configuration
NUMEXPR_MAX_THREADS=96 # Number of computation threads
CPU_USAGE_PERCENT=50 # CPU usage percentage
# DeepSeek Configuration (Alternative LLM)
DEEPSEEK_MODEL=deepseek-chat
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_BASE_URL=https://api.deepseek.com/v1This should point to the cloned MLGWSC-1 repository:
# Clone the MLGWSC-1 repository
git clone https://github.com/gwastro/ml-mock-data-challenge-1.git
export ML_CHALLENGE_PATH=/path/to/ml-mock-data-challenge-1This directory contains the generated HDF5 datasets for gravitational wave detection. The dataset files follow the naming pattern defined in problems/gw_mlgwsc1/gen_inst.py:
Dataset File Naming Convention:
foreground_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf
background_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf
injections_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf
Example dataset files:
foreground_set4_604800dur_seed40_start0.hdfbackground_set4_604800dur_seed40_start0.hdfinjections_set4_604800dur_seed40_start0.hdf
Dataset Generation:
To generate datasets, you can use the script from this gist or refer to the parameters in gen_inst.py:
SET_NUMBER = 4(Dataset set number)DURATION = 604800(1 week in seconds)RANDOM_SEED = 40(Reproducibility seed)START_TIME = 0(Start offset)
The project uses Hydra for configuration management:
cfg/config.yaml- Main configurationcfg/problem/gw_mlgwsc1.yaml- Problem-specific configurationcfg/llm_client/openai.yaml- LLM client configuration
Our framework represents the first integration of LLM guidance with evolutionary search for gravitational wave detection:
- LLM Guidance: Large language model integration with domain-aware physical constraints
- Evolutionary Algorithm: Global search with population diversity for systematic exploration
- Monte Carlo Tree Search: Tree-structured search with intelligent exploration
- Interpretable Solutions: Generates human-interpretable algorithmic pathways
- Domain Awareness: Incorporates physical constraints specific to gravitational wave detection
- First LLM-Guided Framework: Novel integration of large language model guidance with domain-aware physical constraints
- Systematic Algorithm Space Exploration: Tree-structured search enhanced by evolutionary optimization
- Interpretable Solutions: Generates human-interpretable algorithmic pathways for automated gravitational wave detection
- Superior Performance: 20.2% improvement over state-of-the-art methods and 59.1% improvement over other LLM-based frameworks
- Novel Algorithm Discovery: Discovers previously unknown algorithmic combinations
- Transferable Methodology: Establishes a transferable approach for automated algorithmic discovery across computational science domains
- Challenge: Detect gravitational-wave signals with unknown source parameters buried in dynamic detector noise
- Input: H1 and L1 dual-channel gravitational wave data with complex noise characteristics
- Objective: Automated construction of interpretable signal detection pipelines
- Output: Catalog of candidate gravitational wave signals with enhanced detection accuracy
- Benchmark: MLGWSC-1 dataset validation
- Innovation: First framework to overcome restrictive assumptions of traditional methods while maintaining interpretability
Our Evo-MCTS framework achieves substantial performance improvements:
- 20.2% improvement over state-of-the-art gravitational wave detection algorithms on MLGWSC-1 benchmark
- 59.1% improvement over other LLM-based algorithm optimization frameworks
- Consistent performance across high-performing algorithm variants
- Superior handling of unknown source parameters in dynamic detector noise
The complete MCTS tree structure for the PT5 algorithm (node 486, fitness=5041.4) discovered during optimization is available in this repository. The tree data contains 38 algorithm nodes with comprehensive execution details:
Dataset: results/paper_data/mcts_tree_nodes_pt5_algorithm.jsonl
Data Schema (corresponding to Figure 5 in the paper):
eval_times: LLM execution sequence number (1-486)depth: MCTS tree depth level (1-10)operator: MCTS expansion type (PC/SC/PWC/PM)thinking: DeepSeek reasoning resultsreflection: DeepSeek reflection analysiscode: Generated algorithm implementationfitness: Algorithm performance score (AUC)algorithm: Post-thought algorithmic insights
Key Insights from Tree Analysis:
- Node 486: Best-performing PT5 algorithm with fitness score 5041.4
- Depth Distribution: Nodes span 10 levels (1-10) showing systematic exploration
- Operator Analysis: Different MCTS operators (PC, SC, PWC, PM) contribute to diverse algorithmic variants
- Evolution Trajectory: Complete path from initial random exploration to optimized solution
This dataset enables full reproducibility of the MCTS tree construction process and provides detailed insights into the algorithmic discovery mechanism described in the paper.
- First LLM-guided approach for gravitational wave detection with domain-aware physical constraints
- Discovery of novel algorithmic combinations previously unexplored
- Generation of human-interpretable algorithmic pathways
- Establishment of transferable methodology for computational science domains
- Create problem directory under
problems/ - Implement evaluation script
eval.py - Add configuration file in
cfg/problem/
- Implement client in
utils/llm_client/ - Add configuration in
cfg/llm_client/ - Update client initialization in
utils.py
Modify parameters in the configuration files or through environment variables:
- Population size:
pop_size - Maximum function evaluations:
max_fe - Timeout settings:
timeout - Debug mode:
debug_mode
# Basic execution with environment variables
export MODEL=gpt-4
export API_KEY=your-openai-api-key
python run_template.py
# Direct command line arguments
python run_template.py --model gpt-4 --api-key your-key --temperature 0.8
# Advanced configuration with multiple parameters
python run_template.py \
--model gpt-4 \
--api-key your-key \
--temperature 0.5 \
--cpu-usage 75 \
--numexpr-threads 64
# Using DeepSeek model
python run_template.py \
--deepseek-model deepseek-chat \
--deepseek-api-key your-deepseek-key \
--deepseek-base-url https://api.deepseek.com/v1
# Validate configuration without running
python run_template.py --dry-run --model gpt-4 --api-key your-key
# Load custom environment file
python run_template.py --env-file custom.env# Set custom timeout
python main.py problem=gw_mlgwsc1 timeout=3600 llm_client.api_key=your-key
# Enable debug mode
python main.py problem=gw_mlgwsc1 debug_mode=True llm_client.api_key=your-keyThe run_template.py script is designed for easy deployment across different environments:
-
Environment File: Create
.envfrom templatecp .env.template .env # Edit .env with your specific configuration -
Command Line Arguments: Override any setting directly
python run_template.py --model gpt-4 --api-key your-key
-
Environment Variables: Set system-wide configuration
export MODEL=gpt-4 export API_KEY=your-key python run_template.py
The script follows this priority order:
- Command line arguments (highest priority)
- Environment variables
- Default values (lowest priority)
This approach ensures:
- Security: Sensitive information can be kept in environment files
- Portability: Easy adaptation to different computing environments
- Flexibility: Multiple ways to configure the same parameter
- Reproducibility: Consistent results across different setups
- API Keys: Ensure API keys are correctly configured before running
- Data Paths:
- Set
ML_CHALLENGE_PATHto point to the cloned MLGWSC-1 repository - Set
DATA_DIRto the directory containing generated HDF5 datasets - Verify both paths exist and are accessible before execution
- Set
- Dataset Requirements: Generate or obtain the required HDF5 datasets using the naming convention in
gen_inst.py - Performance: Adjust performance parameters based on your hardware capabilities
- Resource Monitoring: Monitor computational resource usage during execution
- Reproducibility: Use the same random seeds and model versions for consistent results
We welcome contributions to improve the framework! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
We gratefully acknowledge:
- The LIGO Scientific Collaboration for gravitational wave data
- The MLGWSC-1 challenge organizers
- The open-source scientific computing community
For questions about the paper or code:
- Paper: arXiv:2508.03661
- Issues: Please use GitHub Issues for bug reports and feature requests
- Email: Contact authors through the arXiv paper
π Start your gravitational wave signal detection research journey with Evo-MCTS!