Skip to content

Official implementation of "Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search" (arXiv:2508.03661).

License

Notifications You must be signed in to change notification settings

iphysresearch/evo-mcts

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

9 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Evo-MCTS: Evolutionary Monte Carlo Tree Search for Gravitational Wave Signal Detection

arXiv Python 3.12 License GitHub stars GitHub forks

Official implementation of "Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search"

This repository contains the open-source, reproducible code for the research paper published on arXiv:2508.03661. Our Evo-MCTS framework demonstrates substantial performance improvements: a 20.2% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1% improvement over other LLM-based algorithm optimization frameworks.

πŸ“‹ Table of Contents

πŸ“„ Paper Information

Title: Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search

Authors: He Wang1,2, Liang Zeng3

1. International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190, Beijing, China
2. Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049, Beijing, China
3. Tsinghua University, 100084, Beijing, China

Abstract: From fundamental physics to gravitational-wave astronomy, computational scientific discovery increasingly relies on sophisticated algorithms to analyze complex datasets, yet reliable identification of gravitational-wave signals with unknown source parameters buried in dynamic detector noise remains a formidable challenge. While existing algorithmic approaches have achieved partial success, their core limitations arise from restrictive prior assumptions: traditional methods suffer from reliance on predefined theoretical priors, while neural network approaches introduce hidden biases and lack interpretability. We propose Evolutionary Monte Carlo Tree Search (Evo-MCTS), the first integration of large language model (LLM) guidance with domain-aware physical constraints to generate interpretable solutions for automated gravitational wave detection. This framework systematically explores algorithmic solution spaces through tree-structured search enhanced by evolutionary optimization. Experimental validation demonstrates substantial performance improvements, achieving a 20.2% improvement over state-of-the-art gravitational wave detection algorithms on the MLGWSC-1 benchmark dataset and a remarkable 59.1% improvement over other LLM-based algorithm optimization frameworks. More fundamentally, our framework establishes a transferable methodology for automated algorithmic discovery across computational science domains through systematic exploration of novel algorithmic combinations.

Citation:

@article{wang2025automated,
      title={Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search}, 
      author={He Wang and Liang Zeng},
      year={2025},
      eprint={2508.03661},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.03661}, 
}

πŸš€ Quick Start

Environment Setup

Create a reproducible conda environment with the following commands:

# 1. Create conda environment
conda create -n env_mcts -c conda-forge python=3.12

# 2. Activate environment
conda activate env_mcts

# 3. Install dependencies
pip install hydra-core tqdm numpy joblib openai h5py gwpy pandas lalsuite
pip install "scipy==1.12.0"

Running the Code

Using run_template.py (Recommended)

The run_template.py script provides a programmatic and flexible way to execute Evo-MCTS with customizable parameters:

# Basic execution with default settings
python run_template.py

# Custom model and API configuration
python run_template.py --model gpt-4 --api-key your-openai-api-key --temperature 0.8

# Using environment variables
export MODEL=gpt-4
export API_KEY=your-openai-api-key
export TEMPERATURE=0.8
python run_template.py

# Dry run to validate configuration
python run_template.py --dry-run --model gpt-4 --api-key your-key

Direct Python Execution

python main.py \
  problem=gw_mlgwsc1 \
  llm_client.model=gpt-4 \
  llm_client.api_key=your-api-key \
  debug_mode=False

πŸ—οΈ Project Structure

Evo-MCTS/
β”œβ”€β”€ main.py                 # Main program entry point
β”œβ”€β”€ run_template.py         # Template script for programmatic execution
β”œβ”€β”€ ahd_adapter.py          # Algorithm adapter
β”œβ”€β”€ problem_adapter.py      # Problem adapter
β”œβ”€β”€ .env                    # Environment variables (see .env.template)
β”œβ”€β”€ source/                 # Core algorithm implementation
β”‚   β”œβ”€β”€ evo_mcts.py        # Evolutionary Monte Carlo Tree Search
β”‚   β”œβ”€β”€ evolution.py       # Evolution operations
β”‚   β”œβ”€β”€ mcts.py           # Monte Carlo Tree Search
β”‚   β”œβ”€β”€ getParas.py       # Parameter management
β”‚   └── interface_LLM.py  # LLM interface
β”œβ”€β”€ utils/                 # Utility functions
β”‚   └── llm_client/       # LLM client implementations
β”œβ”€β”€ cfg/                  # Hydra configuration files
β”‚   β”œβ”€β”€ config.yaml       # Main configuration
β”‚   β”œβ”€β”€ problem/          # Problem-specific configs
β”‚   └── llm_client/       # LLM client configs
β”œβ”€β”€ problems/             # Problem definitions
β”‚   └── gw_mlgwsc1/      # Gravitational wave detection problem
β”œβ”€β”€ prompts/             # LLM prompt templates
└── results/             # Research results and paper data
    └── paper_data/      # Published paper experimental data
        └── mcts_tree_nodes_pt5_algorithm.jsonl  # MCTS tree nodes for PT5 algorithm

βš™οΈ Configuration

Environment Variables

The project supports flexible configuration through environment variables. For easy deployment and portability, we provide template files:

Configuration Templates

  • .env.template: Template for environment variables (replace sensitive information)
  • run_template.py: Programmatic execution script with flexible configuration options
# LLM Configuration
MODEL=gpt-4                              # Model to use
API_KEY=your-api-key-here               # Your API key
BASE_URL=https://api.openai.com/v1      # API endpoint
TEMPERATURE=1.0                         # Generation temperature

# Path Configuration
ML_CHALLENGE_PATH=/path/to/ml-mock-data-challenge-1  # MLGWSC-1 repository path
DATA_DIR=/path/to/generated/datasets                 # Generated HDF5 datasets directory

# Performance Configuration
NUMEXPR_MAX_THREADS=96                  # Number of computation threads
CPU_USAGE_PERCENT=50                    # CPU usage percentage

# DeepSeek Configuration (Alternative LLM)
DEEPSEEK_MODEL=deepseek-chat
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_BASE_URL=https://api.deepseek.com/v1

Data Configuration

ML_CHALLENGE_PATH

This should point to the cloned MLGWSC-1 repository:

# Clone the MLGWSC-1 repository
git clone https://github.com/gwastro/ml-mock-data-challenge-1.git
export ML_CHALLENGE_PATH=/path/to/ml-mock-data-challenge-1

DATA_DIR

This directory contains the generated HDF5 datasets for gravitational wave detection. The dataset files follow the naming pattern defined in problems/gw_mlgwsc1/gen_inst.py:

Dataset File Naming Convention:

foreground_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf
background_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf
injections_set{SET_NUMBER}_{DURATION}dur_seed{RANDOM_SEED}_start{START_TIME}.hdf

Example dataset files:

  • foreground_set4_604800dur_seed40_start0.hdf
  • background_set4_604800dur_seed40_start0.hdf
  • injections_set4_604800dur_seed40_start0.hdf

Dataset Generation: To generate datasets, you can use the script from this gist or refer to the parameters in gen_inst.py:

  • SET_NUMBER = 4 (Dataset set number)
  • DURATION = 604800 (1 week in seconds)
  • RANDOM_SEED = 40 (Reproducibility seed)
  • START_TIME = 0 (Start offset)

Hydra Configuration

The project uses Hydra for configuration management:

  • cfg/config.yaml - Main configuration
  • cfg/problem/gw_mlgwsc1.yaml - Problem-specific configuration
  • cfg/llm_client/openai.yaml - LLM client configuration

πŸ”¬ Algorithm Overview

Evo-MCTS Framework

Our framework represents the first integration of LLM guidance with evolutionary search for gravitational wave detection:

  • LLM Guidance: Large language model integration with domain-aware physical constraints
  • Evolutionary Algorithm: Global search with population diversity for systematic exploration
  • Monte Carlo Tree Search: Tree-structured search with intelligent exploration
  • Interpretable Solutions: Generates human-interpretable algorithmic pathways
  • Domain Awareness: Incorporates physical constraints specific to gravitational wave detection

Key Features

  1. First LLM-Guided Framework: Novel integration of large language model guidance with domain-aware physical constraints
  2. Systematic Algorithm Space Exploration: Tree-structured search enhanced by evolutionary optimization
  3. Interpretable Solutions: Generates human-interpretable algorithmic pathways for automated gravitational wave detection
  4. Superior Performance: 20.2% improvement over state-of-the-art methods and 59.1% improvement over other LLM-based frameworks
  5. Novel Algorithm Discovery: Discovers previously unknown algorithmic combinations
  6. Transferable Methodology: Establishes a transferable approach for automated algorithmic discovery across computational science domains

Gravitational Wave Detection Challenge

  • Challenge: Detect gravitational-wave signals with unknown source parameters buried in dynamic detector noise
  • Input: H1 and L1 dual-channel gravitational wave data with complex noise characteristics
  • Objective: Automated construction of interpretable signal detection pipelines
  • Output: Catalog of candidate gravitational wave signals with enhanced detection accuracy
  • Benchmark: MLGWSC-1 dataset validation
  • Innovation: First framework to overcome restrictive assumptions of traditional methods while maintaining interpretability

πŸ“Š Results

Our Evo-MCTS framework achieves substantial performance improvements:

Performance Benchmarks

  • 20.2% improvement over state-of-the-art gravitational wave detection algorithms on MLGWSC-1 benchmark
  • 59.1% improvement over other LLM-based algorithm optimization frameworks
  • Consistent performance across high-performing algorithm variants
  • Superior handling of unknown source parameters in dynamic detector noise

MCTS Tree Analysis and Reproducible Data

The complete MCTS tree structure for the PT5 algorithm (node 486, fitness=5041.4) discovered during optimization is available in this repository. The tree data contains 38 algorithm nodes with comprehensive execution details:

Dataset: results/paper_data/mcts_tree_nodes_pt5_algorithm.jsonl

Data Schema (corresponding to Figure 5 in the paper):

  • eval_times: LLM execution sequence number (1-486)
  • depth: MCTS tree depth level (1-10)
  • operator: MCTS expansion type (PC/SC/PWC/PM)
  • thinking: DeepSeek reasoning results
  • reflection: DeepSeek reflection analysis
  • code: Generated algorithm implementation
  • fitness: Algorithm performance score (AUC)
  • algorithm: Post-thought algorithmic insights

Key Insights from Tree Analysis:

  • Node 486: Best-performing PT5 algorithm with fitness score 5041.4
  • Depth Distribution: Nodes span 10 levels (1-10) showing systematic exploration
  • Operator Analysis: Different MCTS operators (PC, SC, PWC, PM) contribute to diverse algorithmic variants
  • Evolution Trajectory: Complete path from initial random exploration to optimized solution

This dataset enables full reproducibility of the MCTS tree construction process and provides detailed insights into the algorithmic discovery mechanism described in the paper.

Scientific Contributions

  • First LLM-guided approach for gravitational wave detection with domain-aware physical constraints
  • Discovery of novel algorithmic combinations previously unexplored
  • Generation of human-interpretable algorithmic pathways
  • Establishment of transferable methodology for computational science domains

πŸ› οΈ Development Guide

Adding New Problems

  1. Create problem directory under problems/
  2. Implement evaluation script eval.py
  3. Add configuration file in cfg/problem/

Integrating New LLMs

  1. Implement client in utils/llm_client/
  2. Add configuration in cfg/llm_client/
  3. Update client initialization in utils.py

Customizing Evolution Parameters

Modify parameters in the configuration files or through environment variables:

  • Population size: pop_size
  • Maximum function evaluations: max_fe
  • Timeout settings: timeout
  • Debug mode: debug_mode

πŸ“ˆ Usage Examples

Using run_template.py

# Basic execution with environment variables
export MODEL=gpt-4
export API_KEY=your-openai-api-key
python run_template.py

# Direct command line arguments
python run_template.py --model gpt-4 --api-key your-key --temperature 0.8

# Advanced configuration with multiple parameters
python run_template.py \
  --model gpt-4 \
  --api-key your-key \
  --temperature 0.5 \
  --cpu-usage 75 \
  --numexpr-threads 64

# Using DeepSeek model
python run_template.py \
  --deepseek-model deepseek-chat \
  --deepseek-api-key your-deepseek-key \
  --deepseek-base-url https://api.deepseek.com/v1

# Validate configuration without running
python run_template.py --dry-run --model gpt-4 --api-key your-key

# Load custom environment file
python run_template.py --env-file custom.env

Direct Python Execution

# Set custom timeout
python main.py problem=gw_mlgwsc1 timeout=3600 llm_client.api_key=your-key

# Enable debug mode
python main.py problem=gw_mlgwsc1 debug_mode=True llm_client.api_key=your-key

πŸ”„ Portability and Deployment

The run_template.py script is designed for easy deployment across different environments:

Configuration Management

  1. Environment File: Create .env from template

    cp .env.template .env
    # Edit .env with your specific configuration
  2. Command Line Arguments: Override any setting directly

    python run_template.py --model gpt-4 --api-key your-key
  3. Environment Variables: Set system-wide configuration

    export MODEL=gpt-4
    export API_KEY=your-key
    python run_template.py

Configuration Precedence

The script follows this priority order:

  1. Command line arguments (highest priority)
  2. Environment variables
  3. Default values (lowest priority)

This approach ensures:

  • Security: Sensitive information can be kept in environment files
  • Portability: Easy adaptation to different computing environments
  • Flexibility: Multiple ways to configure the same parameter
  • Reproducibility: Consistent results across different setups

πŸ“ Important Notes

  • API Keys: Ensure API keys are correctly configured before running
  • Data Paths:
    • Set ML_CHALLENGE_PATH to point to the cloned MLGWSC-1 repository
    • Set DATA_DIR to the directory containing generated HDF5 datasets
    • Verify both paths exist and are accessible before execution
  • Dataset Requirements: Generate or obtain the required HDF5 datasets using the naming convention in gen_inst.py
  • Performance: Adjust performance parameters based on your hardware capabilities
  • Resource Monitoring: Monitor computational resource usage during execution
  • Reproducibility: Use the same random seeds and model versions for consistent results

🀝 Contributing

We welcome contributions to improve the framework! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

πŸ“„ License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

πŸ™ Acknowledgments

We gratefully acknowledge:

  • The LIGO Scientific Collaboration for gravitational wave data
  • The MLGWSC-1 challenge organizers
  • The open-source scientific computing community

πŸ“ž Contact

For questions about the paper or code:

  • Paper: arXiv:2508.03661
  • Issues: Please use GitHub Issues for bug reports and feature requests
  • Email: Contact authors through the arXiv paper

🌊 Start your gravitational wave signal detection research journey with Evo-MCTS!