A demonstration of Puzzle, a hardware-aware framework that accelerates Large Language Model (LLM) inference while preserving model capabilities through neural architecture search (NAS) and knowledge distillation.
Note: This repo covers only the MIP-based architecture search and is not a full release of Puzzle. We plan to release a full end-to-end implementation that includes the remaining capabilities of Puzzle, such as block library construction and Blockwise Local Distillation.
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
ICML 2025
The best way to understand Puzzle is through our Jupyter notebook demonstration:
jupyter notebook examples/puzzle_demonstration.ipynb
puzzle_demonstration.ipynb - This notebook provides:
- Complete walkthrough of Puzzle's MIP-based architecture search
- Real examples using Llama 3.3-70B-Instruct block library data
- Visualization of the trade-off between accuracy and runtime/memory
- Multiple deployment scenarios (H100 vs Edge devices)
- Interactive exploration of the search space
Create a fresh Python environment (recommended: Python 3.12) using your favorite package manager, then install the dependencies listed in requirements.txt.
Puzzle consists of three stages:
1. Block Library Construction - Train alternative block variants with Blockwise Local Distillation (BLD)
2. MIP-based Architecture Search - Find the optimal configuration for the target hardware (this repo)
3. Global Knowledge Distillation - Fine-tune the assembled architecture
This repository implements Stage 2, demonstrating how to:
- Load pre-computed block libraries with accuracy/resource measurements
- Define hardware constraints (memory, latency requirements)
- Search for optimal architectures using Mixed-Integer Programming
- Visualize and compare different deployment scenarios
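The search step listed above can be pictured as a selection problem: pick exactly one block variant per layer so that the summed quality penalty is as small as possible while the summed memory and latency stay within the hardware budget. The sketch below is a minimal illustration under stated assumptions, not the code in puzzle/mip_nas.py. The library contents, the field names (penalty, memory_gb, latency_ms), and the function search_architecture are all hypothetical, and exhaustive enumeration stands in for a MIP solver, which is only workable for a toy instance this small.

```python
from itertools import product

# Hypothetical toy block library: one list of candidate variants per layer.
# "penalty" is a quality cost (lower is better). All names and numbers are
# illustrative and are NOT taken from the repo's data files.
BLOCK_LIBRARY = [
    [  # layer 0
        {"name": "parent", "penalty": 0.00, "memory_gb": 4.0, "latency_ms": 10.0},
        {"name": "narrow_ffn", "penalty": 0.05, "memory_gb": 2.5, "latency_ms": 6.0},
        {"name": "no_op", "penalty": 0.40, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
    [  # layer 1
        {"name": "parent", "penalty": 0.00, "memory_gb": 4.0, "latency_ms": 10.0},
        {"name": "narrow_ffn", "penalty": 0.02, "memory_gb": 2.5, "latency_ms": 6.0},
        {"name": "no_op", "penalty": 0.10, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
    [  # layer 2
        {"name": "parent", "penalty": 0.00, "memory_gb": 4.0, "latency_ms": 10.0},
        {"name": "narrow_ffn", "penalty": 0.08, "memory_gb": 2.5, "latency_ms": 6.0},
        {"name": "no_op", "penalty": 0.60, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
]


def search_architecture(library, max_memory_gb, max_latency_ms):
    """Return the lowest-penalty choice of one variant per layer that fits
    both budgets, or None if no combination is feasible.

    Brute force over all combinations; a MIP solver answers the same
    question without enumerating them.
    """
    best = None
    for choice in product(*library):
        memory = sum(v["memory_gb"] for v in choice)
        latency = sum(v["latency_ms"] for v in choice)
        # Hardware constraints: skip combinations that exceed either budget.
        if memory > max_memory_gb or latency > max_latency_ms:
            continue
        penalty = sum(v["penalty"] for v in choice)
        if best is None or penalty < best["penalty"]:
            best = {
                "variants": [v["name"] for v in choice],
                "penalty": penalty,
                "memory_gb": memory,
                "latency_ms": latency,
            }
    return best


# Example: a memory-constrained deployment (budgets are made up).
best = search_architecture(BLOCK_LIBRARY, max_memory_gb=9.0, max_latency_ms=30.0)
print(best["variants"])
```

With these illustrative numbers, a 9 GB memory budget selects narrow_ffn for the first two layers and keeps the parent block in the last one, because that layer has the highest penalty for being narrowed. Tightening the budgets shifts the choice toward cheaper variants. A real block library has far too many combinations to enumerate, which is where a MIP formulation becomes necessary.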
puzzle/
├── puzzle/
│   └── mip_nas.py                      # Core MIP-based NAS implementation
├── examples/
│   ├── puzzle_demonstration.ipynb      # Main demo
│   ├── data/                           # Pre-computed block library data
│   │   └── Llama-3.3-70B-Instruct/
│   │       ├── block_library.json
│   │       ├── measurement_info.json
│   │       └── parent_block_stats.json
│   └── standalone_mip_nas_example.py   # Standalone MIP NAS example script
└── requirements.txt