NVlabs/puzzle

Demo code for Puzzle: Distillation-Based NAS for Inference-Optimized LLMs

ICML 2025 Paper Video

A demonstration of Puzzle, a hardware-aware framework that accelerates Large Language Model (LLM) inference while preserving model capabilities through neural architecture search (NAS) and knowledge distillation.

Note: This repo covers only the MIP-based architecture search and is not a full release of Puzzle. We are working on a full end-to-end implementation, to be released in the future, that will cover all of Puzzle's capabilities, including block library construction and Blockwise Local Distillation.

Paper

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
ICML 2025

Interactive Demo

The best way to understand Puzzle is through our Jupyter notebook demonstration:

jupyter notebook examples/puzzle_demonstration.ipynb

puzzle_demonstration.ipynb - This notebook provides:

  • Complete walkthrough of Puzzle's MIP-based architecture search
  • Real examples using Llama 3.3-70B-Instruct block library data
  • Visualization of the trade-off between accuracy and runtime/memory
  • Multiple deployment scenarios (H100 vs Edge devices)
  • Interactive exploration of the search space
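To make the accuracy versus runtime/memory trade-off concrete, here is a minimal sketch of how a Pareto frontier can be extracted from a set of candidate architectures. The function name `pareto_frontier` and the sample numbers are hypothetical and made up for illustration; they are not taken from the notebook or from the block library data.

```python
def pareto_frontier(points):
    """Return the (cost, accuracy) points that no other point dominates.

    A point is dominated when another point has cost that is no higher and
    accuracy that is no lower, with at least one of the two strictly better.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by cost ascending; among equal costs, put the most accurate first.
    for cost, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it is more accurate than every cheaper one.
        if accuracy > best_accuracy:
            frontier.append((cost, accuracy))
            best_accuracy = accuracy
    return frontier


# Made-up (runtime, accuracy) pairs for illustration only.
candidates = [(10, 0.70), (20, 0.80), (15, 0.65), (30, 0.79), (20, 0.75), (40, 0.85)]
frontier = pareto_frontier(candidates)
print(frontier)
```

Plotting only the frontier points is one common way to show which architectures are worth considering at each resource budget.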

Installation

Create a fresh Python environment (recommended: python=3.12) with your preferred package manager, then install the dependencies listed in requirements.txt.

Overview

Puzzle consists of three stages:

  1. Block Library Construction - Train alternative block variants with Blockwise Local Distillation (BLD)
  2. MIP-based Architecture Search - Find optimal configuration for target hardware (this repo)
  3. Global Knowledge Distillation - Fine-tune the assembled architecture
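For orientation, the Stage 2 search can be written as a mixed-integer program. The formulation below is a simplified sketch under assumed notation, not necessarily the exact objective or constraint set used in this repo: $x_{i,j} \in \{0,1\}$ selects candidate block $j$ for layer $i$, $s_{i,j}$ is that block's quality score, $m_{i,j}$ and $t_{i,j}$ are its memory and latency costs, and $M$ and $T$ are the hardware budgets.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{L} \sum_{j \in B_i} s_{i,j}\, x_{i,j} \\
\text{s.t.} \quad & \sum_{j \in B_i} x_{i,j} = 1 && \text{for each layer } i = 1, \dots, L, \\
& \sum_{i=1}^{L} \sum_{j \in B_i} m_{i,j}\, x_{i,j} \le M, \\
& \sum_{i=1}^{L} \sum_{j \in B_i} t_{i,j}\, x_{i,j} \le T, \\
& x_{i,j} \in \{0, 1\}.
\end{aligned}
```

The first constraint picks exactly one block variant per layer; the remaining two keep the assembled architecture within the memory and latency budgets.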

This repository implements Stage 2, demonstrating how to:

  • Load pre-computed block libraries with accuracy/resource measurements
  • Define hardware constraints (memory, latency requirements)
  • Search for optimal architectures using Mixed-Integer Programming
  • Visualize and compare different deployment scenarios
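The steps above can be illustrated with a toy example. This sketch does not use the API of `puzzle/mip_nas.py`. It swaps the MIP solver for plain exhaustive enumeration, which finds the same optimum on a tiny search space but would not scale to a real block library. The library contents, the field names (`quality_loss`, `memory_gb`, `latency_ms`), and the function `search_architecture` are all hypothetical.

```python
from itertools import product

# Hypothetical toy block library: one list of candidate variants per layer.
# Field names and numbers are illustrative, not the block_library.json schema.
TOY_LIBRARY = [
    [  # layer 0
        {"name": "parent", "quality_loss": 0.0, "memory_gb": 2.0, "latency_ms": 4.0},
        {"name": "slim", "quality_loss": 0.25, "memory_gb": 1.0, "latency_ms": 2.0},
        {"name": "no_op", "quality_loss": 1.0, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
    [  # layer 1
        {"name": "parent", "quality_loss": 0.0, "memory_gb": 2.0, "latency_ms": 4.0},
        {"name": "slim", "quality_loss": 0.125, "memory_gb": 1.0, "latency_ms": 2.0},
        {"name": "no_op", "quality_loss": 0.5, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
    [  # layer 2
        {"name": "parent", "quality_loss": 0.0, "memory_gb": 2.0, "latency_ms": 4.0},
        {"name": "slim", "quality_loss": 0.5, "memory_gb": 1.0, "latency_ms": 2.0},
        {"name": "no_op", "quality_loss": 2.0, "memory_gb": 0.0, "latency_ms": 0.0},
    ],
]


def search_architecture(library, max_memory_gb, max_latency_ms):
    """Pick one variant per layer, minimizing total quality loss within budgets.

    Returns (list of chosen variant names, total loss), or None if no
    combination satisfies both constraints.
    """
    best_choice, best_loss = None, float("inf")
    # Enumerate every combination of one variant per layer (3^3 = 27 here).
    for choice in product(*library):
        memory = sum(block["memory_gb"] for block in choice)
        latency = sum(block["latency_ms"] for block in choice)
        if memory > max_memory_gb or latency > max_latency_ms:
            continue  # violates a hardware constraint
        loss = sum(block["quality_loss"] for block in choice)
        if loss < best_loss:
            best_choice, best_loss = choice, loss
    if best_choice is None:
        return None
    return [block["name"] for block in best_choice], best_loss


# Tight memory budget: the search slims the two layers that tolerate it best.
result = search_architecture(TOY_LIBRARY, max_memory_gb=4.0, max_latency_ms=12.0)
print(result)
```

With a 4 GB budget the search keeps the parent block only in the layer where replacing it would hurt most. A MIP solver reaches the same kind of answer without enumerating every combination, which is what makes the approach practical for a model with many layers and many variants per layer.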

Repo Structure

puzzle/
├── puzzle/
│   └── mip_nas.py                       # Core MIP-based NAS implementation
├── examples/
│   ├── puzzle_demonstration.ipynb       # Main demo
│   ├── data/                            # Pre-computed block library data
│   │   └── Llama-3.3-70B-Instruct/
│   │       ├── block_library.json
│   │       ├── measurement_info.json
│   │       └── parent_block_stats.json
│   └── standalone_mip_nas_example.py    # Standalone MIP NAS example script
└── requirements.txt
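As a starting point for exploring the pre-computed data, the following sketch parses the three JSON files listed in the tree and reports their top-level shape. The file names come from the layout above. The internal schema is not described in this README, so the code makes no assumptions about it. The helpers `load_library_files` and `describe` are hypothetical and are not part of the repo.

```python
import json
from pathlib import Path

# File names taken from the repository layout; their contents are not
# documented here, so this sketch only parses them and summarizes the result.
EXPECTED_FILES = (
    "block_library.json",
    "measurement_info.json",
    "parent_block_stats.json",
)


def load_library_files(data_dir):
    """Parse each expected JSON file in data_dir into a Python object."""
    data_dir = Path(data_dir)
    loaded = {}
    for name in EXPECTED_FILES:
        path = data_dir / name
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
        with path.open(encoding="utf-8") as handle:
            loaded[name] = json.load(handle)
    return loaded


def describe(loaded):
    """Summarize the top-level type and size of each parsed file."""
    summary = {}
    for name, content in loaded.items():
        if isinstance(content, dict):
            summary[name] = f"dict with {len(content)} keys"
        elif isinstance(content, list):
            summary[name] = f"list with {len(content)} items"
        else:
            summary[name] = type(content).__name__
    return summary
```

From the repository root, `describe(load_library_files("examples/data/Llama-3.3-70B-Instruct"))` would give a quick overview of each file before opening the notebook.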
