📜 Talk to Your Slides:

Language-Driven Agents for Efficient Slide Editing

Note: TSBench-Hard is now available! 📎 Download TSBench-Hard on Google Drive

Note: Batch slide inference examples are now available. 📎 Download Examples on Google Drive

📖 Overview

Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation.

While GUI-based agents have demonstrated visual control capabilities, they often suffer from high computational cost and latency. To address this, we propose Talk-to-Your-Slides, an LLM-powered agent that edits slides in active PowerPoint sessions by leveraging structured object-level information—bypassing the need for visual pixel interaction.

Our system introduces a hierarchical editing design, separating high-level semantic planning from low-level object manipulation. This allows:

🚀 34.02% faster execution
🎯 34.76% better instruction adherence
💸 87.42% cheaper operations

To evaluate slide editing performance, we present TSBench, a human-annotated benchmark with 379 diverse instructions spanning four major categories.

📚 TSBench Benchmark Dataset

TSBench (Original)

📎 Download TSBench on Google Drive

TSBench is our human-annotated benchmark of 379 diverse instructions spanning four major categories, designed to evaluate slide editing performance.

TSBench-Hard

📎 Download TSBench-Hard on Google Drive

TSBench-Hard is an advanced evaluation subset designed to rigorously assess model robustness on complex real-world scenarios. This dataset contains 300 challenging instances across four key difficulty dimensions:

Visual-Dependent Tasks: Instructions requiring spatial reasoning (e.g., "Align the text box to the left edge of the image")
Ambiguous Instructions: High-level commands requiring inference (e.g., "Make the title slide look more professional")
Complex Multi-step Logic: Tasks involving conditional formatting across multiple slides (e.g., "Apply bold formatting to all titles on slides that contain a table and if you think that is important, color into red")
Impossible Tasks: Technically infeasible requests (e.g., "Change the video content inside the embedded player"), used to evaluate the agent's ability to correctly identify and refuse invalid actions

Dataset Structure

Each instance in TSBench-Hard follows the structure:

{
  "instruction": "User command for slide editing task",
  "ideal_description": "Description of the ideal presentation after completing the task"
}

instruction: Generated using GPT-4.1, then filtered by human evaluators to ensure quality and challenge level
ideal_description: Generated by Gemini 2.5 Flash; describes the expected state of the presentation after the instruction has been executed successfully. It serves as the evaluation ground truth for assessing whether an agent's output matches the expected presentation state
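A minimal sketch of how ideal_description might serve as ground truth, assuming the downloaded file is a JSON array of objects with the two fields shown above (the actual file layout may differ). The helpers `load_instances` and `build_judge_prompt`, and the idea of comparing a textual description of the edited presentation through an LLM judge, are illustrative assumptions, not part of the released code.

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = {"instruction", "ideal_description"}


def load_instances(text: str) -> List[Dict[str, str]]:
    """Parse TSBench-Hard instances, ASSUMING the file is a JSON array of objects."""
    instances = json.loads(text)
    for i, item in enumerate(instances):
        # Each instance must carry both fields shown in the structure above.
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"instance {i} is missing fields: {sorted(missing)}")
    return instances


def build_judge_prompt(instance: Dict[str, str], observed_state: str) -> str:
    """Hypothetical LLM-as-judge prompt: compare the observed presentation state
    with the ground-truth ideal_description."""
    return (
        "You are evaluating a slide-editing agent.\n"
        f"Instruction: {instance['instruction']}\n"
        f"Ideal presentation state: {instance['ideal_description']}\n"
        f"Observed presentation state: {observed_state}\n"
        "Answer YES if the observed state matches the ideal state, otherwise NO."
    )


if __name__ == "__main__":
    sample = '[{"instruction": "Bold all titles", "ideal_description": "All titles are bold."}]'
    for inst in load_instances(sample):
        print(build_judge_prompt(inst, "All titles are bold."))
```

Validating the fields up front keeps a malformed instance from silently producing an empty ground truth in the judge prompt.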

🎬 Demo Videos

CamelCase
Prompt: “Please update all English on ppt slides number 7 to camelCase formatting.”

Only English → Blue
Prompt: “Please change only English into blue color in slide number 3.”

Typo Checking & Correction
Prompt: “Please check ppt slides number 4 for any typos or errors, correct them.”

Translate to English
Prompt: “Please translate ppt slides number 5 into English.”

Slide Notes Script
Prompt: “Please create a full script for ppt slides number 3 and add the script to the slide notes.”

🛠️ Installation Guide

🖥️ Recommended: Python on Windows

⚠️ To allow Python to control PowerPoint via the COM interface, you must enable VBA access:

Open PowerPoint
Go to File > Options > Trust Center > Trust Center Settings
In Macro Settings, check:
- ✅ "Trust access to the VBA project object model"

📦 Setup Instructions

Step 1: Install Dependencies

pip install -r requirements.txt

Note: If you encounter issues with package installation, install these core packages:

pip install openai==1.74.0 google-generativeai anthropic python-pptx Flask python-dotenv pyyaml

Step 2: Configure API Keys

Option A: Using credentials.yml (Recommended)

Copy the example credentials file:

cp credentials.yml.example credentials.yml

Edit credentials.yml with your API keys:

gpt-4.1-mini:
  api_key:  "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"

gpt-4.1:
  api_key:  "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"

gemini-1.5-flash:
  api_key: "YOUR_GEMINI_API_KEY"

claude-3.7-sonnet:
  api_key: "YOUR_ANTHROPIC_API_KEY"

Option B: Using .env file

Create a .env file in the pptagent/ directory:

cd pptagent
cat > .env << EOF
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
EOF
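Options A and B can be combined into one lookup: prefer the entry in credentials.yml and fall back to the environment variable from .env. The helper below (`resolve_api_key`, `ENV_VARS`) is hypothetical, not the project's actual loader; it takes an already-parsed credentials mapping (for example the dict returned by pyyaml's `yaml.safe_load`) so it stays standard-library only, and the prefix-to-variable mapping is an assumption based on the model names and variable names shown above.

```python
import os
from typing import Mapping, Optional

# ASSUMED mapping from model-name prefix to the .env variable name (Option B).
ENV_VARS = {
    "gpt": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def resolve_api_key(
    model: str,
    credentials: Optional[Mapping[str, Mapping[str, str]]] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key for `model`, preferring credentials.yml over the environment."""
    entry = (credentials or {}).get(model) or {}
    key = entry.get("api_key")
    # Treat the placeholder values from the example file as "not configured".
    if key and not key.startswith("YOUR_"):
        return key
    # Fall back to the provider-specific environment variable, e.g. "gpt-4.1" -> OPENAI_API_KEY.
    var = ENV_VARS.get(model.split("-", 1)[0])
    if var and env.get(var):
        return env[var]
    raise KeyError(f"No API key found for model {model!r}")
```

Skipping the `YOUR_...` placeholders means a freshly copied credentials.yml does not shadow real keys set in .env.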

Step 3: Run the System

Web UI (Flask) - Recommended for interactive use:

python pptagent/main_flask.py

Then open your browser to http://localhost:8080

CLI Mode - For batch processing:

cd pptagent
python main_cli.py

Quick Start (shows usage):

python pptagent/main.py

🔧 Project Structure

Talk-to-Your-Slides/
├── pptagent/
│   ├── main.py              # Entry point (shows usage)
│   ├── main_flask.py        # Web UI server (Flask)
│   ├── main_cli.py          # CLI interface
│   ├── classes.py           # Core PPT agent classes
│   ├── test_Applier.py      # Applier implementations
│   ├── llm_api.py           # LLM API wrappers
│   ├── gemini_api.py        # Gemini-specific API
│   ├── utils.py             # Utility functions
│   ├── prompt.py            # System prompts
│   └── templates/           # Flask HTML templates
├── credentials.yml.example  # Example API credentials
├── requirements.txt         # Python dependencies
└── README.md                # This file

🎯 Supported Models

OpenAI: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
Google: Gemini 1.5 Flash, Gemini 2.5 Flash
Anthropic: Claude 3.7 Sonnet

💡 Usage Examples

Example 1: Translate slide content

"Translate all text content on slide 1 into Korean."

Example 2: Fix typos

"Check slide 4 for any typos or errors and correct them."

Example 3: Change formatting

"Change all English text to blue color on slide 3."

See the demo videos above for more examples!

🐛 Troubleshooting

Issue: ModuleNotFoundError for openai or google.generativeai

# Solution: Install missing packages
pip install openai==1.74.0 google-generativeai

Issue: FileNotFoundError for credentials.yml

# Solution: Create credentials file from example
cp credentials.yml.example credentials.yml
# Then edit credentials.yml with your API keys

Issue: COM error on Windows

Make sure PowerPoint is installed
Enable VBA access (see installation guide above)
Run Python as Administrator if needed

Issue: Flask server not starting

# Check if port 8080 is available
# Try a different port by editing main_flask.py line 341:
# app.run(debug=True, port=8081)  # Change to different port
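To check whether port 8080 is free before launching the server, a small standard-library sketch like the following can help. `port_is_free` and `find_free_port` are hypothetical helpers, not part of main_flask.py. They try to bind the port on localhost, which is a quick but not race-free check: another process could take the port right afterwards.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `host:port` can be bound, i.e. nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_free_port(start: int = 8080, attempts: int = 20) -> int:
    """Return the first bindable port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port in range {start}-{start + attempts - 1}")


if __name__ == "__main__":
    port = find_free_port()
    print(f"Port {port} is available; use app.run(debug=True, port={port})")
```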

🏗️ Code Architecture

The system follows a hierarchical pipeline:

Planner: Analyzes the user request and creates a high-level plan
Parser: Parses the plan into structured tasks
Processor: Processes each task with contextual information
Applier: Applies changes to PowerPoint slides via COM/python-pptx
Reporter: Generates a summary of the changes made

Each component is modular and can be extended independently.
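The hierarchy above can be pictured as a chain of stages, each consuming the previous stage's output. The sketch below is a simplified illustration with stub stages; the names `Stage` and `run_pipeline` and the lambda bodies are hypothetical and do not mirror classes.py, where the real stages call LLMs and the Applier drives PowerPoint through COM/python-pptx.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Stage:
    """One pipeline step: a name plus a function from input to output."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(stages: List[Stage], request: str) -> Any:
    """Feed the user request through each stage in order and return the last output."""
    data: Any = request
    for stage in stages:
        data = stage.run(data)
    return data


# Stub stages standing in for the LLM-backed components (hypothetical behavior).
stages = [
    Stage("Planner", lambda req: f"plan: {req}"),
    Stage("Parser", lambda plan: [{"slide": 3, "task": plan}]),
    Stage("Processor", lambda tasks: [dict(t, edit="set English runs to blue") for t in tasks]),
    Stage("Applier", lambda edits: [dict(e, applied=True) for e in edits]),
    Stage("Reporter", lambda results: f"{sum(r['applied'] for r in results)} edit(s) applied"),
]

if __name__ == "__main__":
    print(run_pipeline(stages, "Change all English text to blue color on slide 3."))
```

Because each stage is a plain function with a single input and output, one component (for example a different Applier) can be swapped without touching the others.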

KyuDan1/Talk-to-Your-Slides

Folders and files

Latest commit

History

Repository files navigation

📜 Talk to Your Slides:

Language-Driven Agents for Efficient Slide Editing

📖 Overview

📚 TSBench Benchmark Dataset

TSBench (Original)

TSBench-Hard

Dataset Structure

🎬 Demo Videos

🛠️ Installation Guide

🖥️ Recommended: Python on Windows

📦 Setup Instructions

Step 1: Install Dependencies

Step 2: Configure API Keys

Step 3: Run the System

🔧 Project Structure

🎯 Supported Models

💡 Usage Examples

🐛 Troubleshooting

🏗️ Code Architecture

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages