
KyuDan1/Talk-to-Your-Slides



📜 Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing



📄 Research Paper (arXiv preprint)


Note: TSBench-Hard is now available! 📎 Download TSBench-Hard on Google Drive

Note: Batch slide inference examples are available. 📎 Download Examples on Google Drive


📖 Overview

Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation.

While GUI-based agents have demonstrated visual control capabilities, they often suffer from high computational cost and latency. To address this, we propose Talk-to-Your-Slides, an LLM-powered agent that edits slides in active PowerPoint sessions by leveraging structured object-level information, bypassing the need for visual pixel interaction.

Our system introduces a hierarchical editing design, separating high-level semantic planning from low-level object manipulation. Compared with the baseline methods evaluated in the paper, this yields:

  • 🚀 34.02% faster execution
  • 🎯 34.76% better instruction adherence
  • 💸 87.42% cheaper operations

To evaluate slide editing performance, we present TSBench, a human-annotated benchmark with 379 diverse instructions spanning four major categories.


📚 TSBench Benchmark Dataset

TSBench (Original)

📎 Download TSBench on Google Drive

The original human-annotated benchmark: 379 diverse instructions spanning four major categories, for evaluating slide editing performance.

TSBench-Hard

📎 Download TSBench-Hard on Google Drive

TSBench-Hard is an advanced evaluation subset designed to rigorously assess model robustness on complex real-world scenarios. This dataset contains 300 challenging instances across four key difficulty dimensions:

  • Visual-Dependent Tasks: Instructions requiring spatial reasoning (e.g., "Align the text box to the left edge of the image")
  • Ambiguous Instructions: High-level commands requiring inference (e.g., "Make the title slide look more professional")
  • Complex Multi-step Logic: Tasks involving conditional formatting across multiple slides (e.g., "Apply bold formatting to all titles on slides that contain a table and if you think that is important, color into red")
  • Impossible Tasks: Technically infeasible requests (e.g., "Change the video content inside the embedded player") that evaluate the agent's ability to correctly identify and refuse invalid actions

Dataset Structure

Each instance in TSBench-Hard follows the structure:

{
  "instruction": "User command for slide editing task",
  "ideal_description": "Description of the ideal presentation after completing the task"
}
  • instruction: Generated using GPT-4.1, then filtered by human evaluators to ensure quality and challenge level
  • ideal_description: Describes the expected state of the presentation after the instruction has been executed successfully; generated by Gemini 2.5 Flash. It serves as the evaluation ground truth, against which an agent's output can be compared to judge whether the expected presentation state was reached
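The structure above does not pin down the on-disk file format, so the loader below is a minimal sketch that assumes either a single JSON array or JSON Lines, with each record carrying the two fields shown. `TSBenchHardItem`, `load_tsbench_hard`, and `build_judge_prompt` are hypothetical names, and the judge prompt is only one possible way of using `ideal_description` as ground truth; it is not the authors' evaluation protocol.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path

REQUIRED_FIELDS = {"instruction", "ideal_description"}


@dataclass(frozen=True)
class TSBenchHardItem:
    """One instance: an editing command and the expected end state of the deck."""
    instruction: str
    ideal_description: str


def load_tsbench_hard(path: str | Path) -> list[TSBenchHardItem]:
    """Load instances from a JSON array or a JSON Lines file (format is an assumption)."""
    text = Path(path).read_text(encoding="utf-8-sig").strip()  # utf-8-sig drops a BOM
    if text.startswith("["):
        records = json.loads(text)  # one JSON array of objects
    else:
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    items = []
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        items.append(TSBenchHardItem(rec["instruction"], rec["ideal_description"]))
    return items


def build_judge_prompt(item: TSBenchHardItem, observed_state: str) -> str:
    """Hypothetical LLM-judge prompt comparing the edited deck with the ground truth."""
    return (
        "You are grading a slide-editing agent.\n"
        f"Instruction: {item.instruction}\n"
        f"Ideal result: {item.ideal_description}\n"
        f"Observed result: {observed_state}\n"
        "Answer PASS if the observed result satisfies the ideal result, otherwise FAIL."
    )
```

For "Impossible Tasks", a natural reading is that the observed result should state that the agent refused, which the same prompt can grade.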

🎬 Demo Videos

CamelCase
Prompt: “Please update all English on ppt slides number 7 to camelCase formatting.”

Only English → Blue
Prompt: “Please change only English into blue color in slide number 3.”

Typo Checking & Correction
Prompt: “Please check ppt slides number 4 for any typos or errors, correct them.”

Translate to English
Prompt: “Please translate ppt slides number 5 into English.”

Slide Notes Script
Prompt: “Please create a full script for ppt slides number 3 and add the script to the slide notes.”


🛠️ Installation Guide

🖥️ Recommended: Python on Windows

⚠️ To allow Python to control PowerPoint via the COM interface, you must enable VBA access:

  1. Open PowerPoint
  2. Go to File > Options > Trust Center > Trust Center Settings
  3. In Macro Settings, check:
    • ✅ "Trust access to the VBA project object model"
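If COM calls still fail after ticking the checkbox, it can help to confirm that the setting actually took effect. The sketch below reads the per-user registry value `AccessVBOM` under `HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\PowerPoint\Security`, which is where Office commonly stores this option. The function name is hypothetical, the `16.0` version key (Office 2016 and later) is an assumption, and a Group Policy setting can override the per-user value.

```python
from __future__ import annotations

import sys


def vba_project_access_enabled(office_version: str = "16.0") -> bool | None:
    """Report PowerPoint's "Trust access to the VBA project object model" setting.

    Returns True/False on Windows, or None on other platforms (no COM there).
    """
    if sys.platform != "win32":
        return None
    import winreg  # standard library, available on Windows only

    # Assumed per-user location of the option; Group Policy may override it.
    key_path = "\\".join(
        ["Software", "Microsoft", "Office", office_version, "PowerPoint", "Security"]
    )
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "AccessVBOM")
    except OSError:
        # Key or value absent: the option has most likely never been enabled.
        return False
    return value == 1


if __name__ == "__main__":
    print(vba_project_access_enabled())
```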

📦 Setup Instructions

Step 1: Install Dependencies

pip install -r requirements.txt

Note: If you encounter issues with package installation, install these core packages:

pip install openai==1.74.0 google-generativeai anthropic python-pptx Flask python-dotenv pyyaml
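After installing, a quick import check can confirm that each core package resolves. This is a minimal sketch built on `importlib.util.find_spec`; the pip-name to import-name mapping comes from the package list above, while `missing_packages` is a hypothetical helper that is not part of the repository.

```python
from __future__ import annotations

import importlib.util
from collections.abc import Mapping

# pip package name -> module name used in `import` statements
CORE_PACKAGES = {
    "openai": "openai",
    "google-generativeai": "google.generativeai",
    "anthropic": "anthropic",
    "python-pptx": "pptx",
    "Flask": "flask",
    "python-dotenv": "dotenv",
    "pyyaml": "yaml",
}


def missing_packages(packages: Mapping[str, str] = CORE_PACKAGES) -> list[str]:
    """Return the pip names of packages whose module cannot be located."""
    missing = []
    for pip_name, import_name in packages.items():
        try:
            found = importlib.util.find_spec(import_name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "google") is itself absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gaps = missing_packages()
    print("All core packages found." if not gaps else "pip install " + " ".join(gaps))
```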

Step 2: Configure API Keys

Option A: Using credentials.yml (Recommended)

Copy the example credentials file:

cp credentials.yml.example credentials.yml

Edit credentials.yml with your API keys:

gpt-4.1-mini:
  api_key:  "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"

gpt-4.1:
  api_key:  "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"

gemini-1.5-flash:
  api_key: "YOUR_GEMINI_API_KEY"

claude-3.7-sonnet:
  api_key: "YOUR_ANTHROPIC_API_KEY"
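Before launching, it may be worth checking that the model you intend to use has a real key rather than a template placeholder. The sketch below assumes `credentials.yml` parses into a mapping from model name to a block with `api_key` (and optionally `base_url`), as in the example above. `check_credentials` and `load_credentials` are hypothetical helpers and may not match how the repository actually reads the file.

```python
from __future__ import annotations

from pathlib import Path

PLACEHOLDER_PREFIX = "YOUR_"  # matches the placeholders in credentials.yml.example


def check_credentials(config: dict, model: str) -> dict:
    """Return the block for `model`, rejecting missing or placeholder API keys."""
    entry = config.get(model)
    if not isinstance(entry, dict):
        raise KeyError(f"no credentials block for model {model!r}")
    api_key = str(entry.get("api_key", "")).strip()
    if not api_key or api_key.startswith(PLACEHOLDER_PREFIX):
        raise ValueError(f"api_key for {model!r} is empty or still a placeholder")
    return entry


def load_credentials(path: str | Path = "credentials.yml") -> dict:
    """Parse the credentials file with PyYAML (the pyyaml core package)."""
    import yaml  # imported lazily so check_credentials works without PyYAML

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f) or {}
```

Typical use would be `check_credentials(load_credentials(), "gpt-4.1-mini")`, run from the directory that holds `credentials.yml`.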

Option B: Using .env file

Create a .env file in the pptagent/ directory:

cd pptagent
cat > .env << EOF
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
EOF
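A similar check works for the `.env` route: the sketch below reports keys that are unset or still hold the `..._here` template values written above. `missing_env_keys` is a hypothetical helper. Loading the file uses `load_dotenv` from python-dotenv (one of the core packages), and the script is assumed to be run from `pptagent/` so that `.env` resolves relative to the working directory.

```python
from __future__ import annotations

import os
from collections.abc import Iterable, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_env_keys(
    keys: Iterable[str] = REQUIRED_KEYS, env: Mapping[str, str] | None = None
) -> list[str]:
    """Return the names that are unset, blank, or still the '..._here' template value."""
    env = os.environ if env is None else env
    return [k for k in keys if not env.get(k, "").strip() or env[k].endswith("_here")]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv

        load_dotenv(".env")  # path is relative to the current working directory
    except ImportError:
        pass  # fall back to whatever is already in the environment
    print(missing_env_keys() or "All API keys are set.")
```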

Step 3: Run the System

Web UI (Flask) - Recommended for interactive use:

python pptagent/main_flask.py

Then open your browser to http://localhost:8080

CLI Mode - For batch processing:

cd pptagent
python main_cli.py

Quick Start (shows usage):

python pptagent/main.py

🔧 Project Structure

Talk-to-Your-Slides/
├── pptagent/
│   ├── main.py              # Entry point (shows usage)
│   ├── main_flask.py        # Web UI server (Flask)
│   ├── main_cli.py          # CLI interface
│   ├── classes.py           # Core PPT agent classes
│   ├── test_Applier.py      # Applier implementations
│   ├── llm_api.py           # LLM API wrappers
│   ├── gemini_api.py        # Gemini-specific API
│   ├── utils.py             # Utility functions
│   ├── prompt.py            # System prompts
│   └── templates/           # Flask HTML templates
├── credentials.yml.example  # Example API credentials
├── requirements.txt         # Python dependencies
└── README.md               # This file

🎯 Supported Models

  • OpenAI: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
  • Google: Gemini 1.5 Flash, Gemini 2.5 Flash
  • Anthropic: Claude 3.7 Sonnet

💡 Usage Examples

Example 1: Translate slide content

"Translate all text content on slide 1 into Korean."

Example 2: Fix typos

"Check slide 4 for any typos or errors and correct them."

Example 3: Change formatting

"Change all English text to blue color on slide 3."

See the demo videos above for more examples!
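To make the object-level idea concrete, here is a rough sketch of what Example 3 amounts to once the text runs have been located. It works on a saved .pptx file with python-pptx rather than on a live PowerPoint session over COM, so it is not the repository's Applier; `is_english_only` and `color_english_runs_blue` are hypothetical names. It is deliberately coarse: a run that mixes English with another script is left unchanged, words with accented letters (such as "café") are not treated as English, and tables and grouped shapes are skipped.

```python
import re

LATIN_LETTER = re.compile(r"[A-Za-z]")


def is_english_only(text: str) -> bool:
    """True when the text has Latin letters and no letters from any other script."""
    has_latin = LATIN_LETTER.search(text) is not None
    has_other_letters = any(ch.isalpha() and not ch.isascii() for ch in text)
    return has_latin and not has_other_letters


def color_english_runs_blue(pptx_path: str, slide_number: int, out_path: str) -> int:
    """Color English-only text runs on one slide blue; return how many runs changed."""
    from pptx import Presentation  # python-pptx, imported lazily
    from pptx.dml.color import RGBColor

    prs = Presentation(pptx_path)
    slide = prs.slides[slide_number - 1]  # slide numbers in prompts are 1-based
    changed = 0
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue  # pictures, tables, groups, etc. are skipped in this sketch
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                if is_english_only(run.text):
                    run.font.color.rgb = RGBColor(0x00, 0x00, 0xFF)
                    changed += 1
    prs.save(out_path)
    return changed
```

The agent itself works through the structured object information of the open deck; the point here is only that "change English to blue" reduces to targeted property edits on text runs, with no pixel-level interaction.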

πŸ› Troubleshooting

Issue: ModuleNotFoundError for openai or google.generativeai

# Solution: Install missing packages
pip install openai==1.74.0 google-generativeai

Issue: FileNotFoundError for credentials.yml

# Solution: Create credentials file from example
cp credentials.yml.example credentials.yml
# Then edit credentials.yml with your API keys

Issue: COM error on Windows

  • Make sure PowerPoint is installed
  • Enable VBA access (see installation guide above)
  • Run Python as Administrator if needed

Issue: Flask server not starting

# Check whether port 8080 is already in use.
# To use a different port, edit the app.run(...) call in main_flask.py (line 341):
# app.run(debug=True, port=8081)  # change to a free port
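To find out whether port 8080 is taken before editing the file, a small probe can attempt a TCP connection to it. This sketch assumes the server listens on `127.0.0.1`, Flask's default host; `port_in_use` and `first_free_port` are hypothetical helpers, and a port that is bound but not yet listening will still look free.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


def first_free_port(start: int = 8080, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port at or above `start` with no listener on it."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print("8080 in use:", port_in_use(8080), "| suggested port:", first_free_port())
```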

πŸ—οΈ Code Architecture

The system follows a hierarchical pipeline:

  1. Planner: Analyzes user request and creates high-level plan
  2. Parser: Parses the plan into structured tasks
  3. Processor: Processes each task with contextual information
  4. Applier: Applies changes to PowerPoint slides via COM/python-pptx
  5. Reporter: Generates summary of changes made

Each component is modular and can be extended independently.
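The five stages compose as a simple chain. The sketch below wires them together with plain callables so that each stage can be swapped independently; the `Task` fields, the `run_pipeline` signature, and the toy lambdas are illustrative assumptions, not the interfaces defined in `classes.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One structured edit produced by the parser (fields are illustrative)."""
    slide_number: int
    action: str
    context: str = ""  # filled in by the processor


def run_pipeline(
    request: str,
    planner: Callable[[str], str],
    parser: Callable[[str], list[Task]],
    processor: Callable[[Task], Task],
    applier: Callable[[Task], None],
    reporter: Callable[[str, list[Task]], str],
) -> str:
    plan = planner(request)                # 1. high-level semantic plan
    tasks = parser(plan)                   # 2. plan -> structured tasks
    tasks = [processor(t) for t in tasks]  # 3. attach slide/object context
    for task in tasks:
        applier(task)                      # 4. low-level edit (COM or python-pptx)
    return reporter(request, tasks)        # 5. summary of what changed


if __name__ == "__main__":
    # Toy stand-ins for the LLM-backed stages, for demonstration only.
    applied: list[Task] = []
    summary = run_pipeline(
        "Translate slide 1 into Korean.",
        planner=lambda req: "translate:1",
        parser=lambda plan: [Task(int(plan.split(":")[1]), plan.split(":")[0])],
        processor=lambda t: Task(t.slide_number, t.action, context="3 text boxes"),
        applier=applied.append,
        reporter=lambda req, ts: f"{len(ts)} task(s) applied for: {req}",
    )
    print(summary)  # 1 task(s) applied for: Translate slide 1 into Korean.
```

In the real system each callable would wrap an LLM call or a COM/python-pptx operation; passing them in as parameters is what lets one component be replaced without touching the others.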



About

PowerPoint Editing Agent
