Note: TSBench-Hard is now available! Download TSBench-Hard on Google Drive
Note: Batch slide inference examples are available. Download Examples on Google Drive
Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation.
While GUI-based agents have demonstrated visual control capabilities, they often suffer from high computational cost and latency. To address this, we propose Talk-to-Your-Slides, an LLM-powered agent that edits slides in active PowerPoint sessions by leveraging structured object-level information, bypassing the need for visual pixel interaction.
Our system introduces a hierarchical editing design, separating high-level semantic planning from low-level object manipulation. This allows:
- 34.02% faster execution
- 34.76% better instruction adherence
- 87.42% cheaper operations
To evaluate slide editing performance, we present TSBench, a human-annotated benchmark with 379 diverse instructions spanning four major categories.
Download TSBench on Google Drive
TSBench is our human-annotated benchmark of 379 diverse instructions spanning four major categories, used to evaluate slide editing performance.
Download TSBench-Hard on Google Drive
TSBench-Hard is an advanced evaluation subset designed to rigorously assess model robustness on complex real-world scenarios. This dataset contains 300 challenging instances across four key difficulty dimensions:
- Visual-Dependent Tasks: Instructions requiring spatial reasoning (e.g., "Align the text box to the left edge of the image")
- Ambiguous Instructions: High-level commands requiring inference (e.g., "Make the title slide look more professional")
- Complex Multi-step Logic: Tasks involving conditional formatting across multiple slides (e.g., "Apply bold formatting to all titles on slides that contain a table and if you think that is important, color into red")
- Impossible Tasks: Technically unfeasible requests (e.g., "Change the video content inside the embedded player") to evaluate the agent's ability to correctly identify and refuse invalid actions
Each instance in TSBench-Hard follows the structure:
{
  "instruction": "User command for slide editing task",
  "ideal_description": "Description of the ideal presentation after completing the task"
}
- instruction: Generated using GPT-4.1, then filtered by human evaluators to ensure quality and challenge level
- ideal_description: Describes the expected state of the presentation after successfully executing the instruction, generated by Gemini 2.5 Flash. This serves as the ground truth for evaluation
- The ideal_description can be used as the evaluation ground truth to assess whether an agent's output matches the expected ideal presentation state
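As a minimal sketch of how an instance might be consumed, the snippet below parses one record and checks that both fields are present. The sample record reuses an example instruction from the list above; its ideal_description text, the validate_instance helper, and the assumption that records arrive as JSON objects are illustrative and not part of the released dataset tooling.

```python
import json

REQUIRED_FIELDS = ("instruction", "ideal_description")


def validate_instance(record: dict) -> dict:
    """Return the record if both required fields are non-empty strings."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field}")
    return record


# Hypothetical record for illustration; the description text is made up.
raw = json.dumps({
    "instruction": "Align the text box to the left edge of the image",
    "ideal_description": "The text box's left edge lines up with the image's left edge.",
})

instance = validate_instance(json.loads(raw))
print(instance["instruction"])
```

Rejecting malformed records up front keeps a later evaluation step from silently comparing against an empty ground truth.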
CamelCase
Prompt: "Please update all English on ppt slides number 7 to camelCase formatting."
Only English → Blue
Prompt: "Please change only English into blue color in slide number 3."
Typo Checking & Correction
Prompt: "Please check ppt slides number 4 for any typos or errors, correct them."
Translate to English
Prompt: "Please translate ppt slides number 5 into English."
Slide Notes Script
Prompt: "Please create a full script for ppt slides number 3 and add the script to the slide notes."
- Open PowerPoint
- Go to File > Options > Trust Center > Trust Center Settings
- In Macro Settings, check:
  - "Trust access to the VBA project object model"
pip install -r requirements.txt
Note: If you encounter issues with package installation, install these core packages:
pip install openai==1.74.0 google-generativeai anthropic python-pptx Flask python-dotenv pyyaml
Option A: Using credentials.yml (Recommended)
Copy the example credentials file:
cp credentials.yml.example credentials.yml
Edit credentials.yml with your API keys:
gpt-4.1-mini:
  api_key: "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"
gpt-4.1:
  api_key: "YOUR_OPENAI_API_KEY"
  base_url: "https://api.openai.com/v1"
gemini-1.5-flash:
  api_key: "YOUR_GEMINI_API_KEY"
claude-3.7-sonnet:
  api_key: "YOUR_ANTHROPIC_API_KEY"
Option B: Using .env file
Create a .env file in the pptagent/ directory:
cd pptagent
cat > .env << EOF
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
EOF
Web UI (Flask) - Recommended for interactive use:
python pptagent/main_flask.py
Then open your browser to http://localhost:8080
CLI Mode - For batch processing:
cd pptagent
python main_cli.py
Quick Start (shows usage):
python pptagent/main.py
Talk-to-Your-Slides/
├── pptagent/
│   ├── main.py              # Entry point (shows usage)
│   ├── main_flask.py        # Web UI server (Flask)
│   ├── main_cli.py          # CLI interface
│   ├── classes.py           # Core PPT agent classes
│   ├── test_Applier.py      # Applier implementations
│   ├── llm_api.py           # LLM API wrappers
│   ├── gemini_api.py        # Gemini-specific API
│   ├── utils.py             # Utility functions
│   ├── prompt.py            # System prompts
│   └── templates/           # Flask HTML templates
├── credentials.yml.example  # Example API credentials
├── requirements.txt         # Python dependencies
└── README.md                # This file
- OpenAI: GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
- Google: Gemini 1.5 Flash, Gemini 2.5 Flash
- Anthropic: Claude 3.7 Sonnet
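To connect the model list to the API keys configured earlier, a lookup by model-name prefix is one option. This is a sketch only: the env_var_for_model and api_key_for_model helpers are hypothetical, and the prefixes are inferred from the model names in the credentials example rather than taken from the project's code.

```python
import os

# Prefix -> environment variable, following the .env names shown earlier.
PROVIDER_ENV_VARS = {
    "gpt-": "OPENAI_API_KEY",
    "gemini-": "GEMINI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
}


def env_var_for_model(model_name: str) -> str:
    """Return the environment variable expected to hold the key for a model."""
    for prefix, env_var in PROVIDER_ENV_VARS.items():
        if model_name.lower().startswith(prefix):
            return env_var
    raise ValueError(f"unsupported model: {model_name}")


def api_key_for_model(model_name: str) -> str:
    """Look up the API key in the environment, failing loudly if it is unset."""
    env_var = env_var_for_model(model_name)
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


print(env_var_for_model("claude-3.7-sonnet"))  # ANTHROPIC_API_KEY
```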
Example 1: Translate slide content
"Translate all text content on slide 1 into Korean."
Example 2: Fix typos
"Check slide 4 for any typos or errors and correct them."
Example 3: Change formatting
"Change all English text to blue color on slide 3."
See demo videos below for more examples!
Issue: ModuleNotFoundError for openai or google.generativeai
# Solution: Install missing packages
pip install openai==1.74.0 google-generativeai
Issue: FileNotFoundError for credentials.yml
# Solution: Create credentials file from example
cp credentials.yml.example credentials.yml
# Then edit credentials.yml with your API keys
Issue: COM error on Windows
- Make sure PowerPoint is installed
- Enable VBA access (see installation guide above)
- Run Python as Administrator if needed
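Some of these checks can be automated before launching the agent. The sketch below assumes COM access goes through the win32com module from pywin32, which this README does not confirm; com_preflight is a hypothetical helper, and it cannot verify that PowerPoint itself is installed or that VBA access is enabled.

```python
import importlib.util
import sys


def com_preflight(platform: str = sys.platform, module_name: str = "win32com") -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if platform != "win32":
        problems.append(f"COM automation needs Windows, but platform is {platform!r}")
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"Python module {module_name!r} is not installed")
    return problems


for problem in com_preflight():
    print("Problem:", problem)
```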
Issue: Flask server not starting
# Check if port 8080 is available
# Try a different port by editing main_flask.py line 341:
# app.run(debug=True, port=8081) # Change to a different port
The system follows a hierarchical pipeline:
- Planner: Analyzes user request and creates high-level plan
- Parser: Parses the plan into structured tasks
- Processor: Processes each task with contextual information
- Applier: Applies changes to PowerPoint slides via COM/python-pptx
- Reporter: Generates summary of changes made
Each component is modular and can be extended independently.
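The five stages above can be sketched as a chain of small functions operating on structured slide objects. Everything here is illustrative: the Shape dataclass, the function names, and the rule-based planner standing in for the LLM are assumptions, not the project's classes or its COM/python-pptx calls.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical stand-in for one text object on a slide."""
    name: str
    text: str


def planner(request: str) -> str:
    # A real planner would call an LLM; this stub returns a fixed one-step plan.
    return f"uppercase_titles :: {request}"


def parser(plan: str) -> list[dict]:
    # Split the plan string into structured tasks.
    action, _, detail = plan.partition(" :: ")
    return [{"action": action, "detail": detail}]


def processor(task: dict, shapes: list[Shape]) -> dict:
    # Attach context: which shapes the task should touch.
    targets = [s.name for s in shapes if s.name.startswith("Title")]
    return {**task, "targets": targets}


def applier(task: dict, shapes: list[Shape]) -> list[str]:
    # Edit objects directly by attribute, with no pixel-level interaction.
    changed = []
    for shape in shapes:
        if shape.name in task["targets"] and task["action"] == "uppercase_titles":
            shape.text = shape.text.upper()
            changed.append(shape.name)
    return changed


def reporter(changed: list[str]) -> str:
    return f"Updated {len(changed)} shape(s): {', '.join(changed)}"


def run_pipeline(request: str, shapes: list[Shape]) -> str:
    changed = []
    for task in parser(planner(request)):
        changed += applier(processor(task, shapes), shapes)
    return reporter(changed)


slide = [Shape("Title 1", "quarterly review"), Shape("Body 1", "details")]
print(run_pipeline("Make the title uppercase", slide))
# Updated 1 shape(s): Title 1
```

Because each stage only depends on the previous stage's output, any one of them can be swapped (for example, a different planner model) without touching the others.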




