
test(skills): add PPTX extraction integration tests with real .pptx fixture #1057

@WilliamBerryiii

Description


Summary

The PowerPoint extraction skill under .github/skills/experimental/powerpoint/scripts/ has ~4,700 lines across 14 modules, but its test suite consists entirely of unit tests with mocked file I/O. Integration tests that run against a real .pptx fixture would validate the end-to-end extraction pipeline: slide parsing, image discovery, theme color resolution, metadata extraction, and the data flow between modules.

Root Cause

The initial test strategy (epic #1012) focused on Hypothesis property-based testing (#1013, with PR #1046 in flight) and security fuzzing with Atheris. Both are valuable for exploring the input space and discovering crashes, but they exercise individual functions with synthetic inputs. No test runs the full extraction pipeline against an actual PowerPoint file, which leaves a gap where:

  • Serialization/deserialization mismatches between modules go undetected
  • File I/O edge cases (ZIP extraction, media blob handling, XML namespace resolution) are never tested against real data
  • Cross-module data contracts (e.g., extract_content.py → discover_images.py → analyze_media.py) are only verified in isolation
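One way to test these contracts without mocks is to use an independent oracle. The oracle opens the .pptx as a plain ZIP archive, and its view is then compared with what the skill's modules report. The sketch below is a hypothetical helper: slide_texts is not part of the skill. It collects the text runs of each slide and shows two of the edge cases listed above. First, DrawingML elements must be looked up with their namespace. Second, slide parts must be sorted numerically. It assumes that slide part numbers follow presentation order. That is a simplification, because the authoritative order is the p:sldIdLst list in ppt/presentation.xml.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List

# DrawingML namespace; text runs inside slide XML live in <a:t> elements.
NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}
# Matches slide parts only, not their _rels/*.rels relationship files.
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def slide_texts(pptx_path) -> Dict[int, List[str]]:
    """Return {slide_number: [text runs]} read directly from the .pptx ZIP.

    Hypothetical test oracle: it bypasses the skill's extractor entirely so
    its output can be compared with what the extraction pipeline reports.
    Assumes part numbering matches slide order (see p:sldIdLst for the truth).
    """
    texts: Dict[int, List[str]] = {}
    with zipfile.ZipFile(pptx_path) as zf:
        for name in zf.namelist():
            match = SLIDE_PART.match(name)
            if not match:
                continue
            root = ET.fromstring(zf.read(name))
            # Namespace-qualified search: a bare ".//t" would match nothing.
            runs = [t.text or "" for t in root.iterfind(".//a:t", NS)]
            texts[int(match.group(1))] = runs
    # Sort numerically so slide10 follows slide9 rather than slide1.
    return dict(sorted(texts.items()))
```

In an integration test, the same fixture path would be passed both to the extraction entry point and to slide_texts, and the text reported for each slide would be compared with the oracle's runs.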

What Needs to Be Addressed

  1. Test fixture — A real .pptx file that exercises the skill's key code paths: multiple slides, embedded images, theme colors, speaker notes, and metadata fields
  2. Integration test suite — Tests that invoke the top-level extraction entry point and validate outputs against known-good expectations from the fixture
  3. CI integration — Tests must run as part of the existing npm run test:py / pytest pipeline without special setup
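Before writing assertions, it helps to confirm that the fixture itself covers each required code path. The following is a minimal sketch. It assumes only the standard OOXML part names: ppt/slides/, ppt/media/, ppt/notesSlides/, ppt/theme/, and docProps/core.xml. The names fixture_inventory and FixtureInventory are hypothetical; they do not refer to existing skill code.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict

# Namespaces used by docProps/core.xml (Dublin Core elements and terms).
CORE_NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
CORE_FIELDS = {"title": "dc:title", "creator": "dc:creator", "created": "dcterms:created"}


@dataclass(frozen=True)
class FixtureInventory:
    slides: int
    media: int
    notes: int
    has_theme: bool
    title: str
    creator: str
    created: str


def _core_properties(zf: zipfile.ZipFile) -> Dict[str, str]:
    """Read title/creator/created from docProps/core.xml; missing fields become ''."""
    root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for key, path in CORE_FIELDS.items():
        element = root.find(path, CORE_NS)
        props[key] = (element.text or "") if element is not None else ""
    return props


def fixture_inventory(pptx_path) -> FixtureInventory:
    """Count the parts a fixture provides for each extraction code path (hypothetical)."""
    with zipfile.ZipFile(pptx_path) as zf:
        names = zf.namelist()
        core = _core_properties(zf) if "docProps/core.xml" in names else {}

    def count(pattern: str) -> int:
        return sum(1 for name in names if re.fullmatch(pattern, name))

    return FixtureInventory(
        slides=count(r"ppt/slides/slide\d+\.xml"),
        media=count(r"ppt/media/[^/]+"),
        notes=count(r"ppt/notesSlides/notesSlide\d+\.xml"),
        has_theme=count(r"ppt/theme/theme\d+\.xml") > 0,
        title=core.get("title", ""),
        creator=core.get("creator", ""),
        created=core.get("created", ""),
    )
```

A guard test could then assert inv.slides >= 2, inv.media >= 1, inv.notes >= 1, inv.has_theme, and a non-empty inv.title. That way, an accidentally re-saved or stripped fixture fails loudly instead of silently weakening the suite.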

How to Address

  1. Create a minimal test fixture (.pptx file) containing:

    • At least 2 slides with different layouts
    • At least 1 embedded image (PNG or JPEG)
    • Theme colors applied to text elements
    • Speaker notes on at least 1 slide
    • Standard metadata (title, author, creation date)
    • Place under tests/fixtures/ or the skill's test directory
  2. Write integration tests that:

    • Call the extraction entry point with the fixture path (no mocks)
    • Assert slide count, text content, image discovery results, and metadata fields
    • Validate theme color extraction produces expected RGB values
    • Confirm no exceptions on the full extraction pipeline
    • Test error handling with a corrupted/truncated .pptx if feasible
  3. Wire into CI — Ensure conftest.py or pytest configuration picks up the new test module and fixture path
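For the theme color assertion, the expected RGB values can be read directly from the fixture's theme part instead of being hard-coded by hand. The hypothetical helper below maps each color scheme slot, such as dk1, lt1, or accent1, to an (R, G, B) tuple. It uses val for a:srgbClr and the cached lastClr for a:sysClr. This rests on two assumptions:

  • ppt/theme/theme1.xml is the theme of the slide master in use. This is common but not guaranteed; the master's relationships are authoritative.
  • The values are the raw scheme colors. If text applies lumMod/lumOff tints to a scheme color and the extractor applies those transforms, it will resolve to different RGB values.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Clark-notation prefix for the DrawingML namespace used in theme parts.
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

RGB = Tuple[int, int, int]


def _hex_to_rgb(value: str) -> RGB:
    """Convert 'RRGGBB' to an (R, G, B) tuple of ints."""
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def theme_colors(pptx_path, theme_part: str = "ppt/theme/theme1.xml") -> Dict[str, RGB]:
    """Map each a:clrScheme slot to its raw RGB value (hypothetical helper).

    Assumes theme_part is the theme used by the slide master; lumMod/lumOff
    transforms applied at the text level are deliberately not resolved here.
    """
    with zipfile.ZipFile(pptx_path) as zf:
        root = ET.fromstring(zf.read(theme_part))
    scheme = root.find(f"{A}themeElements/{A}clrScheme")
    if scheme is None:
        raise ValueError(f"{theme_part} has no a:clrScheme element")

    colors: Dict[str, RGB] = {}
    for slot in scheme:
        slot_name = slot.tag[len(A):]
        for color in slot:
            if color.tag == f"{A}srgbClr":
                value = color.get("val")
            elif color.tag == f"{A}sysClr":
                # System colors carry the last rendered RGB in lastClr.
                value = color.get("lastClr")
            else:
                continue  # e.g. a:ext children of a:extLst
            if value:
                colors[slot_name] = _hex_to_rgb(value)
            break  # each scheme slot holds a single color element
    return colors
```

A test could then assert that the color the extractor reports for a run styled with accent1 equals theme_colors(FIXTURE)["accent1"]. Here, FIXTURE stands for the committed fixture path.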

Related Issues

Acceptance Criteria

  • At least one real .pptx test fixture committed to the repository
  • Integration tests validate end-to-end extraction: slides, images, metadata, theme colors
  • Tests run as part of npm run test:py or the skill's pytest suite
  • No mocked file I/O — tests exercise the actual extraction pipeline
  • All existing tests continue to pass (npm run test:py)
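The corrupted-file case can also stay free of mocks if the broken .pptx is derived from the committed fixture at test time. Here is a minimal sketch; truncated_copy is a hypothetical helper. A ZIP archive, and therefore a .pptx, ends with its end-of-central-directory record. Cutting the file substantially removes that record, so Python's zipfile rejects the file with BadZipFile before any XML is parsed.

```python
from pathlib import Path


def truncated_copy(src, dest_dir, keep_fraction: float = 0.5) -> Path:
    """Write a copy of src cut to keep_fraction of its bytes and return its path.

    Hypothetical helper: because the ZIP central directory sits at the end of
    the file, dropping the tail leaves a file that zipfile rejects.
    """
    if not 0 <= keep_fraction < 1:
        raise ValueError("keep_fraction must be in [0, 1)")
    src = Path(src)
    data = src.read_bytes()
    dest = Path(dest_dir) / f"truncated-{src.name}"
    dest.write_bytes(data[: int(len(data) * keep_fraction)])
    return dest
```

With pytest, dest_dir would typically be the built-in tmp_path fixture. The sketch does not settle which exception the skill's entry point should raise for such input, whether BadZipFile itself or a wrapped error. That is a design decision for the suite.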

Metadata


Assignees: No one assigned

Labels:

  • enhancement: New feature or request
  • skills: Copilot skill packages (SKILL.md)

Projects: No projects

Relationships: None yet

Development: No branches or pull requests
