Summary
The PowerPoint extraction skill under .github/skills/experimental/powerpoint/scripts/ has ~4,700 lines across 14 modules, but its test suite relies entirely on unit tests with mocked file I/O. Integration tests using a real .pptx fixture would validate the end-to-end extraction pipeline — slide parsing, image discovery, theme color resolution, metadata extraction, and cross-module data flow.
Root Cause
The initial test strategy (#1012 epic) focused on Hypothesis property-based testing (#1013, PR #1046 in-flight) and security fuzzing (Atheris). These are valuable for input-space exploration and crash discovery, but they operate on individual functions with synthetic inputs. No tests exercise the full extraction pipeline against an actual PowerPoint file, leaving a gap where:
- Serialization/deserialization mismatches between modules go undetected
- File I/O edge cases (ZIP extraction, media blob handling, XML namespace resolution) are never tested against real data
- Cross-module data contracts (e.g., extract_content.py → discover_images.py → analyze_media.py) are only verified in isolation
What Needs to Be Addressed
- Test fixture: a real .pptx file that exercises the skill's key code paths, including multiple slides, embedded images, theme colors, speaker notes, and metadata fields
- Integration test suite: tests that invoke the top-level extraction entry point and validate outputs against known-good expectations from the fixture
- CI integration: tests must run as part of the existing npm run test:py / pytest pipeline without special setup
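One way to get known-good expectations that do not depend on the skill's own code is to read them straight out of the fixture with the standard library. The sketch below does this for theme colors only. It assumes the theme lives at the conventional part name ppt/theme/theme1.xml, and read_theme_colors is a hypothetical helper name, not something from the skill.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, used by theme parts.
NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}


def read_theme_colors(pptx_path, theme_part="ppt/theme/theme1.xml"):
    """Return {slot_name: 'RRGGBB'} for the color scheme of one theme part.

    theme_part defaults to the conventional name of the first theme.
    A deck may store its theme under another name, so this default is
    an assumption rather than a guarantee.
    """
    with zipfile.ZipFile(pptx_path) as package:
        root = ET.fromstring(package.read(theme_part))

    scheme = root.find(".//a:clrScheme", NS)
    if scheme is None:
        return {}

    colors = {}
    for slot in scheme:
        # Strip the namespace from the tag: 'dk1', 'lt1', 'accent1', ...
        name = slot.tag.split("}", 1)[-1]
        srgb = slot.find("a:srgbClr", NS)
        sys_clr = slot.find("a:sysClr", NS)
        if srgb is not None:
            colors[name] = srgb.get("val", "").upper()
        elif sys_clr is not None and sys_clr.get("lastClr"):
            # System colors record their last resolved RGB value in lastClr.
            colors[name] = sys_clr.get("lastClr").upper()
    return colors
```

An integration test could compare the skill's resolved colors against this dictionary, so the expected values track the fixture instead of being hard-coded twice.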
How to Address
- Create a minimal test fixture (.pptx file) containing:
  - At least 2 slides with different layouts
  - At least 1 embedded image (PNG or JPEG)
  - Theme colors applied to text elements
  - Speaker notes on at least 1 slide
  - Standard metadata (title, author, creation date)
  - Place under tests/fixtures/ or the skill's test directory
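Before committing the fixture, a quick structural check can confirm it covers most of the checklist above. This is a minimal sketch that only inspects ZIP member names. It relies on the part names PowerPoint conventionally writes (ppt/slides/slideN.xml and so on), which is an assumption, and it does not verify layouts or theme color usage. The names summarize_fixture and missing_features are hypothetical.

```python
import re
import zipfile

# Conventional part names inside a .pptx package (assumed, not guaranteed).
SLIDE_RE = re.compile(r"ppt/slides/slide\d+\.xml")
NOTES_RE = re.compile(r"ppt/notesSlides/notesSlide\d+\.xml")
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def summarize_fixture(pptx_path):
    """Count the package parts that matter for the fixture checklist."""
    with zipfile.ZipFile(pptx_path) as package:
        names = package.namelist()
    return {
        "slides": sum(1 for n in names if SLIDE_RE.fullmatch(n)),
        "notes": sum(1 for n in names if NOTES_RE.fullmatch(n)),
        "media": sorted(
            n for n in names
            if n.startswith("ppt/media/") and not n.endswith("/")
        ),
        "has_core_properties": "docProps/core.xml" in names,
    }


def missing_features(summary):
    """Return human-readable gaps; an empty list means the check passed."""
    problems = []
    if summary["slides"] < 2:
        problems.append("fewer than 2 slides")
    if not any(m.lower().endswith(IMAGE_SUFFIXES) for m in summary["media"]):
        problems.append("no embedded PNG or JPEG")
    if summary["notes"] < 1:
        # A notes part can exist with empty text, so this is a weak check.
        problems.append("no speaker notes part")
    if not summary["has_core_properties"]:
        problems.append("no core metadata part")
    return problems
```

The same summary can double as the source of the expected slide count and image list in the integration tests.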
- Write integration tests that:
  - Call the extraction entry point with the fixture path (no mocks)
  - Assert slide count, text content, image discovery results, and metadata fields
  - Validate that theme color extraction produces the expected RGB values
  - Confirm the full extraction pipeline raises no exceptions
  - Test error handling with a corrupted/truncated .pptx if feasible
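For the corrupted/truncated case, one option is to derive the broken file from the good fixture at test time instead of committing a second binary. The helper below is a sketch under that assumption; write_truncated_copy is a hypothetical name. Cutting the tail off a ZIP removes its end-of-central-directory record, which is enough for Python's zipfile module to reject the file.

```python
import zipfile
from pathlib import Path


def write_truncated_copy(source, destination, keep_fraction=0.5):
    """Write the first keep_fraction of source to destination.

    Returns the destination as a Path. At least one byte is kept so the
    result is a non-empty but structurally invalid archive.
    """
    data = Path(source).read_bytes()
    cut = max(1, int(len(data) * keep_fraction))
    destination = Path(destination)
    destination.write_bytes(data[:cut])
    return destination
```

In a pytest test the destination would typically live under the built-in tmp_path fixture, and the assertion would be that the skill's entry point fails in whatever way its error contract specifies rather than crashing with an unrelated exception.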
- Wire into CI: ensure conftest.py or pytest configuration picks up the new test module and fixture path
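To keep the fixture path independent of the directory CI happens to run from, the path can be resolved relative to the tests directory and made to fail loudly when the file is absent. This is a sketch only: resolve_fixture, the fixtures subdirectory, and any file name passed in are illustrative assumptions, not the repository's actual layout.

```python
from pathlib import Path


def resolve_fixture(name, tests_dir):
    """Return the absolute path of a fixture under <tests_dir>/fixtures.

    Raises FileNotFoundError instead of returning a dangling path, so a
    missing fixture shows up as a failure rather than a silent skip.
    """
    path = Path(tests_dir).resolve() / "fixtures" / name
    if not path.is_file():
        raise FileNotFoundError(f"integration fixture missing: {path}")
    return path
```

In a conftest.py this could sit behind a session-scoped pytest fixture that passes Path(__file__).parent as tests_dir, so the tests work the same under npm run test:py and a direct pytest invocation.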
Related Issues
Acceptance Criteria
- .pptx test fixture committed to the repository
- […] npm run test:py or the skill's pytest suite
- […] npm run test:py)