Skip to content

Conversation

@ftnext
Copy link
Collaborator

@ftnext ftnext commented Dec 30, 2025

Inspired by #859

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces mypy type checking as a CI/CD step and addresses various type-related issues throughout the codebase. The changes refactor duplicated type definitions into a shared generic base type, add type ignore comments where dynamic attribute assignment occurs, and rename exception variables for clarity.

Key changes:

  • Adds mypy type checking workflow and Makefile target
  • Introduces generic TranscribeOutputBase[SegmentT] to eliminate duplicate type definitions
  • Adds type ignore comments for dynamic attribute assignments and runtime monkey-patching

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
.github/workflows/typecheck.yml New GitHub Actions workflow to run type checking on push/PR to master
Makefile Adds typecheck target with mypy configuration
setup.cfg Adds mypy and types-requests to dev dependencies
speech_recognition/recognizers/whisper_local/base.py Introduces generic TranscribeOutputBase and updates Protocol with generics
speech_recognition/recognizers/whisper_local/whisper.py Removes duplicate TranscribeOutput definition, uses shared TranscribeOutputBase
speech_recognition/recognizers/whisper_local/faster_whisper.py Removes duplicate TranscribeOutput definition, uses shared TranscribeOutputBase
speech_recognition/recognizers/google_cloud.py Changes encoding type from enum to int for type compatibility
speech_recognition/__init__.py Adds type ignore comments for dynamic attributes and renames exception variables

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +15 to +18
class TranscribeOutputBase(TypedDict, Generic[SegmentT]):
text: str
segments: list[SegmentT]
language: str
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypedDict does not support inheritance from Generic in Python. While this syntax may work in some type checkers, it's not officially supported and could lead to runtime errors or inconsistent behavior across different type checkers. Consider using a Protocol with properties instead, or use a regular class with dataclass for better type safety.

Copilot uses AI. Check for mistakes.
"""

encoding: Required[RecognitionConfig.AudioEncoding]
encoding: Required[int]
Copy link

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the type from RecognitionConfig.AudioEncoding to int reduces type safety. While RecognitionConfig.AudioEncoding values are integers, using the specific enum type provides better type checking and makes the API clearer. If this change is necessary for type checking to pass, consider using RecognitionConfig.AudioEncoding | int or investigate why mypy is having issues with the original type.

Suggested change
encoding: Required[int]
encoding: Required["RecognitionConfig.AudioEncoding | int"]

Copilot uses AI. Check for mistakes.
@ftnext
Copy link
Collaborator Author

ftnext commented Dec 30, 2025

Code Review by Claude Sonnet 4.5 (/review in local)

Overview

This PR introduces type checking to the project using mypy and adds a GitHub Actions workflow to run type checks on every push and pull request to master. The changes include:

  • New type check CI workflow
  • Type checking configuration in Makefile
  • Type annotations and type: ignore comments throughout the codebase
  • Refactored type definitions for better type safety (Generic TypedDict for Whisper)

Stats: +83 additions, -49 deletions

Positive Aspects

✅ Strong foundation for type safety

  • Adding mypy to CI is excellent for catching type errors early
  • The Generic TypedDict approach in whisper_local/base.py is well-designed

✅ Improved code clarity

  • Renaming exc → failed/not_ready in speech_recognition/init.py:1040-1050 improves readability

✅ Proper infrastructure setup

  • CI workflow properly installs all dependencies including audio libraries
  • Makefile target is simple and clear

Issues & Suggestions

🔴 Critical Issues

  1. Excessive use of type: ignore comments
  • Files affected: speech_recognition/init.py (lines 376, 379, 386-387, 1289-1298, 1326)
  • Issue: While some type: ignore comments are necessary initially, this PR adds many without addressing the root cause
  • Suggestion:
    • For AudioSource attributes (CHUNK, SAMPLE_RATE, etc.), consider adding a Protocol or base class that defines these attributes
    • For dynamic method assignment (recognize_google, etc.), consider using proper type stubs or Protocol definitions
# Example: Define a proper Protocol instead of ignoring
from typing import Protocol

class AudioSourceProtocol(Protocol):
    CHUNK: int
    SAMPLE_RATE: int
    SAMPLE_WIDTH: int
    stream: Any  # or more specific type
  1. Type regression in google_cloud.py:52
  • encoding: Required[RecognitionConfig.AudioEncoding]
  • encoding: Required[int]
  • Issue: This loses type information. RecognitionConfig.AudioEncoding is likely an Enum that provides better type safety than raw int
  • Suggestion: Keep the original type and investigate why mypy complained. The issue might be with how Required is used with Enums in TypedDict

⚠️ Medium Priority

  1. Missing type annotations in Makefile .PHONY
    .PHONY: lint rstcheck distribute publish
  • Issue: The new typecheck target is missing from .PHONY
  • Fix: Should be .PHONY: lint rstcheck typecheck distribute publish
  1. CI workflow could be optimized
  • Currently installs ALL optional dependencies for type checking
  • Suggestion: Consider whether all these are needed just for type checking, or if type stubs would suffice
  1. Inconsistent ignore comments
  • Some ignore comments specify the error code ([attr-defined]), others don't
  • Best practice: Always specify error codes for better maintainability
  • Example: # type: ignore → # type: ignore[attr-defined]

💡 Suggestions for Future Improvements

  1. No tests for type checking
  • Consider adding test cases that verify types work correctly with actual usage
  • This ensures the type hints match runtime behavior
  1. Documentation
  • Add a comment explaining why --ignore-missing-imports is used
  • Document in README or CONTRIBUTING.md that type checking is now required
  1. Generic TypedDict is excellent but could be more complete
  • The TranscribeOutputBase design is good, but consider adding more strict types for segment fields if possible

Security & Performance

✅ No security concerns identified
✅ No performance implications - type checking only runs in CI


Code Quality Score: 7/10

Strengths:

  • Solid infrastructure addition
  • Good refactoring of Whisper type definitions
  • Proper CI integration

Areas for Improvement:

  • Too many type: ignore comments mask underlying type issues
  • Type regression in google_cloud.py
  • Missing typecheck from .PHONY
  • Could benefit from more thorough type coverage

Recommendation

Approve with changes requested ✏️

This PR adds valuable type checking infrastructure, but should address:

  1. Fix the .PHONY declaration
  2. Reconsider the google_cloud.py type change
  3. Add issue/TODO comments for the ignored type errors to track future improvements

The foundation is solid - once these issues are addressed, this will significantly improve code quality.

@ftnext ftnext merged commit 2baf672 into Uberi:master Dec 30, 2025
9 checks passed
@ftnext ftnext deleted the run-type-check branch December 30, 2025 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant