Fix OCR capturing VoiceInk status overlay instead of frontmost app window #429

Merged
Beingpax merged 3 commits into Beingpax:main from klaudworks:fix/ocr-window-detection
Dec 7, 2025
Conversation

Contributor

@klaudworks klaudworks commented Dec 6, 2025

Summary

  • Fix context-aware OCR always returning "No text detected" by filtering out VoiceInk's own status indicator overlay from window selection

Fixes #424

Problem

During recording, the screen capture service was selecting the first layer-0 window, which was VoiceInk's own status indicator overlay. Since this overlay contains no readable text, OCR always returned "No text detected via OCR."

Solution

  • Filter out windows owned by VoiceInk's process (using ProcessInfo.processInfo.processIdentifier)
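A minimal sketch of this filter, assuming a simplified `WindowCandidate` with only an owner PID, a layer, and a title (the real struct in `ScreenCaptureService` may carry more fields). The `eligibleWindows` helper and the sample data are hypothetical and only illustrate the idea of dropping windows owned by the current process:

```swift
import Foundation

// Hypothetical model of one on-screen window; the real WindowCandidate may differ.
struct WindowCandidate {
    let ownerPID: Int32   // PID of the process that owns the window
    let layer: Int        // 0 is the normal application window layer
    let title: String
}

// Keep only layer-0 windows that are not owned by the given (current) process.
func eligibleWindows(
    _ candidates: [WindowCandidate],
    excluding currentPID: Int32 = ProcessInfo.processInfo.processIdentifier
) -> [WindowCandidate] {
    candidates.filter { $0.layer == 0 && $0.ownerPID != currentPID }
}

// Sample data: PIDs are made up relative to this process for illustration.
let ownPID = ProcessInfo.processInfo.processIdentifier
let sample = [
    WindowCandidate(ownerPID: ownPID, layer: 0, title: "Status overlay"),
    WindowCandidate(ownerPID: ownPID + 1, layer: 0, title: "Editor"),
    WindowCandidate(ownerPID: ownPID + 2, layer: 25, title: "Menu bar item"),
]
let eligible = eligibleWindows(sample)
print(eligible.map { $0.title })  // prints ["Editor"]
```

The overlay is excluded because it shares the current process ID, and the menu bar item is excluded because it is not on layer 0.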

Validation

I'm running the new version locally and OCR works now.


Summary by cubic

Prevent OCR from capturing the VoiceInk status overlay by filtering our own windows and selecting the frontmost app’s window instead. Restores context-aware OCR so it reads actual on-screen text. Fixes #424.

  • Bug Fixes

    • Exclude VoiceInk-owned windows.
    • Prefer the frontmost app’s layer-0 window.
    • Run window enumeration off the main thread for smoother UI.
  • Refactors

    • Move WindowCandidate to class scope for reuse and clarity.

Written for commit d25ae52. Summary will update automatically on new commits.

…ndow

The screen capture service was selecting the first layer-0 window, which
during recording was VoiceInk's own status indicator overlay. This caused
OCR to always return 'No text detected' since the overlay has no readable
content.

Changes:
- Filter out windows owned by VoiceInk's process
- Prioritize windows belonging to NSWorkspace.frontmostApplication
- Filter out tiny windows (<120x120) to avoid tooltips/overlays
- Move CGWindowListCopyWindowInfo off main thread for better UI responsiveness
- Refactor WindowCandidate struct to class scope
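The prioritization step in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `Candidate` and `selectWindow` are hypothetical names, the candidate list is assumed to be ordered front to back, and `frontmostPID` stands in for a value the app would presumably read from `NSWorkspace.shared.frontmostApplication?.processIdentifier`.

```swift
import Foundation

// Hypothetical model; field names are assumptions for illustration.
struct Candidate {
    let ownerPID: Int32
    let layer: Int
    let title: String
}

// Pick a window to capture from a front-to-back ordered list.
// Prefers a window owned by the frontmost app; otherwise falls back
// to the first eligible window.
func selectWindow(
    from candidates: [Candidate],
    currentPID: Int32,
    frontmostPID: Int32?
) -> Candidate? {
    // Eligibility: normal layer, and not one of our own windows.
    let eligible = candidates.filter { $0.layer == 0 && $0.ownerPID != currentPID }
    if let pid = frontmostPID,
       let match = eligible.first(where: { $0.ownerPID == pid }) {
        return match
    }
    return eligible.first
}

// Made-up PIDs: 100 is "our" process, 300 is the frontmost app.
let windows = [
    Candidate(ownerPID: 100, layer: 0, title: "Status overlay"),
    Candidate(ownerPID: 200, layer: 0, title: "Finder"),
    Candidate(ownerPID: 300, layer: 0, title: "Editor"),
]
let preferred = selectWindow(from: windows, currentPID: 100, frontmostPID: 300)
let fallback = selectWindow(from: windows, currentPID: 100, frontmostPID: nil)
print(preferred?.title ?? "none", fallback?.title ?? "none")  // prints "Editor Finder"
```

Keeping the fallback means a capture target is still chosen when the frontmost application is unknown or has no eligible window.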
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

func isEligible(_ candidate: WindowCandidate) -> Bool {
    guard candidate.layer == 0 else { return false }
    guard candidate.ownerPID != currentPID else { return false }
    return true
}
Contributor

@cubic-dev-ai cubic-dev-ai bot Dec 6, 2025

P2: Rule violated: Backward compatibility

Removing the minimum window size check (120x120) breaks backward compatibility and contradicts the PR description which claims to exclude tiny windows. Previously, small windows like tooltips, notification badges, or tiny status indicators were filtered out. Now they can be selected for OCR capture, potentially degrading OCR results.

Consider restoring the size filter to maintain the previous behavior while still fixing the original bug.

File context (VoiceInk/Services/ScreenCaptureService.swift, line 68):

@@ -65,7 +65,7 @@ class ScreenCaptureService: ObservableObject {
             guard candidate.layer == 0 else { return false }
             guard candidate.ownerPID != currentPID else { return false }
-            return candidate.bounds.width >= 120 && candidate.bounds.height >= 120
+            return true
         }

Suggested change
-            return true
+            return candidate.bounds.width >= 120 && candidate.bounds.height >= 120

Contributor Author

@klaudworks klaudworks Dec 7, 2025

This comment is misleading. It refers to a code change from my first commit. Not sure why the AI complains about backward compatibility with my own prior commit in the PR.

Anyway, I first tried to filter out all small windows from the candidate list for OCR. However, in my tests with small popups, e.g. from this app https://dropoverapp.com/, the implemented changes were sufficient. There may be cases where filtering out small overlay windows is required, but I'd rather not add the change unless it's confirmed that we need it. Also, 120x120px was kinda arbitrary.
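For illustration only, the trade-off discussed here could be expressed as an opt-in threshold rather than a hard-coded one. This sketch is not part of the PR: `SizedCandidate` and the `minimumSide` parameter are hypothetical, and passing `nil` corresponds to the merged behavior of applying no size filter.

```swift
import Foundation

// Hypothetical candidate model with explicit dimensions in points.
struct SizedCandidate {
    let ownerPID: Int32
    let layer: Int
    let width: Double
    let height: Double
}

// minimumSide is a hypothetical knob: nil disables size filtering,
// a value such as 120 would restore the first commit's size check.
func isEligible(
    _ candidate: SizedCandidate,
    currentPID: Int32,
    minimumSide: Double? = nil
) -> Bool {
    guard candidate.layer == 0 else { return false }
    guard candidate.ownerPID != currentPID else { return false }
    if let side = minimumSide {
        return candidate.width >= side && candidate.height >= side
    }
    return true
}

// A small popup owned by another process (made-up PIDs and sizes).
let popup = SizedCandidate(ownerPID: 42, layer: 0, width: 96, height: 64)
let popupDefault = isEligible(popup, currentPID: 7)
let popupStrict = isEligible(popup, currentPID: 7, minimumSide: 120)
print(popupDefault, popupStrict)  // prints "true false"
```

With the default, the small popup stays eligible; a threshold would only be set once there is a confirmed case that needs it.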

…ndow

The screen capture service was selecting the first layer-0 window, which
during recording was VoiceInk's own status indicator overlay. This caused
OCR to always return 'No text detected' since the overlay has no readable
content.

Changes:
- Filter out windows owned by VoiceInk's process
- Prioritize windows belonging to NSWorkspace.frontmostApplication
- Refactor WindowCandidate struct to class scope
@Beingpax Beingpax merged commit 45dbd57 into Beingpax:main Dec 7, 2025
2 checks passed
ryotapoi pushed a commit to ryotapoi/VoiceInk that referenced this pull request Feb 11, 2026
Fix OCR capturing VoiceInk status overlay instead of frontmost app window
Development

Successfully merging this pull request may close these issues.

Screen content not detected

2 participants