feishu: pass audio duration to file upload to fix TTS truncation (#33043) #33060

Open
chengzhichao-xydt wants to merge 2 commits into openclaw:main from chengzhichao-xydt:fix/33043-feishu-tts-audio-truncated-upstream

Conversation

@chengzhichao-xydt
Contributor

Summary

  • Problem: When TTS audio is sent via the Feishu channel (Feishu → Feishu), playback is truncated to only 2–3 words. The generated audio file is correct (~20 KB), but the Feishu client stops playing almost immediately.
  • Root cause: sendMediaFeishu calls uploadFileFeishu for opus audio without providing the duration field (milliseconds). Feishu's im.file.create API treats a missing duration as 0 ms; the client renders a 0:00 progress bar and halts playback after the initial buffer drains.
  • Why WeCom → Feishu cross-channel works: That path goes through the WeCom outbound adapter, which does not use sendMediaFeishu, so the Feishu upload code path is never reached.
  • What changed:
    • Added parseFfprobeDurationSecs (pure parser, easy to unit-test) and getAudioDurationMs (ffprobe-backed, returns number | undefined) to src/media/ffmpeg-exec.ts.
    • Exposed getAudioDurationMs through PluginRuntimeCore.media so extensions can use it without bypassing the plugin boundary.
    • In sendMediaFeishu, when the source is a local file path and the file type is opus or mp4, the duration is probed before upload and forwarded to uploadFileFeishu. If ffprobe is unavailable the field is simply omitted (graceful degradation).
  • What did NOT change: All other send paths (images, documents, non-local buffers), text delivery, and WeCom/other channel behavior are unaffected.
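
The two helpers named in the summary can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the function names come from the description above, but the bodies, the `ProbeRunner` type, and the injected-runner parameter are assumptions made so the sketch stays self-contained (the PR describes `getAudioDurationMs` as calling ffprobe directly).

```typescript
// Hypothetical sketch: names follow the PR description, bodies are assumed.

/** Parse ffprobe's textual duration output (in seconds) into a number. */
function parseFfprobeDurationSecs(stdout: string): number | undefined {
  const text = stdout.trim();
  // ffprobe prints "N/A" when the container carries no duration.
  if (text === "" || text === "N/A") return undefined;
  const secs = Number(text);
  // Reject NaN, Infinity, zero, and negative values.
  return Number.isFinite(secs) && secs > 0 ? secs : undefined;
}

/**
 * Assumed stand-in for the ffprobe invocation. A real runner might spawn:
 *   ffprobe -v error -show_entries format=duration \
 *     -of default=noprint_wrappers=1:nokey=1 <file>
 * and resolve with its stdout.
 */
type ProbeRunner = (filePath: string) => Promise<string>;

/** Return the duration in milliseconds, or undefined on any failure. */
async function getAudioDurationMs(
  filePath: string,
  runProbe: ProbeRunner,
): Promise<number | undefined> {
  try {
    const secs = parseFfprobeDurationSecs(await runProbe(filePath));
    return secs === undefined ? undefined : Math.round(secs * 1000);
  } catch {
    // ffprobe missing or failing: degrade gracefully instead of throwing.
    return undefined;
  }
}
```

Keeping the parser pure means the edge cases (empty output, `N/A`, negative values) can be unit-tested without ffprobe installed, which matches the testing approach described in the Evidence section.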

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Feishu TTS audio now plays the full reply instead of being truncated. No behavior changes for any other channel.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (ffprobe is a local process; the existing Feishu file upload call is unchanged except for the additional duration field)
  • Command/tool execution surface changed? No (ffprobe was already used for Discord voice messages)
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: OpenCloudOS 9.4 / Linux (kernel 6.6)
  • Runtime/container: Node 22+ / gateway local mode
  • Model/provider: Any (TTS via OpenAI / ElevenLabs / Edge)
  • Integration/channel: Feishu (飞书)
  • Relevant config (redacted): messages.tts.auto: "always"

Steps

  1. Configure Feishu channel and a TTS provider.
  2. Set messages.tts.auto: "always" in config.
  3. Send a message from a Feishu user to the bot.
  4. Observe the TTS audio reply in the Feishu chat.

Expected

  • Full TTS audio plays without truncation.
  • Duration indicator in Feishu client matches the actual audio length.

Actual (before fix)

  • Audio stops after 2–3 words.
  • Feishu client shows 0:00 duration.

Actual (after fix)

  • Full audio plays correctly.
  • Duration matches the probed value from ffprobe.

Evidence

  • 36 tests pass: 29 existing Feishu media tests + 4 new Feishu duration tests + 4 new parseFfprobeDurationSecs unit tests — all green (pnpm test -- extensions/feishu/src/media.test.ts src/media/ffmpeg-exec.test.ts)
  • pnpm tsgo — no new type errors (pre-existing tlon module errors unrelated)
  • No changes to reconnect/routing/auth logic

Human Verification (required)

  • Verified scenarios: Local-path opus upload passes duration; getAudioDurationMs returns undefined gracefully when ffprobe is unavailable; mediaBuffer-only calls do not trigger ffprobe.
  • Edge cases checked: ffprobe missing → duration omitted (same behavior as before, no regression); non-audio file types unaffected.
  • What you did NOT verify: End-to-end playback on physical Feishu mobile app (requires live environment with TTS keys).

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: git revert 79598f3ce
  • Files/config to restore: extensions/feishu/src/media.ts, src/media/ffmpeg-exec.ts, src/plugins/runtime/runtime-media.ts, src/plugins/runtime/types-core.ts
  • Known bad symptoms reviewers should watch for: Feishu audio upload returning an API error due to an unexpected duration field (extremely unlikely — field is documented and optional-but-recommended by Feishu)

Risks and Mitigations

  • ffprobe unavailable on host: getAudioDurationMs catches all errors and returns undefined; duration is then omitted from the upload request, preserving the pre-fix behavior rather than hard-failing.
  • Very long TTS audio: ffprobe reads container metadata only (no decode), so probing a 30 MB file is near-instant.

@openclaw-barnacle openclaw-barnacle bot added the channel: feishu and size: S labels Mar 3, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR fixes Feishu TTS audio truncation by probing the audio duration via ffprobe and passing it to the Feishu file upload API. Without the duration field, Feishu's client interprets a missing value as 0 ms and halts playback immediately.

Key changes:

  • src/media/ffmpeg-exec.ts: Adds parseFfprobeDurationSecs (pure parser with edge-case handling) and getAudioDurationMs (wraps ffprobe, returns undefined on any error for graceful degradation).
  • extensions/feishu/src/media.ts: Detects local file paths before calling loadWebMedia, then probes duration for opus/mp4 file types and forwards it to uploadFileFeishu. Non-local (HTTP) URLs and mediaBuffer paths are unaffected.
  • Plugin runtime: getAudioDurationMs is correctly wired through PluginRuntimeCore.media so extensions can call it without bypassing the plugin boundary.
  • Tests: Unit tests for the pure parser and integration tests for the three main paths (duration passed, duration omitted on ffprobe failure, no probe when using buffer) are all included.
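
The three paths listed under Tests (duration passed, duration omitted on probe failure, no probe for buffers) can be illustrated with a small sketch. The `buildUploadFields` helper, the `UploadFields` shape, and the `DurationProbe` type are hypothetical names introduced here for illustration; only the idea of forwarding `duration` for opus/mp4 local files comes from the PR.

```typescript
// Hypothetical sketch of the upload-side decision; not the PR's actual code.

type FeishuFileType = "opus" | "mp4" | "pdf" | "doc" | "xls" | "ppt" | "stream";

interface UploadFields {
  file_type: FeishuFileType;
  file_name: string;
  duration?: number; // milliseconds; omitted when unknown
}

// Assumed probe signature: resolves to undefined when the duration is unavailable.
type DurationProbe = (filePath: string) => Promise<number | undefined>;

async function buildUploadFields(opts: {
  fileType: FeishuFileType;
  fileName: string;
  localSourcePath?: string; // undefined for buffers and HTTP URLs
  probe: DurationProbe;
}): Promise<UploadFields> {
  const fields: UploadFields = {
    file_type: opts.fileType,
    file_name: opts.fileName,
  };
  const isTimedMedia = opts.fileType === "opus" || opts.fileType === "mp4";
  // Only probe when there is a file on disk to inspect.
  if (isTimedMedia && opts.localSourcePath !== undefined) {
    const durationMs = await opts.probe(opts.localSourcePath);
    // Leave the field out entirely rather than sending 0 or undefined.
    if (durationMs !== undefined) fields.duration = durationMs;
  }
  return fields;
}
```

With this shape, a failed probe produces the same request the code sent before the fix, which is the graceful-degradation behavior the summary describes.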

The implementation is straightforward and well-scoped — one minor comment inaccuracy has been flagged for clarification.

Confidence Score: 5/5

  • This PR is safe to merge—the fix is narrowly scoped and degrades gracefully when ffprobe is unavailable.
  • The implementation is minimal and well-isolated, only affecting Feishu audio/video uploads from local file paths. All other send paths remain untouched. The ffprobe call is wrapped in a try/catch returning undefined on any error, preserving pre-fix behavior on failures. Test coverage is solid and the Feishu API field is documented as optional-but-recommended. The one finding flagged is a minor comment clarification that improves code documentation without affecting functionality.
  • No files require special attention.

Last reviewed commit: 79598f3

Contributor

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 1 comment

Comment on lines +438 to +441
// For local paths, we can probe duration before loading into memory.
// Feishu's audio API requires duration in ms; without it the client
// defaults to 0ms and truncates playback.
let localSourcePath: string | undefined;

The comment says "we can probe duration before loading into memory", but the actual getAudioDurationMs call happens on line 476, after loadWebMedia (line 451) has already loaded the file into memory. The variable localSourcePath is captured early, but the probing itself is deferred. The comment describes the intent but misrepresents the runtime timing.

Suggested change
- // For local paths, we can probe duration before loading into memory.
+ // Capture local path before loading so we can probe its duration later.
  // Feishu's audio API requires duration in ms; without it the client
  // defaults to 0ms and truncates playback.
  let localSourcePath: string | undefined;

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79598f3ced

name = fileName ?? "file";
} else if (mediaUrl) {
// Detect local file paths (absolute paths not starting with a URL scheme)
if (/^\//.test(mediaUrl) || /^[A-Za-z]:[/\\]/.test(mediaUrl)) {

[P2] Handle file:// media URLs when probing upload duration

sendMediaFeishu only marks a source as local when mediaUrl starts with / or a drive letter, but Feishu outbound passes mediaUrl through unchanged and loadWebMedia explicitly supports file://... inputs by converting them to local paths. For file://...voice.opus/file://...mp4 uploads, this branch skips getAudioDurationMs, so uploadFileFeishu omits duration and the original Feishu truncation behavior (0:00 playback) still occurs for that valid local-media form.
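
One way to address this could be to widen the local-source check so it also accepts the `file://` form. The sketch below is an assumption about what such a check might look like; `isLocalMediaSource` is a hypothetical helper name, and the first two patterns are copied from the snippet quoted above.

```typescript
// Hypothetical helper; not part of the PR as reviewed.
function isLocalMediaSource(mediaUrl: string): boolean {
  // POSIX absolute path, e.g. /tmp/voice.opus (pattern from the quoted code).
  const isPosixAbsolute = /^\//.test(mediaUrl);
  // Windows drive-letter path, e.g. C:\audio\voice.opus (pattern from the quoted code).
  const isWindowsAbsolute = /^[A-Za-z]:[/\\]/.test(mediaUrl);
  // Added case: file:// URLs.
  const isFileUrl = /^file:\/\//i.test(mediaUrl);
  return isPosixAbsolute || isWindowsAbsolute || isFileUrl;
}
```

A `file://` URL would still need converting to a filesystem path before being handed to the duration probe; Node's `fileURLToPath` from `node:url` is one option for that step.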

oolong-tea-2026 added a commit to oolong-tea-2026/openclaw that referenced this pull request Mar 24, 2026
… playable audio messages

The Feishu outbound media router only recognized .opus and .ogg as audio,
sending common formats like mp3/wav/m4a/flac/aac as file attachments
instead of inline playable audio bubbles.

Changes:
- Use existing isAudioFileName() + mimeKind === 'audio' for audio detection,
  consistent with how image routing already uses mimeKind === 'image'
- Expand detectFileType() to map common audio extensions to 'opus'
- Add audio duration parsing via music-metadata (already in monorepo for
  extensions/matrix) with graceful degradation
- Update existing test and add coverage for mp3/wav/m4a/flac/aac routing

Verified: Feishu im/v1/files API accepts mp3/wav/m4a/flac/aac with
file_type=opus and plays them correctly. This is consistent with the
community Feishu plugin @m1heng-clawd/feishu (2.3k+ stars) which maps
all common audio extensions to file_type=opus in production.

Related: openclaw#33060, openclaw#43388, openclaw#33736, openclaw#28269
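
The extension mapping described in this commit message might look roughly like the sketch below. The function name `detectFileTypeSketch`, the extension list constant, and the `"stream"` fallback are assumptions for illustration; the referenced commit's real `detectFileType()` may differ.

```typescript
// Hypothetical sketch of mapping common audio extensions to Feishu's "opus" file type.
const AUDIO_EXTENSIONS = [".opus", ".ogg", ".mp3", ".wav", ".m4a", ".flac", ".aac"];

function detectFileTypeSketch(fileName: string): "opus" | "mp4" | "stream" {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  // All listed audio formats are uploaded with file_type=opus.
  if (AUDIO_EXTENSIONS.indexOf(ext) !== -1) return "opus";
  if (ext === ".mp4") return "mp4";
  // Assumed fallback for anything else: generic binary upload.
  return "stream";
}
```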
oolong-tea-2026 added a commit to oolong-tea-2026/openclaw that referenced this pull request Mar 27, 2026

Labels

channel: feishu Channel integration: feishu size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 1. [Bug] Feishu TTS audio truncated when reply from Feishu to Feishu

1 participant