Skip to content

fix(feishu): clear stale streamingStartPromise on card creation failure#43502

Merged
ademczuk merged 2 commits intoopenclaw:mainfrom
ademczuk:fix/feishu-streaming-start-promise-leak
Mar 14, 2026
Merged

fix(feishu): clear stale streamingStartPromise on card creation failure#43502
ademczuk merged 2 commits intoopenclaw:mainfrom
ademczuk:fix/feishu-streaming-start-promise-leak

Conversation

@ademczuk
Copy link
Copy Markdown
Contributor

Summary

  • When FeishuStreamingSession.start() throws (e.g. HTTP 400 from CardKit v1 API), the catch block nulls streaming but leaves streamingStartPromise set
  • The guard in startStreaming() checks streamingStartPromise first, so all future deliver() calls silently skip streaming
  • Session locks permanently - all subsequent messages are dropped

One-line fix: clear streamingStartPromise in the catch block alongside streaming.

Fixes #43322

Changes

File Change
extensions/feishu/src/reply-dispatcher.ts Clear streamingStartPromise = null in the start() catch block
extensions/feishu/src/reply-dispatcher.test.ts Regression test: first deliver() triggers a failing start(), second deliver() creates a new session successfully
CHANGELOG.md Added fix entry

Test plan

  • pnpm vitest run extensions/feishu/src/reply-dispatcher.test.ts - 23/23 passing (22 existing + 1 new regression test)
  • pnpm format clean
  • New test uses push-intercept pattern to make first start() reject, verifies second deliver() creates a fresh streaming session

@openclaw-barnacle openclaw-barnacle bot added channel: feishu Channel integration: feishu size: XS labels Mar 11, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a session-lock bug in FeishuStreamingSession: when start() throws (e.g. HTTP 400 from the CardKit v1 API), the catch block previously nulled streaming but left streamingStartPromise pointing to the resolved IIFE. Because startStreaming() guards on streamingStartPromise first, all subsequent deliver() calls would skip streaming silently and all future messages would be dropped for the lifetime of the dispatcher.

The fix is a single-line addition — streamingStartPromise = null in the same catch block — which correctly unblocks the guard and allows the next deliver() call to retry streaming. A targeted regression test validates the full recovery path.

Key changes:

  • extensions/feishu/src/reply-dispatcher.ts: Clear streamingStartPromise in the start() catch block alongside streaming, allowing the session to retry on the next delivery.
  • extensions/feishu/src/reply-dispatcher.test.ts: New test intercepts streamingInstances.push to inject a rejecting start() on the first session, then verifies that a second deliver() successfully creates and closes a fresh streaming session. The streamingInstances.push override is restored sequentially in the test body rather than inside a finally block — if an assertion fails before the restore line, the override persists (though it is practically harmless because shouldFailStart is already false).
  • CHANGELOG.md: Entry added under the current release.

Confidence Score: 5/5

  • This PR is safe to merge — the fix is a minimal, targeted one-liner with a direct regression test and no side effects on the surrounding control flow.
  • The change touches a single catch block, clears a variable that was already being cleared in closeStreaming, and the concurrent-await semantics are correct (callers holding a reference to the IIFE promise still resolve cleanly). The regression test covers the exact failure scenario. No unrelated code is modified.
  • No files require special attention beyond the minor test-cleanup hygiene noted in reply-dispatcher.test.ts.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/feishu/src/reply-dispatcher.test.ts
Line: 628-629

Comment:
**Push override not restored in `finally` block**

The `streamingInstances.push` restore is executed sequentially in the test body after the assertions. If any assertion before line 629 fails (e.g. `expect(streamingInstances).toHaveLength(2)` on line 632), the override is never cleaned up and persists into subsequent tests.

`beforeEach` resets `streamingInstances.length = 0` but does not reset `streamingInstances.push`, so the override would carry over. In practice it is mostly harmless because `shouldFailStart` is `false` by that point (making the override a no-op passthrough to `origPush`), but it is not idiomatic and could mask subtle issues if the test suite is extended.

Consider wrapping the test body with a `try/finally`:

```ts
    const origPush = streamingInstances.push;
    try {
      streamingInstances.push = function (this: any[], ...args: any[]) {
        if (shouldFailStart) {
          args[0].start = vi
            .fn()
            .mockRejectedValue(new Error("Create card request failed with HTTP 400"));
          shouldFailStart = false;
        }
        return origPush.apply(this, args);
      } as any;

      // ... test body ...

    } finally {
      streamingInstances.push = origPush;
    }
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 6cf243a

@ademczuk ademczuk marked this pull request as draft March 11, 2026 22:53
@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from 66766cf to 6cf243a Compare March 11, 2026 22:55
@ademczuk
Copy link
Copy Markdown
Contributor Author

Re: Aisle (CWE-400 unbounded retry)

The prior behaviour was a permanent session lock - zero retries, ever. Once start() failed, every future message was silently dropped for the rest of the session (13+ hours in the reported case). Clearing streamingStartPromise allows one retry per deliver() call, which is strictly better than permanent failure. A bounded retry or circuit breaker could be reasonable follow-up work but sits outside this bug fix's scope.

Re: Greptile (5/5)

Cheers for the thorough analysis. The partialUpdateQueue safety point re: lazy reference reads is spot on - closeStreaming() already had the same streamingStartPromise = null cleanup, so this just mirrors it in the failure path.

@ademczuk ademczuk marked this pull request as ready for review March 12, 2026 00:45
@ademczuk ademczuk requested a review from Takhoffman March 12, 2026 00:54
@openclaw-barnacle openclaw-barnacle bot added the maintainer Maintainer-authored PR label Mar 12, 2026
@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from e42a39c to edd02ee Compare March 12, 2026 05:58
@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from 25e01f5 to 48548de Compare March 13, 2026 17:32
@ademczuk ademczuk requested a review from joshavant March 13, 2026 17:34
@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from 48548de to 9eded44 Compare March 13, 2026 21:10
@ademczuk
Copy link
Copy Markdown
Contributor Author

ademczuk commented Mar 13, 2026

Rebased onto latest main. CHANGELOG entry verified. All code checks, extensions, and protocol pass. The node test shard and Windows shard failures are from the temp-path-guard.test.ts regression on main (da1ec45505) - same failures on every open PR in the repo, including PRs that maintainers are actively merging.

@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from 9eded44 to f87a912 Compare March 13, 2026 22:39
When FeishuStreamingSession.start() throws (HTTP 400), the catch block
sets streaming = null but leaves streamingStartPromise dangling. The
guard in startStreaming() checks streamingStartPromise first, so all
future deliver() calls silently skip streaming - the session locks
permanently.

Clear streamingStartPromise in the catch block so subsequent messages
can retry streaming instead of dropping all future replies.

Fixes openclaw#43322
@ademczuk ademczuk force-pushed the fix/feishu-streaming-start-promise-leak branch from f87a912 to a1f2a6a Compare March 14, 2026 06:55
@ademczuk ademczuk merged commit c6e3283 into openclaw:main Mar 14, 2026
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu maintainer Maintainer-authored PR size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Feishu card API failure causes permanent session lock (13+ hour "no reply" block)

1 participant