
[Bug]: harness kanban CLI invoked from agent session ignores active-board pin, races current file with concurrent boards switch #20074

@curiouscleo

Problem

When an agent session uses both the kanban_* tools AND shells out to the harness kanban … CLI in the same turn, tasks created via tools can become invisible to subsequent CLI invocations. Symptom from a real orchestrator session:

> kanban_create(title="...", assignee="space-pipeline", ...)
  → returns {"task_id": "t_abc123", "status": "ok"}

> harness kanban show t_abc123
  no such task: t_abc123

The task DOES exist — direct SQLite probe of the right per-board DB confirms it. The CLI is reading a different board's DB.

Root cause

kanban_db.connect() resolves the board in this order: HERMES_KANBAN_BOARD env var → <root>/kanban/current file → default. The two surfaces enter that chain with different inputs:

  • kanban_* tools run inside the agent process, where HERMES_KANBAN_BOARD is set (either by the dispatcher when spawning a worker, or by the user's shell when launching harness -p <profile> chat). They reliably hit the right board.
  • harness kanban … shelled out from within an agent session is a fresh subprocess. It inherits the parent shell's env. If HERMES_KANBAN_BOARD wasn't set in that env (common: most users don't export it and instead use harness kanban boards switch <slug>, which only writes to the current file), the CLI falls back to the current file.
  • The current file is global state. Any other concurrent harness session can flip it via harness kanban boards switch …. When that happens, the orchestrator session's tool calls keep targeting the original board (env-pinned), but the orchestrator's harness kanban … shell calls suddenly target the new board.
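The divergence described in the bullets above can be sketched in a few lines. This is an illustration only, not the real kanban_db.connect() code: resolve_board, the "default" fallback slug, and the temp-directory layout are assumptions made for the example. Only the env var name and the <root>/kanban/current path come from this report.

```python
import tempfile
from pathlib import Path

ENV_VAR = "HERMES_KANBAN_BOARD"


def resolve_board(env, root, default="default"):
    """Hypothetical resolver: env var, then <root>/kanban/current, then default."""
    pinned = env.get(ENV_VAR)
    if pinned:
        return pinned
    current = Path(root) / "kanban" / "current"
    if current.is_file():
        slug = current.read_text(encoding="utf-8").strip()
        if slug:
            return slug
    return default


def demo():
    """Replay the drift: one surface is env-pinned, the other reads the file."""
    with tempfile.TemporaryDirectory() as root:
        current = Path(root) / "kanban" / "current"
        current.parent.mkdir(parents=True)
        current.write_text("space\n", encoding="utf-8")

        tool_env = {ENV_VAR: "space"}  # agent process: board pinned via env
        shell_env = {}                 # shelled-out CLI: no pin, reads the file

        before = (resolve_board(tool_env, root), resolve_board(shell_env, root))
        # Another session runs `harness kanban boards switch harness-facet`.
        current.write_text("harness-facet\n", encoding="utf-8")
        after = (resolve_board(tool_env, root), resolve_board(shell_env, root))
        return before, after


if __name__ == "__main__":
    print(demo())
```

Before the switch both surfaces agree on space. Afterwards the env-pinned surface still resolves space while the unpinned one resolves harness-facet, which matches the "no such task" symptom.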

Concrete reproducer

# Terminal 1
harness kanban boards switch space
harness -p space-orchestrator chat
# (in chat) /goal Drive the space board: ... [orchestrator tool-creates a task]

# Terminal 2 (concurrent — e.g. another session, a script, a teammate)
harness kanban boards switch harness-facet

# Back in Terminal 1's orchestrator session, in the same goal turn:
# Tool: kanban_create returns t_xyz successfully (HERMES_KANBAN_BOARD env was
# set when the chat process spawned, persists for tool calls)
# Shell: harness kanban show t_xyz → "no such task: t_xyz"
#         (no env, reads current file, sees harness-facet, looks in wrong DB)

Why it bites orchestrators specifically

Orchestrator personas often need to combine kanban_* tool calls with shelled-out harness kanban … CLI calls. So orchestrator sessions are the workload most likely to mix both surfaces in the same turn, which makes them the workload most likely to trip on this divergence.

Worker sessions don't hit this because the dispatcher sets HERMES_KANBAN_BOARD in the spawned child's env directly (kanban_db.py:2593-2623), so even shell calls inherit the right pin.

Proposed fix

When a chat session activates a profile that has kanban in its toolsets, set HERMES_KANBAN_BOARD in the child shell environment to the resolved board at chat-start time. Three implementation options:

  1. At chat boot: cli.py (or wherever HermesCLI.__init__ finalizes the profile env) reads the active board via kanban_db.get_current_board() once and exports HERMES_KANBAN_BOARD into os.environ for the rest of the session. Subsequent shell-outs inherit it. ~5 LOC.

  2. At kanban-toolset registration: When the kanban toolset is enabled (via _check_kanban_mode()), pin the board to env. Same effect, narrower trigger.

  3. In the terminal tool's env-passthrough path: When HERMES_KANBAN_BOARD is set in the agent process env, propagate it to spawned subprocess. (May already happen — needs verification.)
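For the "needs verification" part of option 3, one rough way to check what a spawned child actually sees is to run a probe process with and without the variable. The sketch below uses a plain Python child as a stand-in. It does not go through the harness terminal tool, whose env-passthrough behaviour is not known here, and board_seen_by_child is a hypothetical helper name.

```python
import os
import subprocess
import sys

ENV_VAR = "HERMES_KANBAN_BOARD"
# One-liner run in the child: print the board pin it inherited, if any.
PROBE = f"import os; print(os.environ.get({ENV_VAR!r}, '<unset>'))"


def board_seen_by_child(env):
    """Spawn a child Python with the given env and report the pin it sees."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo():
    # Start from the real environment minus any existing pin.
    base = {k: v for k, v in os.environ.items() if k != ENV_VAR}
    pinned = dict(base, **{ENV_VAR: "space"})
    return board_seen_by_child(pinned), board_seen_by_child(base)


if __name__ == "__main__":
    print(demo())
```

Running the same probe command through the agent's terminal tool instead of subprocess.run would show whether the pin survives that path.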

I'd lean toward (1): one-time pin at session start, before any tool registers. Idempotent, easy to test.
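A minimal sketch of what option (1) could look like, under stated assumptions: pin_board_for_session is a hypothetical helper, and get_current_board is passed in as a callable standing in for kanban_db.get_current_board(), whose exact signature and return type are assumed here. Where this would be called from in cli.py is not shown.

```python
import os

ENV_VAR = "HERMES_KANBAN_BOARD"


def pin_board_for_session(get_current_board, environ=None):
    """Pin the active board into the environment once, at session start.

    An existing value wins, so an explicit export or a dispatcher-set pin
    is never overwritten and repeated calls are no-ops.
    """
    if environ is None:
        environ = os.environ
    if not environ.get(ENV_VAR):
        board = get_current_board()  # assumed to return a slug string
        if board:
            environ[ENV_VAR] = board
    return environ.get(ENV_VAR)


if __name__ == "__main__":
    env = {}
    print(pin_board_for_session(lambda: "space", env))          # first call pins
    print(pin_board_for_session(lambda: "harness-facet", env))  # later switch ignored
```

Because the first resolved value is kept, a concurrent boards switch after chat start no longer affects shell-outs that inherit this environment.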

Why the initial "cache invalidation" guess was wrong

The orchestrator that originally surfaced this called it a "DB-handle caching thing." After investigation, it has nothing to do with DB handles or caches: two different code paths resolve the board differently, and the current file is mutable global state that one of them respects and the other ignores.

Workaround in the meantime

Always pass --board <slug> explicitly to harness kanban invocations from inside an orchestrator session. This is what we now do in the space-orchestrator SOUL.md addendum and the space-kanban-workflow skill. Verbose but reliable.
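The workaround can be wrapped in a small helper so that no call site forgets the flag. This is a sketch: kanban_argv and kanban_command are hypothetical names, and placing --board directly after harness kanban is an assumption based on the "prepend" wording in this report, not a confirmed CLI syntax.

```python
import shlex


def kanban_argv(board, *args):
    """Build a harness kanban argv that always carries an explicit board.

    Assumption: --board is accepted right after `harness kanban`.
    """
    if not board:
        raise ValueError("board slug must be non-empty")
    return ["harness", "kanban", "--board", board, *args]


def kanban_command(board, *args):
    """Same as kanban_argv, rendered as a shell-quoted command string."""
    return shlex.join(kanban_argv(board, *args))


if __name__ == "__main__":
    print(kanban_command("space", "show", "t_abc123"))
```

The argv form suits subprocess.run, and the string form suits a terminal tool that takes a single command line.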

Discovery context

Hit this while running an autonomous orchestrator /goal on the v0.12.0 release with multiple boards (space, harness-facet, surface, default) on the same install. The orchestrator successfully created task t_04086c86 via the kanban_create tool, then immediately tried to run harness kanban show t_04086c86 and got "no such task" because the active board had drifted to harness-facet between calls (a different chat session was running on Daniel's other monitor).

Workaround proven working: prepend --board space to every CLI call.

Affected component

CLI / agent-CLI boundary

Severity

P2 — the workaround (always-explicit --board) is documentable and reliable, but the gap is subtle and the failure mode is "looks like a phantom data loss bug" which is hard to diagnose for users without filesystem access.

Metadata

    Labels

    P3 (Low: cosmetic, nice to have)
    comp/cli (CLI entry point, hermes_cli/, setup wizard)
    comp/plugins (Plugin system and bundled plugins)
    type/bug (Something isn't working)
