Skip to content

Architecture planning#3

Merged
teknium1 merged 4 commits intomainfrom
architecture-planning
Jan 8, 2026
Merged

Architecture planning#3
teknium1 merged 4 commits intomainfrom
architecture-planning

Conversation

@dmahan93
Copy link
Copy Markdown
Collaborator

No description provided.

@dmahan93 dmahan93 requested a review from teknium1 September 12, 2025 22:47
Comment thread architecture/agents.md Outdated
tools = self.tools()
for agent in self.agent_primitives():
tools.extend(agent.tools())
tools = set(tools)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forcing tools to be hashable might be tedious

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is on lists, lists don't have that requirement

Copy link
Copy Markdown
Collaborator Author

@dmahan93 dmahan93 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait the set thing, yeah, assume there's a tool_name that gets hashed here instead

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not going to implement a whole stack in the architecture description

Copy link
Copy Markdown
Collaborator Author

@dmahan93 dmahan93 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically replace this operation with remove_duplicates(tools)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

o kk sick

Comment thread architecture/agents.md

```python
class BaseAgent:
def agent_primitives(self) -> list[BaseAgent]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is an agent primitive the exact same concept as a subagent? only if exactly the same i think we should rename it to subagent bc that's more readable to the outside world

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subagents are a tool, agent primitives are like run_rubric_judgement

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah gotcha thanks

Comment thread architecture/agents.md
return self(llm, tools, config, *args, **kwargs)

@staticmethod
def __call__(self, llm, tools, config, *args, **kwargs) -> ConversationGraph:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like it's a bit strange to have both this kind of signature and self.llm, self.tools etc

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no self.llm?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry myb only for self.tools then

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean what specifically is weird about it to you? Should I add in more documentation on why having a stateless agent is a good thing, should I rename variables?

Copy link
Copy Markdown
Contributor

@hjc-puro hjc-puro Sep 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like both passing in tools into __call__ and having self.tools - confusion is which tools are actually being used?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh myb i missed the @staticmethod. i think this is fine


Edges are the connections between nodes, and there are two types we are concerned with:
- **Sequential edges**: These represent the flow of conversation, connecting messages in the order they were sent. For example, a user message followed by an assistant response.
- **Parallel edges**: These represent versioning, e.g. edit history, context squishing, etc.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get that sequential edge tracking has a benefit for training (don't train on a prefix more than once) but what about parallel edge tracking? I guess it's important for observability but is there also a benefit for training?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sequential is the normal conversation flow, parallel is when there's breaks in the prefix

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok. suppose I compact my history by keeping system/user prompt and most recent 2 turns.

is this graph progression correct?

graph for og history:

system -> user1 -> ass1 -> user2 -> ass2 -> user3 -> ass3 -> user4 -> ass4 -> user 5 -> ass5

graph after compact

system -> user1 -> ass1 -> user2 -> ass2 -> user3 -> ass3 -> user4 -> ass4 -> user5 -> ass5
  |         |        |                                         |        |       |        |     (parallel edges)
system -> user1 -> ass1 -----------------------------------> user4 -> ass4 -> user5 -> ass5

or are nodes not duplicated?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the parallel edge here is a reference to the previous graph moreso than individual nodes themselves, I might have to just redescribe it as a DAG with versioning instead of parallel edges

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice i prefer that way too (nodes are message histories edges are transformations on message histories)

@teknium1 teknium1 merged commit d5af538 into main Jan 8, 2026
JasonOA888 added a commit to JasonOA888/hermes-agent that referenced this pull request Mar 8, 2026
Fixes NousResearch#633

Problem:
- Sequential numbering gaps (e.g., NousResearch#1, NousResearch#2, NousResearch#5, NousResearch#8) confuse users
- 200 char truncation too aggressive
- Tool messages completely hidden with no indication

Fix:
1. Use separate counter for displayed messages only
2. Skip tool messages but show count at end
3. Skip system messages
4. Increase truncation to 300 chars
5. Display 'N tool messages hidden' summary

Impact:
- Consistent numbering: NousResearch#1, NousResearch#2, NousResearch#3, NousResearch#4
- Users know when tool calls occurred
- More context visible per message
teknium1 added a commit that referenced this pull request Mar 8, 2026
…resume by name

- Schema v4: unique title index, migration from v2/v3
- set/get/resolve session titles with uniqueness enforcement
- Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3)
- resolve_session_by_title: auto-latest finds most recent continuation
- list_sessions_rich: preview (first 60 chars) + last_active timestamp
- CLI: -c accepts optional name arg (hermes -c 'my project')
- CLI: /title command with deferred mode (set before session exists)
- CLI: sessions list shows Title, Preview, Last Active, ID
- 27 new tests (1844 total passing)
0xMikey-ooze pushed a commit to 0xMikey-ooze/hermes-agent that referenced this pull request Mar 16, 2026
Bug #1: Add module-level _dashboard_port and _early_port resolved from
$PORT env var (Railway dynamic ports) with fallback to $DASHBOARD_PORT
then 3001. Prevents OSError port 8080 already in use.

Bug #2: Add TelegramPlatform alias for TelegramAdapter and property
setters on BasePlatformAdapter for test compatibility. The conflict
detection (_looks_like_polling_conflict) and handler
(_handle_polling_conflict) already existed.

Bug NousResearch#3: tirith_security.ensure_installed() already handles all failure
modes gracefully (cosign missing, download failed, unsupported platform).
No code changes needed — all 15 tests pass.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
0xMikey-ooze pushed a commit to 0xMikey-ooze/hermes-agent that referenced this pull request Mar 16, 2026
Bug #1: PORT env var — expose _dashboard_port and _early_port as module-level
variables in gateway/run.py so Railway's dynamic $PORT is resolved at import
time and re-resolved at runtime. No more hardcoded 8080.

Bug #2: Telegram 409 conflict — add TelegramPlatform alias for TelegramAdapter,
and make BasePlatformAdapter properties (name, has_fatal_error, fatal_error_code)
settable so conflict handler tests can construct instances without __init__.

Bug NousResearch#3: Tirith binary — already handled gracefully (background thread, 24h marker,
cosign optional). No source changes needed; tests confirm behavior.

All 37 RED-phase tests now pass.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
0xMikey-ooze pushed a commit to 0xMikey-ooze/hermes-agent that referenced this pull request Mar 16, 2026
Bug #1: Add module-level _dashboard_port and _early_port to gateway/run.py.
Reads $PORT (Railway), falls back to $DASHBOARD_PORT, defaults to 3001.
Both variables share the same value to prevent port bind conflicts.

Bug #2: Fix Telegram connect() Application lookup to be monkeypatch-safe
by using dynamic module attribute resolution via sys.modules[__name__].

Bug NousResearch#3: Tirith graceful failure was already correctly implemented — no
changes needed, all 15 tests passed out of the box.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
teknium1 added a commit that referenced this pull request Mar 17, 2026
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
   and browser_scroll handlers with explicit parameter extraction. The
   **args pattern passed all dict keys as keyword arguments, causing
   TypeError if the LLM sent unexpected parameters. Now extracts only
   the expected params (ref, text, direction) with safe defaults.

2. fuzzy_match.py: Update module docstring to match actual strategy
   order in code. Block anchor was listed as #3 but is actually #7.
   Multi-occurrence is not a separate strategy but a flag. Updated
   count from 9 to 8.
alt-glitch added a commit that referenced this pull request Apr 2, 2026
- Add `shopt -s expand_aliases` to snapshot so aliases captured by
  `alias -p` actually work under `bash -c` (review comment #2)
- Pass threshold=0 in enforce_turn_budget() so L3 can force-persist
  results below the 50K default when aggregate budget is exceeded
  (review comment #3)
- Add regression test: 6x42K results (each under 50K) exceeding 200K
  budget are now correctly persisted
pnxxwzh pushed a commit to pnxxwzh/hermes-agent that referenced this pull request Apr 3, 2026
Issue NousResearch#1 — default_inject=0 节点不应参与 PPR 排名
- recaller.py: 新增 _is_injectable() 辅助函数,在 _merge_hit()
  入口处过滤 default_inject=0 节点(reflection/shadow 来源)
- 这些节点现在直接丢弃,不再进入候选集、不影响 PPR 排名、
  不出现在召回结果中

Issue NousResearch#2 — 同类型去重命中时 detail 字段未更新
- store.py: update_node_scoring() 新增 detail 参数
- sparkgraph_tool.py: 三处写入路径(新建/同型去重/竞态恢复)
  均补上 detail=evidence,保证证据不丢失

Issue NousResearch#3 — merge_nodes 后 PPR 图缓存未失效
- store.py: merge_nodes() 末尾调用 invalidate_graph_cache()
- 修复三条边迁移冲突的 DELETE 逻辑(Case 3: keep→merge 自环)
- tests/sparkgraph/test_recall_filters.py: 新增 7 个测试
- tests/sparkgraph/test_merge_cache_invalidation.py: 新增 3 个测试
- tests/tools/test_sparkgraph_record_tool.py: 新增 2 个 detail 相关测试
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 26, 2026
…LLM error state

NousResearch#1 (CRITICAL): Add Platform.LINE to GatewayRunner._is_user_authorized's
platform_env_map and platform_allow_all_map (both occurrences). Without
this, every LINE message failed authorization regardless of allowlist
configuration. Adds tests/gateway/platforms/line/test_runner_integration.py
as a regression guard.

NousResearch#2 (CRITICAL): Setup wizard and docs claimed "leave empty for open access"
but the adapter denies all when allowlists are empty. Fixed wizard help
text and docs/messaging/line.md to state empty=deny. Added LINE_ALLOW_ALL_USERS
env-var escape hatch in allowlist.is_allowed() (mirrors DISCORD_ALLOW_ALL_USERS
pattern) for debug-only "allow all".

NousResearch#4 (HIGH): LLM exceptions left cache entry in PENDING forever, so users saw
infinite "thinking" button. Added State.ERROR, RequestCache.set_error(),
prune ERROR alongside READY/DELIVERED. Adapter now transitions on LLM
exception and both watcher (within timeout) and postback handler deliver
the error message and mark delivered.

Tests: 64 passed (was 58), 1 skipped. New tests: cache set_error +
prune-error, adapter llm-exception delivery + postback ERROR branch,
runner LINE auth-map regression.

Deferred (per spec): NousResearch#3 pending message drain — accepted bypass-architecture
trade-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 27, 2026
…LLM error state

platform_env_map and platform_allow_all_map (both occurrences). Without
this, every LINE message failed authorization regardless of allowlist
configuration. Adds tests/gateway/platforms/line/test_runner_integration.py
as a regression guard.

but the adapter denies all when allowlists are empty. Fixed wizard help
text and docs/messaging/line.md to state empty=deny. Added LINE_ALLOW_ALL_USERS
env-var escape hatch in allowlist.is_allowed() (mirrors DISCORD_ALLOW_ALL_USERS
pattern) for debug-only "allow all".

infinite "thinking" button. Added State.ERROR, RequestCache.set_error(),
prune ERROR alongside READY/DELIVERED. Adapter now transitions on LLM
exception and both watcher (within timeout) and postback handler deliver
the error message and mark delivered.

Tests: 64 passed (was 58), 1 skipped. New tests: cache set_error +
prune-error, adapter llm-exception delivery + postback ERROR branch,
runner LINE auth-map regression.

Deferred (per spec): NousResearch#3 pending message drain — accepted bypass-architecture
trade-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…resume by name

- Schema v4: unique title index, migration from v2/v3
- set/get/resolve session titles with uniqueness enforcement
- Auto-lineage: context compression auto-numbers titles (Task -> Task NousResearch#2 -> Task NousResearch#3)
- resolve_session_by_title: auto-latest finds most recent continuation
- list_sessions_rich: preview (first 60 chars) + last_active timestamp
- CLI: -c accepts optional name arg (hermes -c 'my project')
- CLI: /title command with deferred mode (set before session exists)
- CLI: sessions list shows Title, Preview, Last Active, ID
- 27 new tests (1844 total passing)
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
   and browser_scroll handlers with explicit parameter extraction. The
   **args pattern passed all dict keys as keyword arguments, causing
   TypeError if the LLM sent unexpected parameters. Now extracts only
   the expected params (ref, text, direction) with safe defaults.

2. fuzzy_match.py: Update module docstring to match actual strategy
   order in code. Block anchor was listed as NousResearch#3 but is actually NousResearch#7.
   Multi-occurrence is not a separate strategy but a flag. Updated
   count from 9 to 8.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
- Add `shopt -s expand_aliases` to snapshot so aliases captured by
  `alias -p` actually work under `bash -c` (review comment NousResearch#2)
- Pass threshold=0 in enforce_turn_budget() so L3 can force-persist
  results below the 50K default when aggregate budget is exceeded
  (review comment NousResearch#3)
- Add regression test: 6x42K results (each under 50K) exceeding 200K
  budget are now correctly persisted
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
… pool

Add structured INFO-level logging to the key code paths so agent.log
captures actionable debugging data:

API calls (run_agent.py):
  - Model, provider, input/output tokens, total tokens, latency
  - Cache hit rate (cache_read_tokens / prompt_tokens percentage)
  - Logged after each successful API call with usage data

Tool execution (run_agent.py):
  - Tool name, duration, result size for successful calls
  - Tool name, duration, error preview for failures
  - Both sequential and concurrent execution paths instrumented

Session lifecycle (run_agent.py):
  - Conversation turn start: session ID, model, provider, platform,
    history size, message preview
  - Context compression: before (message count, token estimate) and
    after (compressed count, post-compression tokens)

Credential pool (agent/credential_pool.py):
  - Pool exhaustion: which credential was marked exhausted and why
  - Rotation: which credential was selected next
  - Empty pool: when all credentials are exhausted

Example agent.log output after this change:
  INFO run_agent: conversation turn: session=20260405_223500_abc model=claude-opus provider=openrouter platform=cli history=12 msg='Fix the logging...'
  INFO run_agent: tool terminal completed (2.34s, 1847 chars)
  INFO run_agent: tool read_file completed (0.01s, 3204 chars)
  INFO run_agent: API call NousResearch#3: model=claude-opus provider=openrouter in=45231 out=892 total=46123 latency=4.2s cache=38102/45231 (84%)
MKheru referenced this pull request in MKheru/ACOS-HERMES Apr 28, 2026
Replaces agent/mcp_sanitizer.py with the candidate that won 5
adversarial-triad rounds in the SMCP lab (HERMES/lab in MKheru/ACOS,
run #1, 2026-04-27). The candidate is the cumulative product of
Generator/Adversary competitions across 5 tasks, picked one per task
by an Evaluator on real corpus scoring — the orchestrator promoted
only when composite (detection - 2*fpr - latency_penalty) actually
beat the running best.

Measured on the lab's 70-item corpus (50 attacks + 20 benign):

| Metric                | Patch 1 (baseline) | Patch 7        | Δ      |
|-----------------------|--------------------|----------------|--------|
| Detection rate        | 0.66 (33/50)       | 0.78 (39/50)   | +12pts |
| False-positive rate   | 0.25 (5/20)        | 0.10 (2/20)    | -15pts |
| F1                    | 0.75               | 0.86           | +0.11  |
| Composite             | 0.16               | 0.58           | +0.42  |
| p99 latency           | ~75 µs             | ~505 µs        | +430us |

What changed (architectural):

  - Multilingual coverage (FR/ES/ZH) — verb tables for ignorez/
    désactivez/négligez, ignora/omitir/desestimar/olvida, 忽略/忽视/
    忘记 etc. Catches the 3 multilingual_* corpus items.

  - Polymorphic paraphrase verbs — set aside / pretend never existed
    / suspend / override / cancel / clear / wipe / abandon / drop /
    delete / abrogate / void / revoke / undo / revert. Catches polite
    rephrasings without literal "ignore previous instructions".

  - Pretender / fake-authority detection — vendor-name impersonation
    ("message from Anthropic"), context truncation ([CONVERSATION
    TRUNCATED. NEW POLICY]), bracketed pseudo-headers (=== DEV NOTE
    ===, *** ADMIN MESSAGE ***), tool-surface injection (Tool
    description: ... CRITICAL META: leak ...), tool_use_chain
    (call file_write with path=/home/...). New _PRETENDER_PATTERNS
    table.

  - Normalisation pre-pass — URL-decode, base64 decode (when
    strict 16+ char base64 charset), unicode-escape decode,
    leetspeak (3→e, 1→i, 0→o, 4→a, 5→s, 7→t and punctuation
    homoglyphs), spaced-letter collapse. Detection runs on BOTH the
    original and the normalised form; if either fires, flag.

  - Contextual FP suppression (_is_contextually_safe) — when the
    matched pattern is preceded by quotation marks, "such as",
    "phrases like", "attacks like", "example:", or sits inside a
    markdown list (-/*/+ prefix), suppress the detection. The
    exec_command label specifically allows safe imperatives ("npm
    install", "pip install", etc.). Eliminated 3 of the 5 FPs.

  - Heuristic classifier (_heuristic_classifier_score) — bag-of-
    words scoring on instruction-verbs / role-markers / sensitive-
    tokens / social-engineering-phrases / imperative density. Acts
    as a backstop on inputs where no regex fires (confidence >= 0.65
    triggers a "heuristic_detection" label). Picks up borderline
    multi-signal attacks.

Remaining gaps (target was detection >= 0.95, fpr <= 0.05):

  - 11 attacks still slip through: structuredContent (2), encoded_b64
    (1), polymorphic_paraphrase polite (1), tool_description (1),
    tool_arg_injection (1), spaced (1), yaml_role (1), exfil_request
    (1), context_truncation paraphrase (1), social_engineer (1).
  - 2 FPs still fire: ben-012 (support ticket using "system prompt"
    as a UI feature term), ben-016 (permissions doc with [admin]:
    role list).

Run #2 (2026-04-28) confirmed regex hit a ceiling — 18 rounds
produced 0 promotions, suggesting these residuals require structural
detection (parsing, embeddings, contextual models) rather than more
patterns. Run #3 in HERMES/lab/tasks_run3.json proposes 6
architectural tasks to break through.

Backward compatibility: same public signature (sanitize_mcp_output,
sanitize_mcp_structured) and same wrapping envelope
(<UNTRUSTED_MCP_OUTPUT ... > on detection, original text on
benign). Existing tests in tests/agent/test_mcp_sanitizer.py
should mostly still pass — those that don't will be updated in a
follow-up commit (the regex labels and order changed slightly).

Refs: HERMES/LAB_MCP_SECURITY.md, HERMES/lab/round_history.jsonl
in MKheru/ACOS.
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 28, 2026
…elivery + Push API doc

- Add "line" to _KNOWN_DELIVERY_PLATFORMS so bare deliver='line' isn't
  silently dropped before reaching the platform_map lookup
- Add "line": "LINE_HOME_CHANNEL" to _HOME_TARGET_ENV_VARS so the env var
  written by hermes setup is reachable for cron home-channel delivery
- Add two regression tests: test_line_in_cron_known_delivery_platforms and
  test_line_in_cron_home_target_env_vars
- Correct line.md: standard replies use Reply API (free); Push API is used
  for image sends and tool-initiated send_message calls

177 tests passing (82 LINE + 95 cron).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 28, 2026
…bbing

Sonnet/Opus final review polish (all non-blocking):

Docs (Sonnet WARNING + Opus NousResearch#5 doc note):
- line.md setup section: 'CSV' -> 'comma-separated' to match wizard prompts
- Add doc note that mention-strip is skipped in free-response groups
  (mention text reaches LLM verbatim — LLMs handle it gracefully but
  worth knowing)

Docs (Opus NousResearch#1 — silent-drop troubleshooting flowchart):
- Add 'Why isn't my bot responding?' decision tree to troubleshooting
  section covering all 9 silent-drop paths in dispatch order
- Add 'Bot never replies in a free-response group' row to symptom table

Code (Opus NousResearch#4 — defensive token scrubbing):
- Add _scrub_token() helper that strips 'Bearer <token>' from any string
- Apply to send_image and _push_text exception messages (both log and
  returned SendResult.error)
- Test for scrub helper

Deferred to follow-up PRs (Opus WARNINGs NousResearch#2 and NousResearch#3, only matter at scale):
- Per-request httpx.AsyncClient instantiation -> persistent client
- RequestCache size cap + bucketed prune

110 LINE tests passing.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
HanShaoshuai-k added a commit to HanShaoshuai-k/hermes-agent that referenced this pull request Apr 29, 2026
…nes (Phases 0-8 + 2 hotfixes)

Replaces Hermes' hand-written Feishu adapter pipelines with the official
``lark_oapi.channel.FeishuChannel`` SDK. The adapter retains its public
surface (BasePlatformAdapter signatures, FEISHU_* env vars, MessageEvent
shape), while delegating the SDK-side capabilities — markdown rendering,
inbound normalization, safety pipeline (dedup / stale / policy / mention /
per-chat lock / text + media batching), bot identity hydration, sub-event
dispatch, and webhook signature/decryption — to the SDK. Hermes keeps a
thin shim layer for SDK ↔ Hermes type translation, plus the parts SDK
doesn't own (drive comment LLM agent, QR onboarding, deployment-side
webhook server with rate-limit + anomaly tracker, persistent dedup store).

The work was driven by a phased plan (specs at
``docs/superpowers/specs/2026-04-27-feishu-channel-sdk-*``) and executed
across 9 conceptual phases plus 2 real-environment hotfixes. Squashed to
keep merge history linear.

- ``gateway/platforms/feishu/`` package — split out from the previous
  monolithic ``feishu.py`` (4629 lines):
  - ``adapter.py`` (1884) — ``FeishuAdapter`` class, 11 ``_on_sdk_*``
    handlers, 11 ``send_*`` thin wrappers, lifecycle (connect / disconnect
    / _stop_webhook_server), settings projection (``_build_sdk_policy_config``,
    ``_build_sdk_safety_config``)
  - ``events_mapping.py`` (514) — ``to_message_event`` SDK
    ``InboundMessage`` → Hermes ``MessageEvent``; sub-event adapters
    (``_to_command_event_from_card_action``, ``_to_text_event_from_reaction``,
    ``_sdk_comment_to_legacy_dict``); 19-kind content type map
  - ``dedup_store.py`` (186) — ``JsonFileDedupStore`` implementing SDK
    ``DedupStore`` Protocol with backward-compat for Phase 2 transitional
    and pre-Phase-2 plain-list formats; atomic os.replace, LRU eviction,
    debounced flush
  - ``webhook_guard.py`` (412) — ``start_webhook_server`` (aiohttp
    runner) + ``_RateLimiter`` (sliding window 120/60s with 4096-key LRU
    cap) + ``_AnomalyTracker`` (25-error threshold over 6h TTL) + dataclasses
    ``RateLimit`` / ``WebhookAnomaly``
  - ``approvals.py`` (199), ``qr_register.py`` (382), ``comments.py`` (1383)
- Backward-compat shim: ``gateway/platforms/feishu_webhook_guard.py`` +
  ``feishu_comment.py`` re-export to support stable external imports
  (``tools/send_message_tool.py``, etc.)

- 26-function normalize chain (``normalize_feishu_message`` + ``_normalize_*``
  family + ``parse_feishu_post_payload`` + render/collect helpers)
- Resource extract / download chain (7 functions)
- Identity / policy / mention / self-sent gate (9 functions)
- Text + media batching state machines (16 functions + 6 instance fields)
- Per-chat processing lock + chat queue
- Legacy WS lifecycle (``_run_official_feishu_ws_client``,
  ``_apply_runtime_ws_overrides``, ``_connect_websocket``,
  ``_hydrate_bot_identity``)
- ``_build_event_handler`` + 12 legacy ``_on_*_event`` handlers
- 4 inline webhook guards (``_is_webhook_signature_valid``,
  ``_check_webhook_rate_limit``, ``_record_webhook_anomaly``,
  ``_clear_webhook_anomaly``) — moved to ``webhook_guard.py``
- 5 dataclasses (``_FeishuBotIdentity``, ``FeishuPostMediaRef``,
  ``FeishuPostParseResult``, ``FeishuNormalizedMessage``, ``FeishuBatchState``)
- Markdown rendering chain (8 helpers — replaced by SDK
  ``MarkdownConverter(tag_md_mode='native')``)
- 22 module-level constants newly redundant after SDK takeover
- 7 ``FeishuAdapterSettings`` fields whose env vars SDK now owns;
  graceful-ignore preserved per spec §A.1 invariant NousResearch#4

- New mandatory regression gate at ``tests/gateway/feishu/`` — 71 tests
  built from a contract + golden + dedup-unit + media-caption + approval
  + webhook-security suite. Each ``_on_sdk_*`` handler, every SDK reject
  reason literal, and every send_* path has at least one fixture-driven
  contract test. Conftest mocks SDK calls and synthesizes
  ``InboundMessage`` payloads from legacy event JSON, so tests run
  hermetically.
- ``tests/gateway/test_feishu.py`` reduced from 4629 → 471 lines
  (-3700+); 17 legacy test classes deleted whose subjects were SDK-
  replaced. Each deletion verified to have a contract / golden equivalent
  per spec §10 NousResearch#2.
- ``test_feishu_approval_buttons.py`` removed (coverage migrated to
  ``test_approval_flow.py``).
- ``test_text_batching.py``'s ``TestFeishuAdaptiveDelay`` retired
  (Hermes-side text batching → SDK ``InboundConfig.text_batch_*``).

Real-environment ``hermes gateway restart`` surfaced 4 bugs missed by all
71 unit tests because none drives ``channel.connect()`` against the real
SDK. Fixed in this commit:

1. ``NameError: _FEISHU_SEND_ATTEMPTS`` — Phase 8 deleted the constant
   but ``RetryConfig(max_attempts=...)`` still referenced it. Inlined ``3``.
2. ``domain must use https scheme (got 'feishu')`` — Hermes settings
   stores the short name; SDK requires fully-qualified URL. Map
   ``feishu``/``lark`` → ``FEISHU_DOMAIN`` / ``LARK_DOMAIN`` URL
   constants.
3. ``RuntimeError: This event loop is already running`` — SDK
   ``lark_oapi/ws/client.py:28-30`` captures
   ``asyncio.get_event_loop()`` at module import; later when our running
   asyncio app is alive and SDK ``channel.connect()`` pushes
   ``channel.start()`` to a thread-pool executor, the thread calls
   ``loop.run_until_complete()`` on the still-running main loop. Swap
   ``lark_oapi.ws.client.loop`` with a fresh, never-set-as-current loop
   in ``connect()`` before invoking SDK. **SDK-side bug; Hermes
   workaround is idempotent — file upstream as CR.**
4. ``channel.start()`` blocks forever in ``_select`` (``while True:
   await sleep(3600)``) — so ``_mark_ready()`` is unreachable and
   ``wait_ready()`` always times out. Don't ``await
   channel.connect()``; instead ``run_in_executor(None, channel.start)``
   fire-and-forget and probe ``channel._ws_client._conn`` for actual
   ready signal (60×0.5s = 30s timeout). **SDK-side design defect;
   Hermes workaround tracked on ``self._sdk_start_future`` for
   disconnect observability — file upstream as CR.**

A 5th bug surfaced after the 4 SDK fixes: every group except the
explicitly-listed home channel was rejected by SDK SafetyPipeline with
``policy_group_not_in_allowlist``. Root cause: Phase 2 Task 2 mistakenly
projected Hermes' per-USER ``allowed_group_users`` allowlist into SDK's
per-CHAT ``PolicyConfig.group_allowlist`` field. Additionally,
``FEISHU_ALLOW_ALL_USERS=true`` (a Hermes top-level user-auth bypass)
was never propagated to SDK so SDK could still pre-reject what Hermes'
upper layer would have authorized.

Fix in ``_build_sdk_policy_config``:
- ``FEISHU_ALLOW_ALL_USERS=true`` → SDK ``group_policy=open``,
  ``allow_from=None`` (full bypass).
- ``FEISHU_GROUP_POLICY=allowlist`` → SDK ``group_policy=open`` +
  ``allow_from=allowed_group_users`` (per-user gate via SDK's
  documented per-user field, not per-chat field — exactly mirrors
  legacy ``_allow_group_message`` semantics).
- Other modes (open / blocklist / admin_only / disabled) project
  unchanged.
- ``group_rules`` per-chat overrides unchanged (Hermes per-rule
  allowlist IS per-user; matches SDK ``GroupOverride.allowlist``
  semantics correctly).

- **Mandatory regression gate** (``tests/gateway/feishu/``): 71 PASS.
- **Combined feishu test suites**: 104 PASS, 0 regression.
- **Whole gateway suite**: 3766 PASS, 9-10 unrelated pre-existing flakes
  (matrix encrypted-room / whatsapp bridge / approval e2e / split-brain
  cancellation / slack DM / gateway shutdown — confirmed pre-existing
  via ``git stash`` baseline check; not feishu-related).
- **Real-env smoke** (``hermes gateway restart`` against staging Feishu app):
  - Bot identity resolved from SDK ``fetch_bot_identity``;
    ``connected to wss://msg-frontier.feishu.cn/...``;
    ``Gateway running with 3 platform(s)``.
  - End-to-end: inbound text → ``_on_sdk_message`` → ``to_message_event``
    → ``handle_message`` → agent → ``channel.send`` outbound, multiple
    successful round-trips (4-19 second turn-around).
  - SDK SafetyPipeline ``_on_sdk_reject`` correctly bridged to Hermes
    metrics on ``policy_group_not_in_allowlist`` rejections.
  - ``JsonFileDedupStore`` cross-restart persistence confirmed
    (``~/.hermes/feishu_seen_message_ids.json`` count grew across
    inbound flow).

``docs/superpowers/notes/feishu-channel-sdk-execution-questions.md``)

- File 2 SDK CRs upstream for the workarounds in Phase 8.1 NousResearch#3 and NousResearch#4
  (module-level loop capture; unreachable ``_mark_ready``). Hermes
  workarounds become no-ops once SDK fixes land.
- Spec §A.4 list of ``ws_reconnect_*`` ``TransportConfig`` fields
  conflicts with actual SDK shape (server-authoritative ``ClientConfig``
  doesn't expose these). Phase 2 + Phase 8 align with actual SDK; spec
  needs maintenance pass to match.
- Spec §10 NousResearch#1 line-count target ≤2900 unmet (5059 today). Root cause
  documented: SDK Channel scope is IM-message layer, doesn't subsume
  Hermes' drive-comment LLM agent (1383 lines), QR onboarding (382),
  deployment-side webhook server (412), or required SDK ↔ Hermes glue
  (~1000). Squashing to ≤2900 requires either spec revision or out-of-
  scope refactors (extracting drive-comment LLM agent to ``tools/``,
  asking SDK team to subsume Drive v2 evaluation API).
- Spec §10 NousResearch#4 12-item staging manual smoke partially done; full coverage
  requires staging access for media uploads / QR onboarding / WS
  reconnect / approval card flow / drive-comment trigger.
- Add a contract test for per-user vs per-chat allowlist semantics
  (Phase 8.2 root cause): 2 chat_ids, 3 senders, ``group_policy=allowlist``
  + ``FEISHU_ALLOWED_USERS={A,B}`` — would have caught the bug at PR
  time.

- 78 files changed: +23002 insertions, -10759 deletions.
- Net production code delta: ``feishu.py 4629 + feishu_comment.py 1383
  = 6012`` → 5059 lines (-953, -15.9%).

Co-developed via ``superpowers:subagent-driven-development`` skill, with
per-phase commit history preserved on branch
``feishu-channel-sdk-backup`` for archaeology.

Change-Id: I321f10b2ca4eae14adf7b2d11f0ccb81613bfe05
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 29, 2026
Apply opus-pass NousResearch#3 conformance findings:

- BLOCKER: fix two broken annotations from the prior typing-form
  conversion pass — ``web.Optional[AppRunner]`` and
  ``asyncio.Optional[Event]`` were string-substitution artifacts that
  only worked because of ``from __future__ import annotations``;
  ``get_type_hints()`` would have crashed at runtime. Corrected to
  ``Optional["web.AppRunner"]`` and ``Optional[asyncio.Event]``.
- MAJOR: ``_keep_typing`` override now wraps its body in try/finally
  and clears ``self._typing_paused.discard(chat_id)`` to mirror the
  base class cleanup contract (base.py:1791-1800). Without this the
  pause set leaked across runs.
- MAJOR: route every LINE setting through ``config.extra.get(key)``
  first, then env, then default — matches bluebubbles/signal/
  mattermost/dingtalk pattern and honors the v2 PlatformConfig
  contract for YAML-driven config.
- MINOR: hoist ``strip_markdown`` import to module top (peer
  convention; was lazily imported inside two methods).
- MINOR: refactor test_line_send_routing.py to use ``monkeypatch``
  instead of direct ``os.environ`` mutation — auto-restores env
  state, safe for xdist parallelism.

All 88 LINE tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
briandevans added a commit to briandevans/hermes-agent that referenced this pull request Apr 30, 2026
…ew (follow-up to NousResearch#12260)

Addresses all 5 Copilot review comments on NousResearch#12284:

1. **``~//foo`` edge case** (inline NousResearch#1) — ``os.path.join(home, "/foo")``
   used to return ``/foo`` (absolute), bypassing the subprocess HOME.
   ``_expanduser_for_subprocess`` now strips *all* leading path
   separators from the post-``~`` remainder before joining, mirroring
   :func:`os.path.expanduser`'s own behaviour for ``~//foo``.
2. **Windows ``~\\wiki``** (inline NousResearch#2) — the previous ``~/`` prefix
   check only matched forward slashes, so Windows-style paths fell
   through to ``os.path.expanduser`` and expanded against the Python
   process HOME.  Added a ``_TILDE_PREFIXES`` module constant that
   matches both ``~/`` and ``~\\`` and runs the same subprocess-HOME
   expansion.
3. **Comment wording** (inline NousResearch#3) — removed the misleading
   "Explicit posix join" comment; the docstring now accurately
   describes that leading separators get stripped and the join uses
   the platform-native ``os.path.join``.
4. **Unused import** (inline NousResearch#4) — dropped the stray
   ``SKILL_CONFIG_PREFIX`` import from the test module.
5. **Repeated stat calls** (inline NousResearch#5) — ``resolve_skill_config_values``
   now resolves ``get_subprocess_home()`` **once** at the top of the
   function and threads the result through the helper via a new
   ``subprocess_home=`` kwarg.  Multiple config vars no longer trigger
   repeated ``os.path.isdir`` calls.

New tests pinning the hardened behaviour:
* ``test_double_slash_after_tilde_stays_under_subprocess_home``
  (``~//foo/bar`` → ``/opt/data/home/foo/bar``)
* ``test_tilde_backslash_on_windows_style_paths``
  (``~\\wiki`` expands under subprocess HOME)
* ``test_mixed_separators_after_tilde``
  (``~\\\\foo`` stays under subprocess HOME)
* ``test_subprocess_home_argument_short_circuits_lookup``
  (passing ``subprocess_home=`` skips ``get_subprocess_home`` — mocked
  with ``AssertionError`` side effect to prove it isn't called)

Validation
----------
``source venv/bin/activate && python -m pytest
tests/agent/test_skill_config_subprocess_home.py -q`` →
**17 passed** (13 original + 4 new edge-case).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
MKheru referenced this pull request in MKheru/ACOS-HERMES Apr 30, 2026
Closes the 11 attacks the prod sanitizer was missing on attack_corpus.jsonl.
Fixes 5 bugs in the existing flow + adds 11 new regex patterns + ports the
Run #3 Task 2 winner (structured-payload parsing).

Bugs fixed
----------
1. _normalize_text early-returned after URL decode - b64/unicode/leet
   never reached. Now chains transforms sequentially.
2. _normalize_spaced was a no-op for letter-spaced text (split loses
   word boundaries). Replaced by regex matching runs of single chars
   separated by ONE space, preserving word boundaries via 2+ space gaps.
3. _is_contextually_safe used substring match for example - example.com
   in URLs wrongly suppressed att-037. Now requires explicit phrase
   markers (for example, such as, phrases like).
4. _is_contextually_safe had a missing-f f-string with literal
   {_PARAPHRASE_VERBS} - silent dead code.
5. _is_contextually_safe wrongly suppressed att-035 Step 1: pretend
   via step\d+: rule. Suppression rule deleted.

New code
--------
- _decode_b64_substrings: finds 16+ char b64 runs anywhere in text
  (not just whole-text), decodes them, appends decoded forms so the
  pattern matcher sees both surfaces. Catches att-032.
- _is_structured_payload + _extract_json_string_fields + _scan_structured
  port the Run #3 Task 2 winner: parse JSON and scan each string leaf
  field. Catches att-027 / att-028.
- _PATCH14_PATTERNS - 11 new regex covering: context_truncation variants
  (RESUMING WITH NEW POLICY, safety guidelines no longer apply, from
  this point on), exfil_request (send/post/upload to https://, dump
  env vars), social engineer (identity claim, one-time exception,
  debug-only), yaml/json role spoofing, CRITICAL META in tool
  descriptions, polymorphic pretend X never existed.

Metrics
-------
run_lab.py against /home/hermes/lab-smcp/attack_corpus.jsonl

              Patch 7   Patch 14
  detection   78%       100% (50/50)
  FPR         10%       10% (ben-012, ben-016 - contextual FP, separate task)
  F1          0.81      0.98
  latency p99 498us     618us

Regression note
---------------
167/168 tests pass. The failing test_hermes_md_blocks_injection was
already broken before this patch - it expects BLOCKED in the output
but Patch 11 (G4 policy sanctuarization) changed the contract to keep
policy files intact + log critical. Test needs updating to match the
post-Patch 11 contract; not introduced by this patch.
leepoweii added a commit to leepoweii/hermes-agent that referenced this pull request Apr 30, 2026
… 400 cap

- MAJOR: LINE_ALLOW_ALL_USERS now honors PlatformConfig.extra
  (was env-only, the lone holdout). Resolved once in __init__ to
  self._allow_all_sources; is_allowed() helper stays pure and the
  bypass is short-circuited at the dispatch call site.
- MAJOR: Template Buttons altText capped at 400 chars (LINE limit).
  Operators with a long pending_reply_text would have hit a 400
  from the API on every slow-LLM button POST.
- MINOR: _handle_postback ERROR branch now uses _build_reply_messages
  (matches READY branch) so any future ERROR text with markdown or
  image URLs renders consistently. _interrupted_text is plain text
  today so no behavior change yet, but the symmetry pre-empts a
  future inconsistency.
- MINOR: bot_display_name init wraps the extra/env value in str()
  before .strip() — defensive against non-string YAML values.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…lls/

- Rename skill to touchdesigner-mcp (matches blender-mcp convention)
- Move from skills/creative/ to optional-skills/creative/
- Fix duplicate pitfall numbering (NousResearch#3 appeared twice)
- Update SKILL.md cross-references for renumbered pitfalls
- Update setup.sh path for new directory location
YonggangZhang9412 pushed a commit to YonggangZhang9412/hermes-agent that referenced this pull request May 1, 2026
…cap wiring, admin CLI

Addresses pre-Phase-C debts called out in review and starts Phase C in
the priority order agreed: Docker → loadtest → cap.update wiring →
Admin CLI.

Pre-Phase-C debts
-----------------
* _jwt.py audited against 4 criteria:
  CRIT-1 hmac.compare_digest         line 92
  CRIT-2 base64url padding           line 47
  CRIT-3 alg whitelist + alg=none    lines 56, 75, 84-85 (3-layer defence)
  CRIT-4 exp/nbf + clock-skew leeway lines 100-114
  Added nbf check + leeway parameter; added 24 security regression
  tests including alg=none with empty sig, alg=RS256 confusion,
  every base64 residue class, exp boundary, nbf in future, leeway
  grace, malformed exp, non-dict header/payload.
* KNOWN_GAPS.md lists every failure mode flagged in review:
  JWT mid-WS expiry (v1.5), broker restart timing (v1 accept),
  network drop vs clean close presence drift (v1.5),
  mutual invite race (v1 accept), duplicate step completion
  (v1 accept; safe), concurrent cap.update (v1 accept),
  FTS5 Chinese tokenization (v1.5 MUST FIX, jieba),
  body size + rate-bucket boundaries (v1 accept; tests TBD),
  3-decline auto-block (v1.5).
* on_session_end hook unregistered (was a no-op; lying about
  registration removed). Provides_hooks updated.
* pre_tool_call decision: NEVER REGISTER in v1; broker enforces
  ACL + rate limits already, hook would be duplicative. v1.5
  re-evaluate only if a real broker-side-can't-express policy
  appears.
* Hook callback body audit (in commit message; full table in
  CLAUDE-conversation): each registered callback documents
  exactly which kwargs of its callsite it reads (mostly none).

Phase C NousResearch#1 — Docker + Caddy
---------------------------
* Multi-stage Dockerfile (builder + slim runtime), runs as uid 10001,
  tini PID 1, healthcheck on /health, /data volume for SQLite WAL.
* docker-compose.yml with broker + Caddy reverse proxy.
* Caddyfile: auto-TLS via Let's Encrypt, /docs blocked, WS upgrade
  on /v1, plain HTTP on auth/devices/health.
* .env.example template with required JWT secret + domain.
* .dockerignore.

Phase C NousResearch#2 — Load test scaffold
-------------------------------
* loadtest/_harness.py: SyntheticClient (signup + WS connect via
  websockets), broker probe (health + /proc RSS + /proc fd count),
  linear_slope helper for trend detection.
* loadtest/soak.py: full soak test runner with PASS/FAIL budgets
  (RSS slope < 1MB/min, FD slope <= 0, no health failures).
* loadtest/test_smoke.py: 3 CI-runnable smoke tests verifying the
  harness itself doesn't regress.

Phase C NousResearch#3 — cap.update wiring
------------------------------
* Runtime.update_card / get_card with persistent KV-backed card.
* Auto-push on first WS online transition (so new accounts are
  searchable immediately by handle as fallback display_name).
* Dedupe via SHA1 of card payload to avoid republishing on
  reconnect when nothing changed.
* /collab card show|set k=v slash command.
* collab_card_show + collab_card_update LLM tools.
* Two new E2E tests (real broker round-trip):
  - default-card auto-push makes new user findable
  - update_card immediately reindexes broker FTS for role/skill/bio

Phase C NousResearch#4 — Admin CLI
----------------------
* python -m collab_broker.admin: operator-side tool that talks
  directly to the SQLite (no auth surface added to broker).
* Subcommands:
  accounts list/show/suspend/reactivate/delete (with --yes guard)
  devices list/revoke
  audit (filter by account/action/since)
  codes resend (mints fresh ticket+code for stuck users)
  health (PRAGMA integrity_check + counts)
* Suspend revokes all of the account's refresh tokens atomically.
* All admin actions write audit_log rows tagged 'admin.<verb>'.
* 17 admin tests pass.

Test counts
-----------
  115 / 115 passing:
    Hermes side  : 32 plugin unit + 10 E2E (was 8; +2 cap wiring)
    Broker side  : 24 jwt (+17 from before) + 15 store + 14 auth +
                   17 admin + 4 loadtest smoke

Remaining for full Phase C (deferred to operator/calendar work):
* TLS cert provisioning + DNS pointing (Caddy handles automation,
  but the cert request needs the real domain + open 80/443)
* SMTP DKIM/SPF setup
* Monitoring / alerting wiring
* Backup automation + restore drill

https://claude.ai/code/session_014DGporWJ6L8hMgNL6jPcHP
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants