feat(agent): add simple observation masking extension #386
obviyus wants to merge 3 commits into openclaw:main
Conversation
0851c73 to a64ef3c
I incorporated the ideas here, as far as I could see. The main difference, AFAIK, is that this PR counts the last N tool calls, while in #381 we attempt to count 'turns', i.e. entire steps the agent takes. For example: if the user asks Q1 and the agent thinks, performs 10 tool calls, and answers, and the user then asks Q2 and the agent does 10 more tool calls, this PR counts the last N tool calls (of which there are now 20 in this example), while #381 counts this as two agent turns and would only prune/mask tool calls once they are more than the configured N turns old.

I think the advantage of the approach in #381 is that you won't mask tool calls the agent makes in the middle of a 'turn'. This might depend a lot on the use case, though. For clawdbot I think this works fine; for something like Codex CLI it might not, as it can run for 30 minutes doing an entire project implementation in one 'turn'. We may want to refine this if the current approach turns out not to be wise. (n.b. I think I am using the term 'turn' incorrectly here, will figure it out...)

#381 is merged into main, so I suggest we close this PR. Let's keep discussing!
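A rough sketch of the two counting strategies described above, assuming a simplified message shape (`Msg`, `maskOldToolResults`, and `maskByTurn` are illustrative names, not openclaw's actual types or API):

```ts
// Illustrative only: the message shape and helper names are assumptions,
// not openclaw's actual types or API.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

const PLACEHOLDER = "[tool result elided]";

// This PR's approach (roughly): keep only the last N tool results,
// regardless of which turn they belong to.
function maskOldToolResults(history: Msg[], keepLast: number): Msg[] {
  const toolIndices = history
    .map((m, i) => (m.role === "tool" ? i : -1))
    .filter((i) => i >= 0);
  const keep = new Set(toolIndices.slice(-keepLast));
  return history.map((m, i) =>
    m.role === "tool" && !keep.has(i) ? { ...m, content: PLACEHOLDER } : m,
  );
}

// #381's approach (roughly): mask only tool results that are more than
// N user turns old, so calls made inside the current turn are never touched.
function maskByTurn(history: Msg[], keepTurns: number): Msg[] {
  const turnStarts = history
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  const cutoff = turnStarts[Math.max(0, turnStarts.length - keepTurns)] ?? 0;
  return history.map((m, i) =>
    m.role === "tool" && i < cutoff ? { ...m, content: PLACEHOLDER } : m,
  );
}
```

In the Q1/Q2 example above, `maskOldToolResults(history, 5)` would mask 15 of the 20 tool results, including some from the current turn, while `maskByTurn(history, 2)` would leave all 20 untouched until a third user turn begins.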
Interesting. Wouldn't that trash the cache if we change history?
Yeah, good question! My understanding of prompt caching is that it's exact prefix matching. In our case #381 prunes deterministically (both in aggressive mode, which is similar to this PR's approach, and in the soft/hard adaptive approach). So once a given older tool result becomes trimmed/cleared, that content stays stable across later requests and should be cacheable.

There is still some churn as your "sliding trim boundary" advances. Example: if you have messages [1,2,3,4] and we preserve the N=2 most recent messages, we trim the older ones → [1t,2t,3,4]. Now when you append message 5, the boundary advances and 3 becomes newly eligible → [1t,2t,3t,4,5]. On that request, the first mismatch vs the previous prompt is inside message 3, so the cache could/should still hit for the prefix up to [1t,2t], and then recompute the rest. So in this case it's not a total cache miss: it's a cache hit up to the point in the session right before the newly pruned message, rather than a hit up to "everything except the newly appended message".

But my analysis also makes some assumptions about how the different LLM providers work. I'm not an expert in this topic (yet 😅), so the actual way the caches work might make this all moot. I have some minor tweaks that might further improve caching based on thinking about this! Will circle back in a day or two.

If someone else wants to try it earlier: 👇
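A tiny self-contained illustration of that prefix argument (nothing here is a real provider API; it just serializes message lists and compares them):

```ts
// Illustration of the "sliding trim boundary" reasoning above.
type Msg = { id: number; content: string };

const trim = (m: Msg): Msg => ({ ...m, content: "[trimmed]" });

// Keep the N most recent messages intact, trim everything older.
function applyTrim(history: Msg[], keepRecent: number): Msg[] {
  const boundary = Math.max(0, history.length - keepRecent);
  return history.map((m, i) => (i < boundary ? trim(m) : m));
}

function commonPrefixLen(a: string[], b: string[]): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const serialize = (h: Msg[]) => h.map((m) => `${m.id}:${m.content}`);
const msgs = (n: number): Msg[] =>
  Array.from({ length: n }, (_, i) => ({ id: i + 1, content: `msg ${i + 1}` }));

const prev = serialize(applyTrim(msgs(4), 2)); // [1t, 2t, 3, 4]
const next = serialize(applyTrim(msgs(5), 2)); // [1t, 2t, 3t, 4, 5]

// First mismatch is inside message 3, so the stable (cacheable) prefix is [1t, 2t].
console.log(commonPrefixLen(prev, next)); // -> 2
```

Whether a given provider actually reuses that partial prefix depends on how its cache is keyed, as noted above.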
Adds opt-in observation masking that replaces older tool results with a placeholder before sending context to the LLM. This reduces token usage in long-running sessions. Based on https://arxiv.org/abs/2508.21433.
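For concreteness, a hypothetical before/after of the history the model sees (the message shape and placeholder text are made up for illustration, not necessarily what the extension emits):

```ts
// Hypothetical history before masking.
const before = [
  { role: "tool", content: "…4 KB of grep output…" },
  { role: "tool", content: "…12 KB file read…" },
  { role: "assistant", content: "Found it, patching now." },
  { role: "tool", content: "patch applied OK" },
];

// After masking all but the most recent tool result (N = 1):
const after = [
  { role: "tool", content: "[tool result omitted to save context]" },
  { role: "tool", content: "[tool result omitted to save context]" },
  { role: "assistant", content: "Found it, patching now." },
  { role: "tool", content: "patch applied OK" },
];
```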
Preceded by #381
This PR overlaps with #381 (context pruning). Key differences:
- This PR masks all tool results except the last N, counted as raw tool calls.
- #381 counts agent turns and prunes adaptively (soft/hard modes), so tool calls made within the current turn are not masked.
Trade-off: This is simpler and more predictable but less sophisticated. #381 is smarter about when and how to prune.
Config:
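A hypothetical shape for the opt-in settings (the key names below are guesses for illustration, not the PR's actual config):

```ts
// Hypothetical config shape; the real option names in the PR may differ.
interface ObservationMaskingConfig {
  enabled: boolean;             // opt-in, off by default
  keepLastToolResults: number;  // how many recent tool results stay unmasked
  placeholder?: string;         // text substituted for masked results
}

const example: ObservationMaskingConfig = {
  enabled: true,
  keepLastToolResults: 5,
  placeholder: "[tool result omitted]",
};
```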