feat(agent): add simple observation masking extension #386
obviyus wants to merge 3 commits into openclaw:main
Conversation
0851c73 to a64ef3c
I incorporated the ideas here, as far as I could see. The main difference, AFAIK, is that this PR counts the last N tool calls, while in #381 we attempt to count 'turns', i.e. entire steps the agent takes. For example: if the user asks Q1 and the agent thinks, performs 10 tool calls, and answers, and the user then asks Q2 and the agent does 10 more tool calls, this PR counts the last N tool calls (of which there are now 20 in this example), while #381 counts this as two agent turns and would only prune/mask tool calls once they are more than the configured N turns old.

I think the advantage of the approach in #381 is that you won't mask tool calls the agent makes in the middle of a 'turn'. This might depend a lot on the use case, though. For clawdbot I think this works fine; for something like Codex CLI it might not, as it can run for 30 minutes doing an entire project implementation in one 'turn'. We may want to refine this if the current approach turns out not to be wise. (n.b. I think I am using the term 'turn' incorrectly here, will figure it out...)

#381 is merged into main, so I suggest we close this PR. Let's keep discussing!
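A rough sketch of the two counting strategies described above, assuming a simplified message shape (`Msg`, `maskOldToolResults`, and `maskByTurn` are illustrative names, not openclaw's actual types or API):

```ts
// Illustrative only: the message shape and helper names are assumptions,
// not openclaw's actual types or API.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

const PLACEHOLDER = "[tool result elided]";

// This PR's approach (roughly): keep only the last N tool results,
// regardless of which turn they belong to.
function maskOldToolResults(history: Msg[], keepLast: number): Msg[] {
  const toolIndices = history
    .map((m, i) => (m.role === "tool" ? i : -1))
    .filter((i) => i >= 0);
  const keep = new Set(toolIndices.slice(-keepLast));
  return history.map((m, i) =>
    m.role === "tool" && !keep.has(i) ? { ...m, content: PLACEHOLDER } : m,
  );
}

// #381's approach (roughly): mask only tool results that are more than
// N user turns old, so calls made inside the current turn are never touched.
function maskByTurn(history: Msg[], keepTurns: number): Msg[] {
  const turnStarts = history
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  const cutoff = turnStarts[Math.max(0, turnStarts.length - keepTurns)] ?? 0;
  return history.map((m, i) =>
    m.role === "tool" && i < cutoff ? { ...m, content: PLACEHOLDER } : m,
  );
}
```

In the Q1/Q2 example above, `maskOldToolResults(history, 5)` would mask 15 of the 20 tool results, including some from the current turn, while `maskByTurn(history, 2)` would leave all 20 untouched until a third user turn begins.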
Interesting. Wouldn't that trash the cache if we change history?
Yeah, good question! My understanding of prompt caching is that it's exact prefix matching. In our case #381 prunes deterministically (both in aggressive mode, which is similar to this PR's approach, and in the soft/hard adaptive approach). So once a given older tool result becomes trimmed/cleared, that content stays stable across later requests and should be cacheable.

There is still some churn as your "sliding trim boundary" advances. Example: if you have messages [1,2,3,4] and we preserve the N=2 most recent messages, we trim the older ones → [1t,2t,3,4]. Now when you append message 5, the boundary advances and 3 becomes newly eligible → [1t,2t,3t,4,5]. On that request, the first mismatch vs the previous prompt is inside message 3, so the cache could/should still hit for the prefix up to [1t,2t], and then recompute the rest. So in this case it's not a total cache miss: it's a cache hit up to the point in the session right before the newly pruned message, rather than a hit up to "everything except the newly appended message".

But my analysis also makes some assumptions about how the different LLM providers work. I'm not an expert in this topic (yet 😅), so the actual way the caches work might make this all moot. I have some minor tweaks that might further improve caching based on thinking about this! Will circle back in a day or two.

If someone else wants to try it earlier: 👇
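A tiny self-contained illustration of that prefix argument (nothing here is a real provider API; it just serializes message lists and compares them):

```ts
// Illustration of the "sliding trim boundary" reasoning above.
type Msg = { id: number; content: string };

const trim = (m: Msg): Msg => ({ ...m, content: "[trimmed]" });

// Keep the N most recent messages intact, trim everything older.
function applyTrim(history: Msg[], keepRecent: number): Msg[] {
  const boundary = Math.max(0, history.length - keepRecent);
  return history.map((m, i) => (i < boundary ? trim(m) : m));
}

function commonPrefixLen(a: string[], b: string[]): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const serialize = (h: Msg[]) => h.map((m) => `${m.id}:${m.content}`);
const msgs = (n: number): Msg[] =>
  Array.from({ length: n }, (_, i) => ({ id: i + 1, content: `msg ${i + 1}` }));

const prev = serialize(applyTrim(msgs(4), 2)); // [1t, 2t, 3, 4]
const next = serialize(applyTrim(msgs(5), 2)); // [1t, 2t, 3t, 4, 5]

// First mismatch is inside message 3, so the stable (cacheable) prefix is [1t, 2t].
console.log(commonPrefixLen(prev, next)); // -> 2
```

Whether a given provider actually reuses that partial prefix depends on how its cache is keyed, as noted above.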
Adds opt-in observation masking that replaces older tool results with a placeholder before sending context to the LLM. This reduces token usage in long-running sessions. Based on https://arxiv.org/abs/2508.21433.
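For concreteness, a hypothetical before/after of the history the model sees (the message shape and placeholder text are made up for illustration, not necessarily what the extension emits):

```ts
// Hypothetical history before masking.
const before = [
  { role: "tool", content: "…4 KB of grep output…" },
  { role: "tool", content: "…12 KB file read…" },
  { role: "assistant", content: "Found it, patching now." },
  { role: "tool", content: "patch applied OK" },
];

// After masking all but the most recent tool result (N = 1):
const after = [
  { role: "tool", content: "[tool result omitted to save context]" },
  { role: "tool", content: "[tool result omitted to save context]" },
  { role: "assistant", content: "Found it, patching now." },
  { role: "tool", content: "patch applied OK" },
];
```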
Preceded by #381
This PR overlaps with #381 (context pruning). Key differences:
- This PR masks all tool results except the last N, counted as raw tool calls.
- #381 counts agent turns and prunes adaptively (soft/hard modes), so tool calls made within the current turn are not masked.
Trade-off: This is simpler and more predictable but less sophisticated. #381 is smarter about when and how to prune.
Config:
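A hypothetical shape for the opt-in settings (the key names below are guesses for illustration, not the PR's actual config):

```ts
// Hypothetical config shape; the real option names in the PR may differ.
interface ObservationMaskingConfig {
  enabled: boolean;             // opt-in, off by default
  keepLastToolResults: number;  // how many recent tool results stay unmasked
  placeholder?: string;         // text substituted for masked results
}

const example: ObservationMaskingConfig = {
  enabled: true,
  keepLastToolResults: 5,
  placeholder: "[tool result omitted]",
};
```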