feat(agent): memory condensation for longer context #1457
kalvinnchau merged 14 commits into block:main
Conversation
The previous implementation is actually buggy when the context length is greater than `2 * model_limit`. We should apply incremental summarization.
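A minimal sketch of what incremental summarization could look like. The function and tokenizer names here (`token_len`, `summarize_chunk`) are hypothetical placeholders, not the PR's actual code; the point is that folding the history into the summary chunk by chunk keeps every individual summarization call under the model limit, even when the full context exceeds `2 * model_limit`:

```rust
// Crude stand-in for a real tokenizer: count whitespace-separated words.
fn token_len(s: &str) -> usize {
    s.split_whitespace().count()
}

// Placeholder for a model call that condenses `summary` + `chunk`.
// A real implementation would call the LLM; here we just concatenate.
fn summarize_chunk(summary: &str, chunk: &str) -> String {
    format!("{} | {}", summary, chunk)
}

fn incremental_summarize(messages: &[String], model_limit: usize) -> String {
    let mut summary = String::new();
    let mut chunk = String::new();
    for msg in messages {
        // Flush the chunk before it (plus the running summary) would
        // overflow the limit for a single summarization call.
        if token_len(&summary) + token_len(&chunk) + token_len(msg) > model_limit {
            summary = summarize_chunk(&summary, &chunk);
            chunk.clear();
        }
        chunk.push_str(msg);
        chunk.push(' ');
    }
    if !chunk.is_empty() {
        summary = summarize_chunk(&summary, &chunk);
    }
    summary
}
```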
I think it would be more efficient to save the compressed chat in the log file rather than compress the long chat every time. Since this isn't directly related to compression strategies, I will open another PR to implement it.
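One way the cached summary could be kept valid is to key it on the exact messages it covers. This is only an illustrative sketch of that idea (the `CachedSummary` type and function names are made up, not goose's code):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical persisted record: the summary plus a hash of the
// messages it was computed over.
struct CachedSummary {
    covered_hash: u64,
    summary: String,
}

fn hash_messages(messages: &[String]) -> u64 {
    let mut h = DefaultHasher::new();
    messages.hash(&mut h);
    h.finish()
}

// Reuse the stored summary only if it was computed over exactly these
// messages; otherwise the caller must re-summarize.
fn summary_for<'a>(cache: Option<&'a CachedSummary>, messages: &[String]) -> Option<&'a str> {
    cache
        .filter(|c| c.covered_hash == hash_messages(messages))
        .map(|c| c.summary.as_str())
}
```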
I like that idea - we had something like this in the old Python version (but it would summarize each time, which loses prompt caching). Any reason why truncate.rs is changed? Would this ideally be a completely separate alternative?
Oh, I modified truncate.rs to fit in the uniform
I have undone the unnecessary changes :)
I like the idea here too, but I do think this should be a separate agent from truncate, since this introduces a large change to the default behavior. It would be found and set via:

```
❯ goose agents
Available agent versions:
* truncate (default)
  reference
```

```
❯ cat ~/.config/goose/config.yaml
GOOSE_AGENT: "summarize"
```
Thanks for your guidance! I have moved the feature to a separate "summarize" agent, and also tweaked the CLI so it displays the current agent version:

```
❯ ./target/debug/goose agents
Available agent versions:
  reference
* summarize
  truncate (default)
```
nice, yes that seems clearer now. @alexhancock @baxen @salman1993 I think this is interesting and potentially low risk to try. Especially now that we have sessions across GUI and CLI, people are likely to come across larger and larger sessions, so having some summarization would be interesting to try.
That's cool! Please let me know if there's anything else I could do.
michaelneale
left a comment
I like this - it adds a new agent for people to try, and I like the sound of this approach; seems worth a try.
* main:
  * bugfix: refactor workdirs to be async-safe, and simpler (#1558)
  * feat: split required_extensions in bench to builtin/external (#1547)
  * fix: continue to use resumed session after confirmation is cancelled (#1548)
  * feat: add image tool to developer mcp (#1515)
  * docs: using gooseignore (#1554)
  * ci: use cargo update --workspace to ensure Cargo.lock is updated (#1539)
  * fix: respond to interrupted tool calls with a ToolResponseMessageContent (#1557)
  * fix: get tool def back to chat mode (#1538)
  * ui: add default icon (#1553)
  * fix: fix summarize agent, use session_id and add provider fn (#1552)
  * feat(agent): memory condensation for longer context (#1457)
  * docs: goose tips blog (#1550)
  * docs: update to provider view (#1546)
  * docs: resuming sessions (#1543)
  * feat: goose bench framework for functional and regression testing
  * feat: use refresh_tokens from databricks api (#1517)
  * feat: use Ctrl/Cmd + ↑/↓ to navigate message history (#1501)
  * feat: remove tools from chat mode (#1533)
  * feat: use dropdown for goose selection (#1531)
  * docs: goosehints in desktop (#1529)
When the context grows beyond the model's limit, the current implementation cuts off older messages. This PR introduces a more graceful solution that lets the model itself summarize the earlier chat history and use the summary as part of the context, enabling the model to retain crucial information for longer.
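The contrast between the two strategies can be sketched like this. Everything here is illustrative (the `condense` function and the `[summary]` marker are assumptions, not goose's actual implementation): truncation drops the oldest messages outright, while summarization replaces them with a single model-generated summary message so their information is retained.

```rust
// Illustrative sketch: keep the most recent `limit - 1` messages
// verbatim and fold the older ones into one summary message at the
// front, instead of simply discarding them as truncation would.
fn condense(
    messages: Vec<String>,
    limit: usize,
    summarize: impl Fn(&[String]) -> String, // stand-in for an LLM call
) -> Vec<String> {
    if messages.len() <= limit {
        return messages;
    }
    let split = messages.len() - (limit - 1);
    let (old, recent) = messages.split_at(split);
    let mut out = vec![format!("[summary] {}", summarize(old))];
    out.extend_from_slice(recent);
    out
}
```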