refactor(tools): resolve action labels from registry metadata#915
Conversation
Greptile SummaryThis PR moves human-friendly tool display labels out of two static dicts ( Confidence Score: 5/5Safe to merge — all findings are P2 style/consistency suggestions with no correctness impact The refactor is logically correct: all existing display names are preserved, fallback behaviour is unchanged, and the LRU-cached snapshot ensures the registry is loaded only once. The two P2 findings ( No files require special attention Important Files Changed
Sequence DiagramsequenceDiagram
participant UI as output.py / reasoning.py
participant R as registry.resolve_tool_display_name()
participant M as _load_registry_snapshot() [lru_cache]
participant T as RegisteredTool.display_name
UI->>R: resolve_tool_display_name("query_datadog_logs")
R->>M: get_registered_tool_map()
M-->>R: {name: RegisteredTool, ...} (cached after first call)
R->>T: tool.display_name
T-->>R: "Datadog logs"
R-->>UI: "Datadog logs"
UI->>R: resolve_tool_display_name("unknown_tool")
R->>M: get_registered_tool_map()
M-->>R: tool not found
R-->>UI: "unknown tool" (fallback: replace _ with space)
|
| def resolve_tool_display_name(tool_name: str) -> str: | ||
| """Return a human-friendly label for a tool name.""" | ||
| tool = get_registered_tool_map().get(tool_name) | ||
| if tool is not None: | ||
| return tool.display_name or tool.name.replace("_", " ") | ||
| return tool_name.replace("_", " ") |
There was a problem hiding this comment.
get_registered_tool_map() rebuilt on every label resolution
resolve_tool_display_name is called in two hot paths: once per action in _humanise_message and once per streaming tool-start/tool-end event in reasoning.py. Each call re-constructs the full dict[str, RegisteredTool] from the cached snapshot. The snapshot itself is @lru_cached, but the dict rebuild is not.
For interactive streaming with many tool events this is redundant allocation on every event. Iterating the cached snapshot directly avoids the extra dict construction:
def resolve_tool_display_name(tool_name: str) -> str:
"""Return a human-friendly label for a tool name."""
snapshot = _load_registry_snapshot()
tool = next((t for t in snapshot if t.name == tool_name), None)
if tool is not None:
return tool.display_name or tool.name.replace("_", " ")
return tool_name.replace("_", " ")|
Updated catch by greptile. I added a cached registry tool-name map and switched |
|
@imjohnzakkam pls fix conflicts and review comments |
|
@muddlebee done, pls check and merge 🚀 |
| @@ -32,7 +25,7 @@ | |||
|
|
|||
There was a problem hiding this comment.
Three tools that only lived in the old TOOL_DISPLAY dict (query_kubernetes_logs → "Kubernetes logs", query_elasticsearch → "Elasticsearch", get_deployment_status → "deployment status") were never annotated with display_name. They now fall back to de-snaked lowercase strings like "query kubernetes logs" instead of "Kubernetes logs". The PR description says existing labels are preserved, but these three regress silently.
There was a problem hiding this comment.
I checked the current registered tool names after the merge and those exact identifiers (query_kubernetes_logs, query_elasticsearch, get_deployment_status) do not appear to be live registered tools on this branch. For example, Elasticsearch is currently registered as query_elasticsearch_logs. I left them out to avoid preserving stale aliases, but happy to add a small legacy fallback if we still expect those names to appear in streaming events.
There was a problem hiding this comment.
@imjohnzakkam yes a fallback makes sense right?
There was a problem hiding this comment.
@muddlebee Added the small legacy fallback in 314535f for the three stale event names:
query_kubernetes_logs->Kubernetes logsquery_elasticsearch->Elasticsearchget_deployment_status->deployment status
Also added a regression test to preserve those labels. The follow-up format-only commit is 96f4c12, and local checks are green:
ruff check app/ tests/, ruff format --check app/ tests/, and the focused pytest set all pass.
|
|
||
| def should_register_function() -> bool: | ||
| return any( | ||
| [ |
There was a problem hiding this comment.
display_name is None shouldn't be a registration trigger. The any-arg check is meant to distinguish bare @tool from @tool(...). display_name is pure cosmetic metadata — adding it here means @tool(display_name="X") with no name/source would attempt registration and fail validation rather than being treated as a bare decorator. Recommend removing display_name from this check.
There was a problem hiding this comment.
Fixed in b88019a by removing display_name from the registration trigger check.
| return dict(_load_registry_tool_map()) | ||
| return {tool.name: tool for tool in get_registered_tools(surface)} | ||
|
|
||
|
|
There was a problem hiding this comment.
Fallback returns tool_name.replace('_', ' ') without .title(), so unknown tools render as "my custom tool" rather than "My Custom Tool". The test even asserts the lowercase form. Intentional? The old reasoning.py fallback was also lowercase, so this is consistent — just flagging since display_name-annotated tools will be Title Case and fallback tools won't be.
There was a problem hiding this comment.
Intentional for now. This preserves the previous fallback behavior from reasoning.py; registered tools can opt into title-cased labels via display_name.
| def _require_non_empty_strings(cls, value: str | None) -> str | None: | ||
| if value is None: | ||
| return None | ||
| normalized = value.strip() |
There was a problem hiding this comment.
The _require_non_empty_strings validator now runs on display_name too. That's correct. One edge case: an empty string display_name="" will raise a ValueError at registration time with a message referencing the field name. Consider ensuring the error message is clear enough to debug (Pydantic usually includes the field name, so this is probably fine).
There was a problem hiding this comment.
Keeping the shared validator here. Pydantic includes the field name in the validation error, so the failure should still point clearly at display_name.
| def test_resolve_tool_display_name_falls_back_for_unknown_tools() -> None: | ||
| assert registry_module.resolve_tool_display_name("my_custom_tool") == "my custom tool" | ||
|
|
||
|
|
There was a problem hiding this comment.
test_resolve_tool_display_name_falls_back_for_unknown_tools calls resolve_tool_display_name directly without any monkeypatching. This hits the real registry cache (_load_registry_tool_map). If a real tool named my_custom_tool is ever added to the repo, this test will silently start returning its display_name instead of the fallback. Mock the registry or use a name that is guaranteed to not collide (e.g. a UUID-ish sentinel like nonexistent_tool_xyz).
There was a problem hiding this comment.
Fixed in b88019a; the fallback test now uses a sentinel name (nonexistent_tool_xyz_sentinel) so it should not collide with future real tools.
|
@imjohnzakkam hey really sorry.
you were right, this doesn't have registered tools on this branch. a fallback wasn't required. you can revert back to the last CI green entry :) sorry again for confusion. |
|
@muddlebee thanks reverted the legacy fallback follow-up. |
|
@imjohnzakkam great work ⚡ |
|
🎊 Achievement unlocked: PR Merged. @imjohnzakkam passed code review, survived CI, and shipped. Respect. 🤝 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |


Fixes #870
Describe the changes you have made in this PR -
Moved human-friendly tool labels out of
app/output.pyand into the tool registry flow by introducing an optionaldisplay_namemetadata field on registered tools.app.output._humanise_message()andapp.remote.reasoning.tool_display_name()now resolve labels through the registry, and the existing user-facing labels are preserved by annotating the currently humanized tools with explicitdisplay_namevalues.Screenshots of the UI changes (If any) -
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
I added
display_nameto the shared tool metadata contract (ToolMetadata,RegisteredTool, and the@tool(...)decorator) and exposed one registry-backed resolver for human-friendly tool labels. This removesapp/output.pyas the only source of truth for action labels and lets new tools participate by declaring their own display names in metadata.For edge cases, I preserved the existing labels for the currently humanized tools by setting explicit
display_namevalues on those tool definitions, and I kept a graceful fallback for unknown tool names by de-snaking them when the registry has no matching tool. I also added direct tests for registry-backed label resolution, output humanization from real registered tool metadata, and unknown-tool fallback behavior.Checklist before requesting a review
Verification
python -m pytest tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.pyruff check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.pyruff format --check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.pyOutput summary:
32 passedAll checks passed!9 files already formatted