Skip to content

refactor(tools): resolve action labels from registry metadata#915

Merged
muddlebee merged 9 commits intoTracer-Cloud:mainfrom
imjohnzakkam:issue/870-tool-label-metadata
Apr 29, 2026
Merged

refactor(tools): resolve action labels from registry metadata#915
muddlebee merged 9 commits intoTracer-Cloud:mainfrom
imjohnzakkam:issue/870-tool-label-metadata

Conversation

@imjohnzakkam
Copy link
Copy Markdown
Contributor

Fixes #870

Describe the changes you have made in this PR -

Moved human-friendly tool labels out of app/output.py and into the tool registry flow by introducing an optional display_name metadata field on registered tools.

app.output._humanise_message() and app.remote.reasoning.tool_display_name() now resolve labels through the registry, and the existing user-facing labels are preserved by annotating the currently humanized tools with explicit display_name values.

Screenshots of the UI changes (If any) -

  • None

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:
I added display_name to the shared tool metadata contract (ToolMetadata, RegisteredTool, and the @tool(...) decorator) and exposed one registry-backed resolver for human-friendly tool labels. This removes app/output.py as the only source of truth for action labels and lets new tools participate by declaring their own display names in metadata.

For edge cases, I preserved the existing labels for the currently humanized tools by setting explicit display_name values on those tool definitions, and I kept a graceful fallback for unknown tool names by de-snaking them when the registry has no matching tool. I also added direct tests for registry-backed label resolution, output humanization from real registered tool metadata, and unknown-tool fallback behavior.


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • My code follows the project's style guidelines and conventions

Verification

  • python -m pytest tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py
  • ruff check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py
  • ruff format --check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py

Output summary:

  • 32 passed
  • All checks passed!
  • 9 files already formatted

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR moves human-friendly tool display labels out of two static dicts (_ACTION_DISPLAY in output.py and TOOL_DISPLAY in reasoning.py) and into the tool registry by adding an optional display_name metadata field to ToolMetadata, BaseTool, RegisteredTool, and the @tool decorator. All 20 previously-labeled tools receive explicit display_name values, and a new resolve_tool_display_name helper in registry.py replaces both ad-hoc dicts. The three reasoning.py-only entries (query_kubernetes_logs, query_elasticsearch, get_deployment_status) had no matching tool names in the registry and were already stale — their removal is safe.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/consistency suggestions with no correctness impact

The refactor is logically correct: all existing display names are preserved, fallback behaviour is unchanged, and the LRU-cached snapshot ensures the registry is loaded only once. The two P2 findings (display_name not synced in __init_subclass__, and dict rebuilt per call in hot paths) have no effect on current behaviour.

No files require special attention

Important Files Changed

Filename Overview
app/tools/registry.py Adds resolve_tool_display_name — the new single resolver for human-friendly labels; get_registered_tool_map() is called on every invocation (minor perf concern in hot streaming paths)
app/tools/base.py Adds display_name to ToolMetadata and BaseTool; __init_subclass__ does not sync display_name back to the class, breaking the normalization invariant set by every other metadata field
app/tools/registered_tool.py Propagates display_name through all three construction paths (from_base_tool, from_function, update_metadata) and the to_metadata serialisation — complete and correct
app/tools/tool_decorator.py Threads display_name through all four overloads and the any_kwarg_set guard; straightforward additive change
app/output.py Removes the _ACTION_DISPLAY dict and delegates to resolve_tool_display_name; fallback behaviour is preserved
app/remote/reasoning.py Drops the local TOOL_DISPLAY dict (including three stale entries with no matching tool names) and delegates to the shared resolver
tests/tools/test_registry.py Adds two new tests for resolve_tool_display_name (registered-metadata path and unknown-tool fallback); autouse fixture correctly clears the LRU cache around each test
tests/test_output.py New file — two tests exercising _humanise_message end-to-end against real registered tool metadata and the unknown-tool fallback path

Sequence Diagram

sequenceDiagram
    participant UI as output.py / reasoning.py
    participant R as registry.resolve_tool_display_name()
    participant M as _load_registry_snapshot() [lru_cache]
    participant T as RegisteredTool.display_name

    UI->>R: resolve_tool_display_name("query_datadog_logs")
    R->>M: get_registered_tool_map()
    M-->>R: {name: RegisteredTool, ...} (cached after first call)
    R->>T: tool.display_name
    T-->>R: "Datadog logs"
    R-->>UI: "Datadog logs"

    UI->>R: resolve_tool_display_name("unknown_tool")
    R->>M: get_registered_tool_map()
    M-->>R: tool not found
    R-->>UI: "unknown tool" (fallback: replace _ with space)
Loading

Comments Outside Diff (1)

  1. app/tools/base.py, line 74-84 (link)

    P2 display_name not synced back in __init_subclass__

    Every other metadata field (name, description, input_schema, source, use_cases, requires, outputs, retrieval_controls) is written back to cls after validation in __init_subclass__, but display_name is not. This means whitespace normalization from _require_non_empty_strings is skipped at class-definition time for BaseTool subclasses that declare a display_name.

    The value is eventually corrected when RegisteredTool.from_base_tool runs candidate.metadata() at registration, so this is harmless in practice today — but it breaks the invariant established by every other field on the class and will surprise a future author who sets display_name = " My Tool " and finds the trailing spaces stored on cls.display_name.

Reviews (1): Last reviewed commit: "refactor(tools): resolve action labels f..." | Re-trigger Greptile

Comment thread app/tools/registry.py
Comment on lines +167 to +172
def resolve_tool_display_name(tool_name: str) -> str:
"""Return a human-friendly label for a tool name."""
tool = get_registered_tool_map().get(tool_name)
if tool is not None:
return tool.display_name or tool.name.replace("_", " ")
return tool_name.replace("_", " ")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 get_registered_tool_map() rebuilt on every label resolution

resolve_tool_display_name is called in two hot paths: once per action in _humanise_message and once per streaming tool-start/tool-end event in reasoning.py. Each call re-constructs the full dict[str, RegisteredTool] from the cached snapshot. The snapshot itself is @lru_cached, but the dict rebuild is not.

For interactive streaming with many tool events this is redundant allocation on every event. Iterating the cached snapshot directly avoids the extra dict construction:

def resolve_tool_display_name(tool_name: str) -> str:
    """Return a human-friendly label for a tool name."""
    snapshot = _load_registry_snapshot()
    tool = next((t for t in snapshot if t.name == tool_name), None)
    if tool is not None:
        return tool.display_name or tool.name.replace("_", " ")
    return tool_name.replace("_", " ")

@imjohnzakkam
Copy link
Copy Markdown
Contributor Author

Updated catch by greptile. I added a cached registry tool-name map and switched resolve_tool_display_name() to use that directly, so we avoid rebuilding the dict on every label lookup while keeping the same fallback behaviour.

@muddlebee
Copy link
Copy Markdown
Collaborator

muddlebee commented Apr 28, 2026

@imjohnzakkam pls fix conflicts and review comments

@imjohnzakkam
Copy link
Copy Markdown
Contributor Author

@muddlebee done, pls check and merge 🚀

Comment thread app/remote/reasoning.py
@@ -32,7 +25,7 @@

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three tools that only lived in the old TOOL_DISPLAY dict (query_kubernetes_logs → "Kubernetes logs", query_elasticsearch → "Elasticsearch", get_deployment_status → "deployment status") were never annotated with display_name. They now fall back to de-snaked lowercase strings like "query kubernetes logs" instead of "Kubernetes logs". The PR description says existing labels are preserved, but these three regress silently.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the current registered tool names after the merge and those exact identifiers (query_kubernetes_logs, query_elasticsearch, get_deployment_status) do not appear to be live registered tools on this branch. For example, Elasticsearch is currently registered as query_elasticsearch_logs. I left them out to avoid preserving stale aliases, but happy to add a small legacy fallback if we still expect those names to appear in streaming events.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imjohnzakkam yes a fallback makes sense right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muddlebee will do that and update

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muddlebee Added the small legacy fallback in 314535f for the three stale event names:

  • query_kubernetes_logs -> Kubernetes logs
  • query_elasticsearch -> Elasticsearch
  • get_deployment_status -> deployment status

Also added a regression test to preserve those labels. The follow-up format-only commit is 96f4c12, and local checks are green:
ruff check app/ tests/, ruff format --check app/ tests/, and the focused pytest set all pass.


def should_register_function() -> bool:
return any(
[
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

display_name is None shouldn't be a registration trigger. The any-arg check is meant to distinguish bare @tool from @tool(...). display_name is pure cosmetic metadata — adding it here means @tool(display_name="X") with no name/source would attempt registration and fail validation rather than being treated as a bare decorator. Recommend removing display_name from this check.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b88019a by removing display_name from the registration trigger check.

Comment thread app/tools/registry.py
return dict(_load_registry_tool_map())
return {tool.name: tool for tool in get_registered_tools(surface)}


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fallback returns tool_name.replace('_', ' ') without .title(), so unknown tools render as "my custom tool" rather than "My Custom Tool". The test even asserts the lowercase form. Intentional? The old reasoning.py fallback was also lowercase, so this is consistent — just flagging since display_name-annotated tools will be Title Case and fallback tools won't be.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional for now. This preserves the previous fallback behavior from reasoning.py; registered tools can opt into title-cased labels via display_name.

Comment thread app/tools/base.py
def _require_non_empty_strings(cls, value: str | None) -> str | None:
if value is None:
return None
normalized = value.strip()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _require_non_empty_strings validator now runs on display_name too. That's correct. One edge case: an empty string display_name="" will raise a ValueError at registration time with a message referencing the field name. Consider ensuring the error message is clear enough to debug (Pydantic usually includes the field name, so this is probably fine).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the shared validator here. Pydantic includes the field name in the validation error, so the failure should still point clearly at display_name.

def test_resolve_tool_display_name_falls_back_for_unknown_tools() -> None:
assert registry_module.resolve_tool_display_name("my_custom_tool") == "my custom tool"


Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_resolve_tool_display_name_falls_back_for_unknown_tools calls resolve_tool_display_name directly without any monkeypatching. This hits the real registry cache (_load_registry_tool_map). If a real tool named my_custom_tool is ever added to the repo, this test will silently start returning its display_name instead of the fallback. Mock the registry or use a name that is guaranteed to not collide (e.g. a UUID-ish sentinel like nonexistent_tool_xyz).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b88019a; the fallback test now uses a sentinel name (nonexistent_tool_xyz_sentinel) so it should not collide with future real tools.

@muddlebee
Copy link
Copy Markdown
Collaborator

muddlebee commented Apr 29, 2026

@imjohnzakkam hey really sorry.

I checked the current registered tool names after the merge and those exact identifiers (query_kubernetes_logs, query_elasticsearch, get_deployment_status) do not appear to be live registered tools on this branch.

you were right, this doesn't have registered tools on this branch. a fallback wasn't required. you can revert back to the last CI green entry :)

sorry again for confusion.

@muddlebee
Copy link
Copy Markdown
Collaborator

image

till here its fine, if you can reset till here :) good to go

@imjohnzakkam
Copy link
Copy Markdown
Contributor Author

@muddlebee thanks reverted the legacy fallback follow-up.

@muddlebee muddlebee merged commit 60413c1 into Tracer-Cloud:main Apr 29, 2026
7 checks passed
@muddlebee
Copy link
Copy Markdown
Collaborator

@imjohnzakkam great work ⚡

@github-actions
Copy link
Copy Markdown
Contributor

🎊 Achievement unlocked: PR Merged. @imjohnzakkam passed code review, survived CI, and shipped. Respect. 🤝


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace hard-coded _ACTION_DISPLAY labels with tool metadata

2 participants