refactor(tools): resolve action labels from registry metadata by imjohnzakkam · Pull Request #915 · Tracer-Cloud/opensre

imjohnzakkam · 2026-04-25T04:16:18Z

Fixes #870

Describe the changes you have made in this PR -

Moved human-friendly tool labels out of app/output.py and into the tool registry flow by introducing an optional display_name metadata field on registered tools.

app.output._humanise_message() and app.remote.reasoning.tool_display_name() now resolve labels through the registry, and the existing user-facing labels are preserved by annotating the currently humanized tools with explicit display_name values.

Screenshots of the UI changes (If any) -

None

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

Yes, I used AI assistance (continue below)

If you used AI assistance:

I have reviewed every single line of the AI-generated code
I can explain the purpose and logic of each function/component I added
I have tested edge cases and understand how the code handles them
I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:
I added display_name to the shared tool metadata contract (ToolMetadata, RegisteredTool, and the @tool(...) decorator) and exposed one registry-backed resolver for human-friendly tool labels. This removes app/output.py as the only source of truth for action labels and lets new tools participate by declaring their own display names in metadata.

For edge cases, I preserved the existing labels for the currently humanized tools by setting explicit display_name values on those tool definitions, and I kept a graceful fallback for unknown tool names by de-snaking them when the registry has no matching tool. I also added direct tests for registry-backed label resolution, output humanization from real registered tool metadata, and unknown-tool fallback behavior.

Checklist before requesting a review

I have added proper PR title and linked to the issue
I have performed a self-review of my code
I can explain the purpose of every function, class, and logic block I added
I understand why my changes work and have tested them thoroughly
I have considered potential edge cases and how my code handles them
My code follows the project's style guidelines and conventions

Verification

python -m pytest tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py
ruff check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py
ruff format --check app/tools/base.py app/tools/registered_tool.py app/tools/tool_decorator.py app/tools/registry.py app/output.py app/remote/reasoning.py tests/tools/test_registry.py tests/test_output.py tests/app/remote/test_reasoning.py

Output summary:

32 passed
All checks passed!
9 files already formatted

greptile-apps · 2026-04-25T04:19:10Z

Greptile Summary

This PR moves human-friendly tool display labels out of two static dicts (_ACTION_DISPLAY in output.py and TOOL_DISPLAY in reasoning.py) and into the tool registry by adding an optional display_name metadata field to ToolMetadata, BaseTool, RegisteredTool, and the @tool decorator. All 20 previously-labeled tools receive explicit display_name values, and a new resolve_tool_display_name helper in registry.py replaces both ad-hoc dicts. The three reasoning.py-only entries (query_kubernetes_logs, query_elasticsearch, get_deployment_status) had no matching tool names in the registry and were already stale — their removal is safe.

Confidence Score: 5/5

Safe to merge — all findings are P2 style/consistency suggestions with no correctness impact

The refactor is logically correct: all existing display names are preserved, fallback behaviour is unchanged, and the LRU-cached snapshot ensures the registry is loaded only once. The two P2 findings (display_name not synced in __init_subclass__, and dict rebuilt per call in hot paths) have no effect on current behaviour.

No files require special attention

Important Files Changed

Filename	Overview
app/tools/registry.py	Adds `resolve_tool_display_name` — the new single resolver for human-friendly labels; `get_registered_tool_map()` is called on every invocation (minor perf concern in hot streaming paths)
app/tools/base.py	Adds `display_name` to `ToolMetadata` and `BaseTool`; `__init_subclass__` does not sync `display_name` back to the class, breaking the normalization invariant set by every other metadata field
app/tools/registered_tool.py	Propagates `display_name` through all three construction paths (`from_base_tool`, `from_function`, `update_metadata`) and the `to_metadata` serialisation — complete and correct
app/tools/tool_decorator.py	Threads `display_name` through all four overloads and the `any_kwarg_set` guard; straightforward additive change
app/output.py	Removes the `_ACTION_DISPLAY` dict and delegates to `resolve_tool_display_name`; fallback behaviour is preserved
app/remote/reasoning.py	Drops the local `TOOL_DISPLAY` dict (including three stale entries with no matching tool names) and delegates to the shared resolver
tests/tools/test_registry.py	Adds two new tests for `resolve_tool_display_name` (registered-metadata path and unknown-tool fallback); autouse fixture correctly clears the LRU cache around each test
tests/test_output.py	New file — two tests exercising `_humanise_message` end-to-end against real registered tool metadata and the unknown-tool fallback path

Sequence Diagram

sequenceDiagram
    participant UI as output.py / reasoning.py
    participant R as registry.resolve_tool_display_name()
    participant M as _load_registry_snapshot() [lru_cache]
    participant T as RegisteredTool.display_name

    UI->>R: resolve_tool_display_name("query_datadog_logs")
    R->>M: get_registered_tool_map()
    M-->>R: {name: RegisteredTool, ...} (cached after first call)
    R->>T: tool.display_name
    T-->>R: "Datadog logs"
    R-->>UI: "Datadog logs"

    UI->>R: resolve_tool_display_name("unknown_tool")
    R->>M: get_registered_tool_map()
    M-->>R: tool not found
    R-->>UI: "unknown tool" (fallback: replace _ with space)

Comments Outside Diff (1)

app/tools/base.py, line 74-84 (link)

display_name not synced back in __init_subclass__

Every other metadata field (name, description, input_schema, source, use_cases, requires, outputs, retrieval_controls) is written back to cls after validation in __init_subclass__, but display_name is not. This means whitespace normalization from _require_non_empty_strings is skipped at class-definition time for BaseTool subclasses that declare a display_name.

The value is eventually corrected when RegisteredTool.from_base_tool runs candidate.metadata() at registration, so this is harmless in practice today — but it breaks the invariant established by every other field on the class and will surprise a future author who sets display_name = " My Tool " and finds the trailing spaces stored on cls.display_name.

_{Reviews (1): Last reviewed commit: "refactor(tools): resolve action labels f..." | Re-trigger Greptile}

greptile-apps · 2026-04-25T04:19:17Z

+def resolve_tool_display_name(tool_name: str) -> str:
+    """Return a human-friendly label for a tool name."""
+    tool = get_registered_tool_map().get(tool_name)
+    if tool is not None:
+        return tool.display_name or tool.name.replace("_", " ")
+    return tool_name.replace("_", " ")


get_registered_tool_map() rebuilt on every label resolution

resolve_tool_display_name is called in two hot paths: once per action in _humanise_message and once per streaming tool-start/tool-end event in reasoning.py. Each call re-constructs the full dict[str, RegisteredTool] from the cached snapshot. The snapshot itself is @lru_cached, but the dict rebuild is not.

For interactive streaming with many tool events this is redundant allocation on every event. Iterating the cached snapshot directly avoids the extra dict construction:

def resolve_tool_display_name(tool_name: str) -> str: """Return a human-friendly label for a tool name.""" snapshot = _load_registry_snapshot() tool = next((t for t in snapshot if t.name == tool_name), None) if tool is not None: return tool.display_name or tool.name.replace("_", " ") return tool_name.replace("_", " ")

imjohnzakkam · 2026-04-25T04:23:19Z

Updated catch by greptile. I added a cached registry tool-name map and switched resolve_tool_display_name() to use that directly, so we avoid rebuilding the dict on every label lookup while keeping the same fallback behaviour.

muddlebee · 2026-04-28T05:59:02Z

@imjohnzakkam pls fix conflicts and review comments

imjohnzakkam · 2026-04-28T17:50:19Z

@muddlebee done, pls check and merge 🚀

muddlebee · 2026-04-28T17:53:05Z

@@ -32,7 +25,7 @@



Three tools that only lived in the old TOOL_DISPLAY dict (query_kubernetes_logs → "Kubernetes logs", query_elasticsearch → "Elasticsearch", get_deployment_status → "deployment status") were never annotated with display_name. They now fall back to de-snaked lowercase strings like "query kubernetes logs" instead of "Kubernetes logs". The PR description says existing labels are preserved, but these three regress silently.

I checked the current registered tool names after the merge and those exact identifiers (query_kubernetes_logs, query_elasticsearch, get_deployment_status) do not appear to be live registered tools on this branch. For example, Elasticsearch is currently registered as query_elasticsearch_logs. I left them out to avoid preserving stale aliases, but happy to add a small legacy fallback if we still expect those names to appear in streaming events.

@imjohnzakkam yes a fallback makes sense right?

@muddlebee will do that and update

@muddlebee Added the small legacy fallback in 314535f for the three stale event names:

query_kubernetes_logs -> Kubernetes logs

query_elasticsearch -> Elasticsearch

get_deployment_status -> deployment status

Also added a regression test to preserve those labels. The follow-up format-only commit is 96f4c12, and local checks are green:
ruff check app/ tests/, ruff format --check app/ tests/, and the focused pytest set all pass.

muddlebee · 2026-04-28T17:53:14Z


    def should_register_function() -> bool:
        return any(
            [


display_name is None shouldn't be a registration trigger. The any-arg check is meant to distinguish bare @tool from @tool(...). display_name is pure cosmetic metadata — adding it here means @tool(display_name="X") with no name/source would attempt registration and fail validation rather than being treated as a bare decorator. Recommend removing display_name from this check.

Fixed in b88019a by removing display_name from the registration trigger check.

muddlebee · 2026-04-28T17:53:25Z

+        return dict(_load_registry_tool_map())
    return {tool.name: tool for tool in get_registered_tools(surface)}
+
+


Fallback returns tool_name.replace('_', ' ') without .title(), so unknown tools render as "my custom tool" rather than "My Custom Tool". The test even asserts the lowercase form. Intentional? The old reasoning.py fallback was also lowercase, so this is consistent — just flagging since display_name-annotated tools will be Title Case and fallback tools won't be.

Intentional for now. This preserves the previous fallback behavior from reasoning.py; registered tools can opt into title-cased labels via display_name.

muddlebee · 2026-04-28T17:53:33Z

+    def _require_non_empty_strings(cls, value: str | None) -> str | None:
+        if value is None:
+            return None
        normalized = value.strip()


The _require_non_empty_strings validator now runs on display_name too. That's correct. One edge case: an empty string display_name="" will raise a ValueError at registration time with a message referencing the field name. Consider ensuring the error message is clear enough to debug (Pydantic usually includes the field name, so this is probably fine).

Keeping the shared validator here. Pydantic includes the field name in the validation error, so the failure should still point clearly at display_name.

muddlebee · 2026-04-28T17:53:41Z

+def test_resolve_tool_display_name_falls_back_for_unknown_tools() -> None:
+    assert registry_module.resolve_tool_display_name("my_custom_tool") == "my custom tool"
+
+


test_resolve_tool_display_name_falls_back_for_unknown_tools calls resolve_tool_display_name directly without any monkeypatching. This hits the real registry cache (_load_registry_tool_map). If a real tool named my_custom_tool is ever added to the repo, this test will silently start returning its display_name instead of the fallback. Mock the registry or use a name that is guaranteed to not collide (e.g. a UUID-ish sentinel like nonexistent_tool_xyz).

Fixed in b88019a; the fallback test now uses a sentinel name (nonexistent_tool_xyz_sentinel) so it should not collide with future real tools.

muddlebee · 2026-04-29T07:12:55Z

@imjohnzakkam hey really sorry.

I checked the current registered tool names after the merge and those exact identifiers (query_kubernetes_logs, query_elasticsearch, get_deployment_status) do not appear to be live registered tools on this branch.

you were right, this doesn't have registered tools on this branch. a fallback wasn't required. you can revert back to the last CI green entry :)

sorry again for confusion.

muddlebee · 2026-04-29T07:14:38Z

till here its fine, if you can reset till here :) good to go

This reverts commit 96f4c12.

This reverts commit 314535f.

imjohnzakkam · 2026-04-29T07:28:53Z

@muddlebee thanks reverted the legacy fallback follow-up.

muddlebee · 2026-04-29T07:29:15Z

@imjohnzakkam great work ⚡

github-actions · 2026-04-29T07:29:15Z

🎊 Achievement unlocked: PR Merged. @imjohnzakkam passed code review, survived CI, and shipped. Respect. 🤝

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

refactor(tools): resolve action labels from registry metadata

502f923

greptile-apps Bot reviewed Apr 25, 2026

View reviewed changes

imjohnzakkam added 2 commits April 25, 2026 09:51

test(plan-actions): use real InvestigateInput in retrieval controls test

69e480d

perf(tools): cache registry label lookups

f3141a1

Merge origin/main into issue/870-tool-label-metadata

b65a5c6

muddlebee reviewed Apr 28, 2026

View reviewed changes

imjohnzakkam added 3 commits April 28, 2026 23:36

fix(tools): address PR review metadata edge cases

b88019a

fix(tools): preserve legacy display labels

314535f

style(tests): format legacy label test

96f4c12

imjohnzakkam added 2 commits April 29, 2026 12:55

Revert "style(tests): format legacy label test"

21ed590

This reverts commit 96f4c12.

Revert "fix(tools): preserve legacy display labels"

02b40c9

This reverts commit 314535f.

muddlebee merged commit 60413c1 into Tracer-Cloud:main Apr 29, 2026
7 checks passed

		return dict(_load_registry_tool_map())
		return {tool.name: tool for tool in get_registered_tools(surface)}

		def test_resolve_tool_display_name_falls_back_for_unknown_tools() -> None:
		assert registry_module.resolve_tool_display_name("my_custom_tool") == "my custom tool"

Conversation

imjohnzakkam commented Apr 25, 2026

Describe the changes you have made in this PR -

Screenshots of the UI changes (If any) -

Code Understanding and AI Usage

Checklist before requesting a review

Verification

Uh oh!

greptile-apps Bot commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 25, 2026

Choose a reason for hiding this comment

Uh oh!

imjohnzakkam commented Apr 25, 2026

Uh oh!

muddlebee commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

imjohnzakkam commented Apr 28, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

muddlebee commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

muddlebee commented Apr 29, 2026

Uh oh!

imjohnzakkam commented Apr 29, 2026

Uh oh!

Uh oh!

muddlebee commented Apr 29, 2026

Uh oh!

github-actions Bot commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Apr 25, 2026 •

edited

Loading

muddlebee commented Apr 28, 2026 •

edited

Loading

muddlebee commented Apr 29, 2026 •

edited

Loading