
PostgreSQL and MySQL diagnostic tools fail with TypeError because @tool decorator is missing is_available/extract_params wiring #702

@ebrahim-sameh

Description

All 10 PostgreSQL and MySQL diagnostic tools fail with TypeError: missing 2 required positional arguments: 'host' and 'database' every time the agent tries to invoke them. Each tool's @tool(...) decorator is missing the is_available and extract_params callbacks that inject connection params from resolved integrations. The parallel MariaDB tool family has these wired up correctly; PostgreSQL and MySQL do not.

Impact: every single RDS PostgreSQL synthetic scenario (all 15 in tests/synthetic/rds_postgres/) fails with the agent stuck in retry loops on these broken tools, unable to gather evidence.

How I found this

I found this while running a full end-to-end audit of the synthetic RCA suite with Gemini 2.5-flash as the LLM backend. To confirm the scope and root cause, I:

  1. Ran 4 individual RDS scenarios manually (000-healthy, 001-replication-lag, 002-connection-exhaustion, 004-cpu-saturation-bad-query) and observed the same TypeError pattern in every one.
  2. Ran the full RDS synthetic suite (make test-rds-synthetic) against origin/main to capture a baseline: 0/15 scenarios passed, 838 s total wall time, 190 PG/MySQL TypeError failures across the run.
  3. Inspected the agent's trajectory and tool-call behavior per scenario. Confirmed repeated retry loops on the same failing tools (4–10 retries per scenario before the max-loop cap).
  4. Traced the failure back through the execution layer (app/nodes/investigate/execution/execute_actions.py:49-50), the tool registry (app/tools/registered_tool.py:278-279), and the tool decorator (app/tools/tool_decorator.py).
  5. Compared the @tool decorator wiring between MariaDB (working pattern) and PG/MySQL (broken pattern) across all 15 database tool files.

The broken tools (10 total)

PostgreSQL (5):

  • get_postgresql_current_queries
  • get_postgresql_replication_status
  • get_postgresql_server_status
  • get_postgresql_slow_queries
  • get_postgresql_table_stats

MySQL (5):

  • get_mysql_current_processes
  • get_mysql_replication_status
  • get_mysql_server_status
  • get_mysql_slow_queries
  • get_mysql_table_stats

Root cause

In app/tools/registered_tool.py:278-279, when @tool(...) is called without is_available or extract_params, they fall back to:

  • _always_available — always returns True, so the tool is exposed to the LLM regardless of whether the integration is configured
  • _extract_no_params — returns {}, so no kwargs are injected from resolved integrations
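To make the fallback concrete, here is a minimal, hypothetical model of that registration path. Only _always_available and _extract_no_params are names taken from the issue. The RegisteredTool fields, the tool() signature, and the Sources alias are assumptions for illustration. They are not the actual contents of app/tools/registered_tool.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

Sources = Mapping[str, Any]  # assumed shape of the resolved integrations


def _always_available(sources: Sources) -> bool:
    # Fallback: the tool is offered to the LLM whether or not an integration is configured.
    return True


def _extract_no_params(sources: Sources) -> dict[str, Any]:
    # Fallback: nothing from the resolved integrations is injected as kwargs.
    return {}


@dataclass
class RegisteredTool:
    name: str
    run: Callable[..., Any]
    is_available: Callable[[Sources], bool]
    extract_params: Callable[[Sources], dict[str, Any]]


def tool(
    name: str,
    is_available: Optional[Callable[[Sources], bool]] = None,
    extract_params: Optional[Callable[[Sources], dict[str, Any]]] = None,
) -> Callable[[Callable[..., Any]], RegisteredTool]:
    def wrap(fn: Callable[..., Any]) -> RegisteredTool:
        return RegisteredTool(
            name=name,
            run=fn,
            # Omitting either callback silently selects the fallback.
            is_available=is_available if is_available is not None else _always_available,
            extract_params=extract_params if extract_params is not None else _extract_no_params,
        )
    return wrap


@tool(name="get_postgresql_server_status")  # no is_available / extract_params
def get_postgresql_server_status(host: str, database: str) -> str:
    return f"status of {database} on {host}"
```

Registration succeeds without any error. The missing wiring shows up only later, when the execution layer calls the tool.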

The execution layer at app/nodes/investigate/execution/execute_actions.py:49-50 does:

kwargs = action.extract_params(available_sources)
data = action.run(**kwargs)

If extract_params returns {}, the tool function is called with no kwargs. But all 10 PG/MySQL tools declare host: str and database: str as required positional parameters — so Python raises TypeError: get_postgresql_server_status() missing 2 required positional arguments: 'host' and 'database'.

Meanwhile, the MariaDB family (app/tools/MariaDB*Tool/) wires this up correctly:

@tool(
    ...
    is_available=mariadb_is_available,
    extract_params=mariadb_extract_params,
)

Those helpers live at app/integrations/mariadb.py:160-176. PostgreSQL and MySQL integrations never got equivalent helpers, so the corresponding tools never get wired up.

Baseline data (origin/main, full RDS synthetic suite)

Run: make test-rds-synthetic on commit 6b02982 with LLM_PROVIDER=gemini and GEMINI_REASONING_MODEL=gemini-2.5-flash.

  • Pass rate: 0/15 scenarios passed
  • Wall time: 838 s (~14 min)
  • 190 PG/MySQL TypeError failures, broken down by tool:

    Tool                                Failures
    get_postgresql_server_status              92
    get_postgresql_current_queries            40
    get_postgresql_replication_status         32
    get_postgresql_table_stats                20
    get_mysql_replication_status               6

Per-scenario failure reasons on the baseline: every scenario failed with "wrong category", "missing required keywords", or "required evidence not gathered". In each case the agent couldn't gather evidence through the broken tools and kept cycling.

Downstream effects I observed

Because the TypeError is caught and only logged as a warning, each failed call is effectively silent to the agent, which keeps retrying:

  1. Retry loops. The agent retries the same failing tool up to max_investigation_loops times, burning LLM tokens and wall time without making progress.
  2. Fabricated evidence citations. The final RCA output cites evidence the agent never actually retrieved. Example from scenario 002: "Cited Evidence: Queries: opsgenie alerts" — but every opsgenie call also failed during that run.
  3. Truncated error messages. The TypeError detail in the logs is cut off mid-message (TypeError: get_postgresql_server_status() missing ). This is a separate bug, but it makes this one harder to spot from logs alone.

Proposed fix

Mirror the MariaDB pattern exactly:

  1. Add postgresql_is_available(sources) and postgresql_extract_params(sources) to app/integrations/postgresql.py.
  2. Add mysql_is_available(sources) and mysql_extract_params(sources) to app/integrations/mysql.py.
  3. Wire both into all 10 @tool(...) decorators.

PostgreSQL and MySQL integrations already resolve credentials internally via resolve_postgresql_config and resolve_mysql_config (reading from the store/env), so extract_params only surfaces the three identifying params: host, database, port. Credentials (username, password, ssl_mode) stay out of tool signatures and are never seen by the LLM.
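Here is a sketch of what the PostgreSQL helpers and their wiring might look like. It assumes resolved integrations arrive as a mapping with a "postgresql" entry that holds host, database, and port. Several pieces are assumptions:

  • the sources layout
  • the stand-in tool decorator
  • the 5432 default port on the stub tool

The real helpers should follow whatever shape mariadb_is_available and mariadb_extract_params use in app/integrations/mariadb.py.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping

Sources = Mapping[str, Any]


def postgresql_is_available(sources: Sources) -> bool:
    # Assumed layout: resolved integrations keyed by name, e.g. sources["postgresql"].
    cfg = sources.get("postgresql") or {}
    return bool(cfg.get("host")) and bool(cfg.get("database"))


def postgresql_extract_params(sources: Sources) -> dict[str, Any]:
    # Surface only identifying params. Credentials (username, password, ssl_mode)
    # are not copied; the tool resolves them internally, so the LLM never sees them.
    cfg = sources.get("postgresql") or {}
    params: dict[str, Any] = {"host": cfg.get("host"), "database": cfg.get("database")}
    if cfg.get("port") is not None:
        params["port"] = int(cfg["port"])
    return params


def tool(**meta: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    # Stand-in for the real @tool decorator: just records its arguments on the function.
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        fn.tool_meta = meta  # type: ignore[attr-defined]
        return fn
    return wrap


@tool(
    name="get_postgresql_server_status",
    is_available=postgresql_is_available,
    extract_params=postgresql_extract_params,
)
def get_postgresql_server_status(host: str, database: str, port: int = 5432) -> dict[str, Any]:
    # Stub body; the real tool would resolve credentials and query the server.
    return {"host": host, "database": database, "port": port}
```

The helpers address both halves of the bug. A real is_available hides the tool from the LLM when PostgreSQL is not configured, and extract_params supplies the required arguments. The MySQL pair (mysql_is_available, mysql_extract_params) would have the same shape, reading a "mysql" entry instead.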

Rollout

I'm submitting two PRs to keep the review surface of each change small:

  • PR 1 — PostgreSQL side: adds the helpers to app/integrations/postgresql.py and wires them into the 5 PG tools. This side has the strongest empirical evidence from the RDS synthetic suite.
  • PR 2 — MySQL side: same pattern, for app/integrations/mysql.py and the 5 MySQL tools.
