feat(rds): RDS integration with describe instance and events tools #1287
muddlebee merged 8 commits into Tracer-Cloud:main from
Conversation
- Add app/integrations/rds.py for source detection and param extraction
- Add RDSDescribeInstanceTool for instance status and configuration
- Add RDSEventsTool for failover, maintenance, and parameter events
- Use shared aws_sdk_client read-only allowlist for safety
- Add unit tests for integration helpers and both tools
- Add "rds" to EvidenceSource literal in app/types/evidence.py

Closes Tracer-Cloud#125
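For reference, the EvidenceSource change amounts to adding one literal member. A hypothetical sketch (the other members listed are placeholders drawn from the AWS sources mentioned in this PR, not the repository's actual list):

```python
from typing import Literal

# Hypothetical sketch of app/types/evidence.py after this PR: the real literal
# has more members; "rds" is the value added here so @tool(source="rds", ...)
# type-checks.
EvidenceSource = Literal["cloudwatch", "eks", "lambda", "rds"]
```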
Greptile Summary
Confidence Score: 3/5. Not safe to merge — the integration is functionally dead on arrival; neither RDS tool will be offered to the planner in any deployment. Multiple P1 findings together make the entire integration non-functional: load_env_integrations is not updated, _classify_service_instance has no 'rds' handler, and detect_sources never populates sources['rds']. The tools themselves are correct and all 14 tests pass, but those tests bypass the pipeline. The score is pulled below the P1 ceiling of 4 by three compounding gaps on the critical path. app/integrations/_catalog_impl.py (load_env_integrations and _classify_service_instance) and app/nodes/plan_actions/detect_sources.py are not in the diff, but both must be updated before the integration can work.

Important Files Changed
Sequence Diagram
sequenceDiagram
participant Env as Env Vars
participant LEI as load_env_integrations()
participant CI as classify_integrations()
participant DS as detect_sources()
participant IA as rds_is_available()
participant Tool as RDS Tools
Env->>LEI: RDS_DB_INSTANCE_IDENTIFIER set
Note over LEI: ❌ rds_config_from_env() never called
LEI-->>CI: (no "rds" record)
CI-->>DS: resolved_integrations (no "rds" key)
Note over DS: ❌ no "rds" block in detect_sources
DS-->>IA: sources (no "rds" key)
IA-->>Tool: returns False ❌
Note over Tool: Tools never offered to planner
Reviews (2): Last reviewed commit: "test(rds): add coverage for multiple-ins..."
Greptile review caught two related issues in app/integrations/rds.py:
- rds_config_from_env: env_str("AWS_REGION", DEFAULT_RDS_REGION) always
returned a non-empty string, making the RDS_REGION fallback unreachable
(P1 dead code).
- rds_extract_params: only consulted AWS_REGION, never RDS_REGION,
diverging from the config path (P2 inconsistency).
Both functions now resolve region as: source-supplied > AWS_REGION >
RDS_REGION > DEFAULT_RDS_REGION. Drops the now-unused os import.
Adds three tests covering the previously-uncovered fallback paths:
RDS_REGION-only in extract, RDS_REGION-only in config-from-env, and
default when neither env var is set.
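A minimal sketch of that resolution order, assuming env_str returns an empty string for unset variables (the helper's real signature and the default region may differ):

```python
import os

DEFAULT_RDS_REGION = "us-east-1"  # placeholder default, for illustration only


def env_str(name: str, default: str = "") -> str:
    """Return the env var's value, or the default when it is unset or blank."""
    return os.environ.get(name, "").strip() or default


def resolve_region(source: dict | None = None) -> str:
    """Resolve region as: source-supplied > AWS_REGION > RDS_REGION > DEFAULT_RDS_REGION."""
    supplied = str((source or {}).get("region") or "").strip()
    return (
        supplied
        or env_str("AWS_REGION")
        or env_str("RDS_REGION")
        or DEFAULT_RDS_REGION
    )
```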
- ruff check --fix reorganized imports in tests/tools/test_rds_tools.py
- ruff format reformatted app/integrations/rds.py
- New docs/rds.mdx covering setup, env vars, IAM permissions, the two tools, use cases, and troubleshooting. References docs/aws.mdx for shared AWS credential setup rather than duplicating it.
- Register the page under "Data and workflow systems" in docs/docs.json.

Documents the RDS_REGION fallback semantics so users understand the region resolution order: source dict > AWS_REGION > RDS_REGION > default.
| "source": "rds", | ||
| "available": False, | ||
| "db_instance_identifier": db_instance_identifier, | ||
| "error": result.get("error"), |
The tools return result.get("error") directly in the response payload. AWS SDK errors can contain region names, account IDs, ARNs, or internal resource identifiers — all of which get forwarded to whoever reads the tool result. Sanitize or wrap errors before surfacing: return a generic message and log the full AWS error server-side only.
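One hedged way to apply that suggestion, sketched as a small helper; the helper name, logger setup, and message text are illustrative, not the shipped implementation:

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def _sanitized_failure(result: dict[str, Any], db_instance_identifier: str) -> dict[str, Any]:
    """Log the raw AWS error server-side only; surface a generic message to callers."""
    logger.error(
        "RDS DescribeDBInstances failed for %s: %s",
        db_instance_identifier,
        result.get("error"),
    )
    return {
        "source": "rds",
        "available": False,
        "db_instance_identifier": db_instance_identifier,
        "error": "Failed to describe the RDS instance; see server logs for details.",
    }
```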
Done. I've updated the code to sanitize AWS SDK errors by returning a generic message in the tool output while logging the full error server-side. I also added a warning log when multiple instances are returned, to address your second point.
| "error": "No RDS instance found with the given identifier.", | ||
| } | ||
|
|
||
| instance = instances[0] |
instances[0] silently discards any additional results. The AWS DescribeDBInstances API can theoretically return multiple records (e.g. in edge cases with cluster members). If more than one is returned, the caller gets no indication. At minimum, add a log warning when len(instances) > 1.
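A minimal sketch of the suggested guard; the wording of the warning is an assumption:

```python
import logging

logger = logging.getLogger(__name__)


def first_instance(instances: list[dict], db_instance_identifier: str) -> dict:
    """Return the first DescribeDBInstances record, warning when extras are discarded."""
    if len(instances) > 1:
        logger.warning(
            "DescribeDBInstances returned %d records for %s; using only the first.",
            len(instances),
            db_instance_identifier,
        )
    return instances[0]
```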
Done. I've updated the logic in commit 46a44a4 to address both points: I sanitized the AWS SDK error messages to prevent leaking sensitive info, and I added a warning log in case multiple instances are returned, as you suggested. Ready for another look!
- Replace raw AWS error forwarding with generic messages in both RDSDescribeInstanceTool and RDSEventsTool; full errors are logged server-side only to prevent leaking account IDs, ARNs, or regions.
- Add warning log when DescribeDBInstances returns more than one instance, since only the first is used.
- Update test assertions to match the new sanitized error messages.
Verifies that when DescribeDBInstances returns >1 record, the tool uses the first instance and emits a warning log.
Done, I fixed those issues; you can test it.
@cerencamkiran 👀 for reviews

@greptileai review
def rds_config_from_env() -> RDSConfig | None:
    """Load an RDS config from env vars."""
    db_id = env_str("RDS_DB_INSTANCE_IDENTIFIER")
    if not db_id:
        return None
    return build_rds_config(
        {
            "db_instance_identifier": db_id,
            "region": env_str("AWS_REGION") or env_str("RDS_REGION") or DEFAULT_RDS_REGION,
        }
    )


def rds_is_available(sources: dict[str, dict]) -> bool:
RDS integration never wired into the discovery pipeline
rds_config_from_env() is defined and tested but never called in app/integrations/_catalog_impl.py::load_env_integrations(). Every other env-based integration (airflow, rabbitmq, betterstack, etc.) has a corresponding block in that function; RDS has none. As a result, users who set RDS_DB_INSTANCE_IDENTIFIER + AWS_REGION (the documented setup path) will never have an "rds" key in resolved_integrations.
The second missing piece is app/nodes/plan_actions/detect_sources.py. That node transfers entries from resolved_integrations into the sources dict that rds_is_available reads — there is no "rds" block there either. Even if load_env_integrations were fixed, sources["rds"] would still never be set, so rds_is_available always returns False and neither tool is ever offered to the planner.
The third gap is app/integrations/_catalog_impl.py::_classify_service_instance. When an RDS integration is stored in the remote store, the function falls through to the generic handler (return {"credentials": credentials, "integration_id": record_id}, key), which nests fields under a "credentials" key. rds_is_available then looks for sources["rds"].get("db_instance_identifier") — which resolves to None — so the remote-store path is also broken.
All three pieces are needed:
# 1. In load_env_integrations() (_catalog_impl.py)
rds_config = rds_config_from_env()
if rds_config is not None and rds_config.is_configured:
    integrations.append(_active_env_record("rds", rds_config.model_dump(exclude={"integration_id"})))

# 2. In _classify_service_instance() (_catalog_impl.py)
if key == "rds":
    try:
        rds_config = build_rds_config({"db_instance_identifier": credentials.get("db_instance_identifier", ""), "region": credentials.get("region", DEFAULT_RDS_REGION)})
    except Exception:
        return None, None
    if rds_config.is_configured:
        return {**rds_config.model_dump(), "integration_id": record_id}, "rds"
    return None, None

# 3. In detect_sources.py
rds_int = (resolved_integrations or {}).get("rds")
if rds_int and str(rds_int.get("db_instance_identifier", "")).strip():
    sources["rds"] = {
        "db_instance_identifier": str(rds_int.get("db_instance_identifier", "")).strip(),
        "region": str(rds_int.get("region") or DEFAULT_RDS_REGION).strip(),
    }
This matches the main thing I wanted to double-check as well.
The tools and helper layer look fine, but if RDS is not registered in load_env_integrations() and not transferred into sources in detect_sources.py, the documented env setup path will not actually expose the RDS tools to the planner.
I think this should be addressed before merge, ideally with a small regression test covering env-backed RDS discovery → resolved integrations → sources containing rds.
Concretely, it seems we’re missing:
- wiring in load_env_integrations() to register the env-backed RDS config
- classification in _classify_service_instance() for remote-store configs
- propagation into sources in detect_sources.py
Done! I've wired RDS into the discovery pipeline.
I have implemented the missing wiring in load_env_integrations, _classify_service_instance, and detect_sources.py. I also added a dedicated regression test test_rds_env_discovery_to_sources_pipeline that validates the entire discovery flow from environment variables to the planner's sources dictionary. Everything is verified and running as expected in my deployment. Ready for a final review!
…ection
Reviewer found three real wiring gaps that meant the documented setup
path never reached the planner. The integration helpers were correct but
nothing else in the pipeline knew about RDS:
- app/integrations/_catalog_impl.py::load_env_integrations now
registers an "rds" record when rds_config_from_env returns a
configured instance. Previously env-set RDS_DB_INSTANCE_IDENTIFIER
was a no-op.
- app/integrations/_catalog_impl.py::_classify_service_instance gains
an "rds" branch that returns a flat shape ({db_instance_identifier,
region, integration_id}) instead of falling through to the generic
handler that nests fields under "credentials" — which made
rds_is_available always return False for store-backed configs.
- app/nodes/plan_actions/detect_sources.py now copies the resolved
rds integration into sources["rds"], mirroring the rabbitmq/mariadb
blocks. Without this, even a correctly-resolved integration would
not be visible to rds_is_available.
Adds tests/integrations/test_rds.py::test_rds_env_discovery_to_sources_pipeline
which walks the full env -> load_env_integrations -> _classify_service_instance
-> detect_sources path end to end and asserts each stage produces the
expected shape, so this regression cannot reappear silently.
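The regression test might look roughly like this; the record shapes, how detect_sources is invoked, and the exact assertions are assumptions, since only the stage names come from this PR:

```python
def test_rds_env_discovery_to_sources_pipeline(monkeypatch):
    monkeypatch.setenv("RDS_DB_INSTANCE_IDENTIFIER", "prod-db-1")
    monkeypatch.setenv("AWS_REGION", "eu-west-1")

    # Stage 1: the env-backed RDS record is registered at all.
    records = load_env_integrations()
    assert any("rds" in str(record) for record in records)

    # Stage 2: classification yields a flat "rds" entry, not a nested "credentials" dict.
    resolved = classify_integrations(records)
    assert resolved["rds"]["db_instance_identifier"] == "prod-db-1"
    assert "credentials" not in resolved["rds"]

    # Stage 3: detect_sources propagates it into the planner-visible sources dict.
    sources = detect_sources(resolved)
    assert sources["rds"]["region"] == "eu-west-1"
    assert rds_is_available(sources) is True
```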
Done ✅
Adds six targeted tests that isolate each of the gaps the reviewer flagged, so any regression in one stage fails its own test rather than bleeding into the end-to-end pipeline test:

Gap Tracer-Cloud#1 — load_env_integrations:
- test_load_env_integrations_skips_rds_when_db_id_missing

Gap Tracer-Cloud#2 — _classify_service_instance:
- test_classify_service_instance_rds_remote_store_returns_flat_shape
- test_classify_service_instance_rds_skips_when_db_id_missing

Gap Tracer-Cloud#3 — detect_sources:
- test_detect_sources_propagates_rds_into_sources
- test_detect_sources_skips_rds_when_not_in_resolved_integrations
- test_detect_sources_skips_rds_when_db_id_blank

The remote-store classifier test asserts the absence of a "credentials" key in the returned dict, which is the exact failure mode the reviewer called out (the generic fallback nests fields and silently breaks rds_is_available). 25 RDS tests pass; mypy, lint, and format are clean.
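Sketch of that remote-store classifier test; the call signature of _classify_service_instance is an assumption (the helper's real parameters may differ):

```python
def test_classify_service_instance_rds_remote_store_returns_flat_shape():
    # Hypothetical argument order; mirrors the key/credentials/record_id names
    # used in the reviewer's suggestion above.
    resolved, key = _classify_service_instance(
        "rds",
        {"db_instance_identifier": "prod-db-1", "region": "eu-west-1"},
        "integration-123",
    )
    assert key == "rds"
    # The generic fallback would nest fields under "credentials", which breaks
    # rds_is_available; the RDS branch must return a flat shape instead.
    assert "credentials" not in resolved
    assert resolved["db_instance_identifier"] == "prod-db-1"
```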
🎲 Researchers are baffled. @mazenessam77 opened a PR, got it reviewed without drama, and merged clean. This violates known laws of open source. 🔬 👋 Join us on Discord - OpenSRE: hang out, contribute, or hunt for features and issues. Everyone's welcome.
here we goooo

Fixes #125
Describe the changes you have made in this PR -
This PR adds AWS RDS as a first-class integration source so the agent can investigate RDS instance health and recent events during incidents.
- app/integrations/rds.py — config builder, env-based config, rds_is_available source detector, and rds_extract_params to normalize tool inputs. Mirrors the shape used by other AWS integrations.
- app/tools/RDSDescribeInstanceTool/ — read-only tool that calls rds:DescribeDBInstances and returns status, engine, version, instance class, Multi-AZ, networking, storage, and backup config.
- app/tools/RDSEventsTool/ — read-only tool that calls rds:DescribeEvents to surface failovers, maintenance, parameter changes, and backup events around an incident window.
- app/types/evidence.py — adds "rds" to the EvidenceSource literal. Without this, the @tool(source="rds", ...) decorator on both new tools fails mypy (no matching overload).
- tests/integrations/test_rds.py (4) and tests/tools/test_rds_tools.py (6); 14 new tests total, all passing.

Both tools route through the existing app/services/aws_sdk_client.py, which enforces a read-only boto3 allowlist — RDS gets the same safety treatment as the existing CloudWatch, EKS, and Lambda tools.

Demo/Screenshot for feature changes and bug fixes -
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
The problem: the agent had no way to investigate RDS during incidents — there was an app/integrations/rds.py skeleton with 0% coverage but no tools wired up.

I followed the pattern already established for AWS services in the repo (CloudWatch, EKS, Lambda):

- app/integrations/rds.py — defines what an RDS source looks like in the integration store, when it's "available", and how to extract tool params from the source dict. This is what the planner uses to decide whether to call RDS tools.
- app/tools/RDSDescribeInstanceTool/ and app/tools/RDSEventsTool/ — thin wrappers around execute_aws_sdk_call(...) so I get the read-only allowlist, error handling, and credential management for free. Each tool returns a flat dict matching the project's tool-output convention.
- Adding "rds" to EvidenceSource was needed because @tool(source=...) is typed against that literal. I considered using the existing "aws_sdk" source instead, but that would have broken source-based filtering downstream and mixed RDS evidence into a generic bucket.

Alternative considered: rolling a dedicated RDSClient under app/services/rds/ like the Datadog and Grafana clients. Rejected because RDS only needs two boto3 calls — adding a client class would be over-abstraction. The aws_sdk_client pattern is the right fit and matches what CloudWatch/EKS/Lambda do.

Checklist before requesting a review