
feat(rds): RDS integration with describe instance and events tools #1287

Merged
muddlebee merged 8 commits into Tracer-Cloud:main from mazenessam77:feat/rds-integration
May 5, 2026

Conversation

@mazenessam77
Contributor

  • Add app/integrations/rds.py for source detection and param extraction
  • Add RDSDescribeInstanceTool for instance status and configuration
  • Add RDSEventsTool for failover, maintenance, and parameter events
  • Use shared aws_sdk_client read-only allowlist for safety
  • Add unit tests for integration helpers and both tools
  • Add "rds" to EvidenceSource literal in app/types/evidence.py

Closes #125

Describe the changes you have made in this PR -

This PR adds AWS RDS as a first-class integration source so the agent can investigate RDS instance health and recent events during incidents.

  • app/integrations/rds.py — config builder, env-based config, rds_is_available source detector, and rds_extract_params to normalize tool inputs. Mirrors the shape used by other AWS integrations.
  • app/tools/RDSDescribeInstanceTool/ — read-only tool that calls rds:DescribeDBInstances and returns status, engine, version, instance class, Multi-AZ, networking, storage, and backup config.
  • app/tools/RDSEventsTool/ — read-only tool that calls rds:DescribeEvents to surface failovers, maintenance, parameter changes, and backup events around an incident window.
  • app/types/evidence.py — adds "rds" to the EvidenceSource literal. Without this, the @tool(source="rds", ...) decorator on both new tools fails mypy (no matching overload).
  • Tests: tests/integrations/test_rds.py (4) and tests/tools/test_rds_tools.py (6); 14 new tests total, all passing.

Both tools route through the existing app/services/aws_sdk_client.py, which enforces a read-only boto3 allowlist — RDS gets the same safety treatment as the existing CloudWatch, EKS, and Lambda tools.
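The allowlist idea can be sketched as below. This is a hypothetical illustration only: the pair set and the check_allowed helper are made up for this sketch, not the actual aws_sdk_client internals (which are not part of this PR's diff).

```python
# Hypothetical sketch of a read-only allowlist gate. The (service, operation)
# pairs and the helper name are illustrative, not the real aws_sdk_client API.
READ_ONLY_ALLOWLIST = {
    ("rds", "describe_db_instances"),
    ("rds", "describe_events"),
    ("cloudwatch", "get_metric_data"),
}

def check_allowed(service: str, operation: str) -> None:
    """Raise before dispatching any boto3 call that is not explicitly read-only."""
    if (service, operation) not in READ_ONLY_ALLOWLIST:
        raise PermissionError(f"{service}:{operation} is not in the read-only allowlist")
```

Anything mutating (e.g. a hypothetical delete_db_instance) would be rejected before a boto3 client is ever touched, which is the safety property the PR relies on.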

Demo/Screenshot for feature changes and bug fixes -

$ .venv/bin/python -m pytest tests/integrations/test_rds.py tests/tools/test_rds_tools.py
...
============================== 14 passed in 0.77s ==============================

$ .venv/bin/python -m mypy app/
Success: no issues found in 460 source files

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

The problem: the agent had no way to investigate RDS during incidents — there was an app/integrations/rds.py skeleton with 0% coverage but no tools wired up.

I followed the pattern already established for AWS services in the repo (CloudWatch, EKS, Lambda):

  1. Integration layer first (app/integrations/rds.py) — defines what an RDS source looks like in the integration store, when it's "available", and how to extract tool params from the source dict. This is what the planner uses to decide whether to call RDS tools.
  2. Tools second (app/tools/RDSDescribeInstanceTool/, app/tools/RDSEventsTool/) — thin wrappers around execute_aws_sdk_call(...) so I get the read-only allowlist, error handling, and credential management for free. Each tool returns a flat dict matching the project's tool-output convention.
  3. Type literal — adding "rds" to EvidenceSource was needed because @tool(source=...) is typed against that literal. I considered using the existing "aws_sdk" source instead, but that would have broken source-based filtering downstream and mixed RDS evidence into a generic bucket.
  4. Tests cover the happy path, "no instance found" / "no events", and AWS SDK failures, plus availability detection and param extraction from both explicit configs and env-fallback.

Alternative considered: rolling a dedicated RDSClient under app/services/rds/ like the Datadog and Grafana clients. Rejected because RDS only needs two boto3 calls — adding a client class would be over-abstraction. The aws_sdk_client pattern is the right fit and matches what CloudWatch/EKS/Lambda do.
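As a rough sketch of the thin-wrapper pattern in step 2 (execute_aws_sdk_call is stubbed here, and the exact output keys are assumptions based on the description above, not the repo's actual code):

```python
def execute_aws_sdk_call(service: str, operation: str, params: dict, region: str) -> dict:
    # Stub standing in for app/services/aws_sdk_client.execute_aws_sdk_call,
    # which in the real repo enforces the read-only allowlist and credentials.
    return {"DBInstances": [{"DBInstanceStatus": "available", "Engine": "postgres"}]}

def rds_describe_instance(db_instance_identifier: str, region: str) -> dict:
    result = execute_aws_sdk_call(
        "rds", "describe_db_instances",
        {"DBInstanceIdentifier": db_instance_identifier}, region,
    )
    instances = result.get("DBInstances", [])
    if not instances:
        # Flat dict on the miss path too, per the tool-output convention.
        return {"source": "rds", "available": False,
                "error": "No RDS instance found with the given identifier."}
    instance = instances[0]
    return {"source": "rds", "available": True,
            "status": instance.get("DBInstanceStatus"),
            "engine": instance.get("Engine")}
```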


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

- Add app/integrations/rds.py for source detection and param extraction
- Add RDSDescribeInstanceTool for instance status and configuration
- Add RDSEventsTool for failover, maintenance, and parameter events
- Use shared aws_sdk_client read-only allowlist for safety
- Add unit tests for integration helpers and both tools
- Add "rds" to EvidenceSource literal in app/types/evidence.py

Closes Tracer-Cloud#125
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

  • Adds RDSDescribeInstanceTool and RDSEventsTool — two read-only boto3 wrappers that surface instance health and recent events, both correctly routed through the existing aws_sdk_client allowlist.
  • The integration layer (app/integrations/rds.py) is well-implemented, but three pipeline wiring steps are missing: load_env_integrations() never calls rds_config_from_env(), _classify_service_instance() has no "rds" handler (so remote-store credentials stay nested and unreadable), and detect_sources.py has no "rds" block to copy resolved_integrations["rds"] into the sources dict — meaning rds_is_available always returns False and neither tool is ever offered to the planner.
  • All 14 tests pass because they call tool functions directly; no test exercises the end-to-end path through load_env_integrations → classify_integrations → detect_sources → rds_is_available.

Confidence Score: 3/5

Not safe to merge — the integration is functionally dead-on-arrival; neither RDS tool will be offered to the planner in any deployment.

Multiple P1 findings that together make the entire integration non-functional: load_env_integrations is not updated, _classify_service_instance has no 'rds' handler, and detect_sources never populates sources['rds']. The tools themselves are correct and all 14 tests pass, but those tests bypass the pipeline. The score is pulled below the P1 ceiling of 4 due to three compounding gaps on the critical path.

app/integrations/_catalog_impl.py (load_env_integrations + _classify_service_instance) and app/nodes/plan_actions/detect_sources.py — neither is in the diff but both must be updated before the integration can work.

Important Files Changed

Filename Overview
app/integrations/rds.py Integration helpers are well-implemented but not wired into load_env_integrations(), _classify_service_instance(), or detect_sources.py — the three missing links that would let the planner discover and activate these tools.
app/tools/RDSDescribeInstanceTool/__init__.py Clean implementation: correct boto3 call, defensive result unpacking, proper logging, and a warning when the API returns multiple instances.
app/tools/RDSEventsTool/__init__.py Correct Duration passthrough, bounded input_schema (min 1, max 20160), and clean event flattening with safe defaults.
app/types/evidence.py Correct addition of 'rds' to the EvidenceSource Literal; no other changes.
tests/integrations/test_rds.py Good unit coverage of config helpers and param extraction; all tests pass, but there are no pipeline-integration tests covering detect_sources or load_env_integrations paths.
tests/tools/test_rds_tools.py Thorough tool-layer tests: happy path, not-found, AWS failure, multi-instance warning, and empty-events cases all covered.
docs/rds.mdx Well-structured documentation with correct env var table, IAM policy, and troubleshooting section.
docs/docs.json Adds 'rds' to the docs navigation — correct and complete.

Sequence Diagram

sequenceDiagram
    participant Env as Env Vars
    participant LEI as load_env_integrations()
    participant CI as classify_integrations()
    participant DS as detect_sources()
    participant IA as rds_is_available()
    participant Tool as RDS Tools

    Env->>LEI: RDS_DB_INSTANCE_IDENTIFIER set
    Note over LEI: ❌ rds_config_from_env() never called
    LEI-->>CI: (no "rds" record)
    CI-->>DS: resolved_integrations (no "rds" key)
    Note over DS: ❌ no "rds" block in detect_sources
    DS-->>IA: sources (no "rds" key)
    IA-->>Tool: returns False ❌
    Note over Tool: Tools never offered to planner


Comment thread app/integrations/rds.py Outdated
Greptile review caught two related issues in app/integrations/rds.py:

- rds_config_from_env: env_str("AWS_REGION", DEFAULT_RDS_REGION) always
  returned a non-empty string, making the RDS_REGION fallback unreachable
  (P1 dead code).
- rds_extract_params: only consulted AWS_REGION, never RDS_REGION,
  diverging from the config path (P2 inconsistency).

Both functions now resolve region as: source-supplied > AWS_REGION >
RDS_REGION > DEFAULT_RDS_REGION. Drops the now-unused os import.

Adds three tests covering the previously-uncovered fallback paths:
RDS_REGION-only in extract, RDS_REGION-only in config-from-env, and
default when neither env var is set.
- ruff check --fix reorganized imports in tests/tools/test_rds_tools.py
- ruff format reformatted app/integrations/rds.py
- New docs/rds.mdx covering setup, env vars, IAM permissions, the two
  tools, use cases, and troubleshooting. References docs/aws.mdx for
  shared AWS credential setup rather than duplicating it.
- Register the page under "Data and workflow systems" in docs/docs.json.

Documents the RDS_REGION fallback semantics so users understand the
region resolution order: source dict > AWS_REGION > RDS_REGION > default.
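A minimal sketch of that resolution order, assuming a standalone resolve_region helper and a placeholder default (the real code inlines the chain via env_str, and the actual DEFAULT_RDS_REGION value is not shown in this thread):

```python
import os

DEFAULT_RDS_REGION = "us-east-1"  # assumption: placeholder, not the repo's actual default

def resolve_region(source: dict) -> str:
    # Documented order: source-supplied > AWS_REGION > RDS_REGION > default.
    return (
        source.get("region")
        or os.environ.get("AWS_REGION")
        or os.environ.get("RDS_REGION")
        or DEFAULT_RDS_REGION
    )
```

This keeps both rds_config_from_env and rds_extract_params on the same chain, which is exactly the divergence the P2 finding flagged.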
"source": "rds",
"available": False,
"db_instance_identifier": db_instance_identifier,
"error": result.get("error"),
Collaborator


tools return result.get("error") directly in the response payload. AWS SDK errors can contain region names, account IDs, ARNs, or internal resource identifiers — all of which get forwarded to whoever reads the tool result. Sanitize or wrap errors before surfacing: return a generic message and log the full AWS error server-side only.
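One way to implement the suggested pattern, as a sketch (the helper name and generic message are hypothetical, not the repo's actual code):

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_ERROR = "AWS SDK call failed; see server logs for details."

def sanitized_error(result: dict) -> "str | None":
    # Log the raw AWS error (which may contain ARNs, account IDs, or region
    # names) server-side only, and surface a generic message to the caller.
    raw = result.get("error")
    if raw is None:
        return None
    logger.error("AWS SDK error (not forwarded to tool output): %s", raw)
    return GENERIC_ERROR
```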

Contributor Author


Done. I've updated the code to sanitize AWS SDK errors by returning a generic message in the tool output while logging the full error server-side. I also added a warning log when multiple instances are returned, to address your second point.

"error": "No RDS instance found with the given identifier.",
}

instance = instances[0]
Collaborator


instances[0] silently discards any additional results. The AWS DescribeDBInstances API can theoretically return multiple records (e.g. in edge cases with cluster members). If more than one is returned, the caller gets no indication. At minimum, add a log warning when len(instances) > 1.
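The suggested guard could look roughly like this (first_instance is a hypothetical helper for illustration; the real fix lives inline in the tool):

```python
import logging

logger = logging.getLogger(__name__)

def first_instance(instances: "list[dict]", identifier: str) -> "dict | None":
    # Warn when DescribeDBInstances returns more than one record, since
    # only the first is used and the rest would otherwise vanish silently.
    if not instances:
        return None
    if len(instances) > 1:
        logger.warning(
            "DescribeDBInstances returned %d records for %s; using the first",
            len(instances), identifier,
        )
    return instances[0]
```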

Contributor Author


Done. I've updated the logic in commit 46a44a4 to address both points: I sanitized the AWS SDK error messages to prevent leaking sensitive info and added a warning log in case multiple instances are returned, as you suggested. Ready for another look!

- Replace raw AWS error forwarding with generic messages in both
  RDSDescribeInstanceTool and RDSEventsTool; full errors are logged
  server-side only to prevent leaking account IDs, ARNs, or regions.
- Add warning log when DescribeDBInstances returns more than one
  instance, since only the first is used.
- Update test assertions to match the new sanitized error messages.
Verifies that when DescribeDBInstances returns >1 record, the tool
uses the first instance and emits a warning log.
@mazenessam77
Contributor Author

mazenessam77 commented May 5, 2026 via email

@muddlebee
Collaborator

@cerencamkiran 👀 for reviews

@cerencamkiran
Collaborator

@greptileai review

Comment thread app/integrations/rds.py
Comment on lines +34 to +48

def rds_config_from_env() -> RDSConfig | None:
    """Load an RDS config from env vars."""
    db_id = env_str("RDS_DB_INSTANCE_IDENTIFIER")
    if not db_id:
        return None
    return build_rds_config(
        {
            "db_instance_identifier": db_id,
            "region": env_str("AWS_REGION") or env_str("RDS_REGION") or DEFAULT_RDS_REGION,
        }
    )


def rds_is_available(sources: dict[str, dict]) -> bool:
Contributor


P1 RDS integration never wired into the discovery pipeline

rds_config_from_env() is defined and tested but never called in app/integrations/_catalog_impl.py::load_env_integrations(). Every other env-based integration (airflow, rabbitmq, betterstack, etc.) has a corresponding block in that function; RDS has none. As a result, users who set RDS_DB_INSTANCE_IDENTIFIER + AWS_REGION (the documented setup path) will never have an "rds" key in resolved_integrations.

The second missing piece is app/nodes/plan_actions/detect_sources.py. That node transfers entries from resolved_integrations into the sources dict that rds_is_available reads — there is no "rds" block there either. Even if load_env_integrations were fixed, sources["rds"] would still never be set, so rds_is_available always returns False and neither tool is ever offered to the planner.

The third gap is app/integrations/_catalog_impl.py::_classify_service_instance. When an RDS integration is stored in the remote store, the function falls through to the generic handler (return {"credentials": credentials, "integration_id": record_id}, key), which nests fields under a "credentials" key. rds_is_available then looks for sources["rds"].get("db_instance_identifier") — which resolves to None — so the remote-store path is also broken.

All three pieces are needed:

# 1. In load_env_integrations() (_catalog_impl.py)
rds_config = rds_config_from_env()
if rds_config is not None and rds_config.is_configured:
    integrations.append(_active_env_record("rds", rds_config.model_dump(exclude={"integration_id"})))

# 2. In _classify_service_instance() (_catalog_impl.py)
if key == "rds":
    try:
        rds_config = build_rds_config({"db_instance_identifier": credentials.get("db_instance_identifier", ""), "region": credentials.get("region", DEFAULT_RDS_REGION)})
    except Exception:
        return None, None
    if rds_config.is_configured:
        return {**rds_config.model_dump(), "integration_id": record_id}, "rds"
    return None, None

# 3. In detect_sources.py
rds_int = (resolved_integrations or {}).get("rds")
if rds_int and str(rds_int.get("db_instance_identifier", "")).strip():
    sources["rds"] = {
        "db_instance_identifier": str(rds_int.get("db_instance_identifier", "")).strip(),
        "region": str(rds_int.get("region") or DEFAULT_RDS_REGION).strip(),
    }

Collaborator


This matches the main thing I wanted to double-check as well.

The tools and helper layer look fine, but if RDS is not registered in load_env_integrations() and not transferred into sources in detect_sources.py, the documented env setup path will not actually expose the RDS tools to the planner.

I think this should be addressed before merge, ideally with a small regression test covering env-backed RDS discovery → resolved integrations → sources containing rds.

Concretely, it seems we’re missing:

  • wiring in load_env_integrations() to register the env-backed RDS config
  • classification in _classify_service_instance() for remote-store configs
  • propagation into sources in detect_sources.py

Contributor Author


Done! I've wired RDS into the discovery pipeline.

Contributor Author


I have implemented the missing wiring in load_env_integrations, _classify_service_instance, and detect_sources.py. I also added a dedicated regression test test_rds_env_discovery_to_sources_pipeline that validates the entire discovery flow from environment variables to the planner's sources dictionary. Everything is verified and running as expected in my deployment. Ready for a final review!


Reviewer found three real wiring gaps that meant the documented setup
path never reached the planner. The integration helpers were correct but
nothing else in the pipeline knew about RDS:

- app/integrations/_catalog_impl.py::load_env_integrations now
  registers an "rds" record when rds_config_from_env returns a
  configured instance. Previously env-set RDS_DB_INSTANCE_IDENTIFIER
  was a no-op.
- app/integrations/_catalog_impl.py::_classify_service_instance gains
  an "rds" branch that returns a flat shape ({db_instance_identifier,
  region, integration_id}) instead of falling through to the generic
  handler that nests fields under "credentials" — which made
  rds_is_available always return False for store-backed configs.
- app/nodes/plan_actions/detect_sources.py now copies the resolved
  rds integration into sources["rds"], mirroring the rabbitmq/mariadb
  blocks. Without this, even a correctly-resolved integration would
  not be visible to rds_is_available.

Adds tests/integrations/test_rds.py::test_rds_env_discovery_to_sources_pipeline
which walks the full env -> load_env_integrations -> _classify_service_instance
-> detect_sources path end to end and asserts each stage produces the
expected shape, so this regression cannot reappear silently.
@mazenessam77
Copy link
Copy Markdown
Contributor Author

mazenessam77 commented May 5, 2026 via email

Adds six targeted tests that isolate each of the gaps the reviewer
flagged so any regression in one stage fails its own test rather than
bleeding into the end-to-end pipeline test:

Gap #1 — load_env_integrations:
  test_load_env_integrations_skips_rds_when_db_id_missing

Gap #2 — _classify_service_instance:
  test_classify_service_instance_rds_remote_store_returns_flat_shape
  test_classify_service_instance_rds_skips_when_db_id_missing

Gap #3 — detect_sources:
  test_detect_sources_propagates_rds_into_sources
  test_detect_sources_skips_rds_when_not_in_resolved_integrations
  test_detect_sources_skips_rds_when_db_id_blank

The remote-store classifier test asserts the absence of a "credentials"
key in the returned dict, which is the exact failure mode the reviewer
called out (generic fallback nests fields and silently breaks
rds_is_available).

25 RDS tests pass; mypy, lint, format clean.
@muddlebee muddlebee merged commit fa25cd9 into Tracer-Cloud:main May 5, 2026
10 checks passed
@github-actions
Contributor

github-actions Bot commented May 5, 2026

🎲 Researchers are baffled. @mazenessam77 opened a PR, got it reviewed without drama, and merged clean. This violates known laws of open source. 🔬


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

@mazenessam77
Contributor Author

here we goooo

Successfully merging this pull request may close these issues: RDS Integration

5 participants