
feat(rds): RDS integration with describe instance and events tools #1287

Merged
muddlebee merged 8 commits into Tracer-Cloud:main from mazenessam77:feat/rds-integration
May 5, 2026

Conversation

@mazenessam77
Contributor

  • Add app/integrations/rds.py for source detection and param extraction
  • Add RDSDescribeInstanceTool for instance status and configuration
  • Add RDSEventsTool for failover, maintenance, and parameter events
  • Use shared aws_sdk_client read-only allowlist for safety
  • Add unit tests for integration helpers and both tools
  • Add "rds" to EvidenceSource literal in app/types/evidence.py

Closes #125

Describe the changes you have made in this PR -

This PR adds AWS RDS as a first-class integration source so the agent can investigate RDS instance health and recent events during incidents.

  • app/integrations/rds.py — config builder, env-based config, rds_is_available source detector, and rds_extract_params to normalize tool inputs. Mirrors the shape used by other AWS integrations.
  • app/tools/RDSDescribeInstanceTool/ — read-only tool that calls rds:DescribeDBInstances and returns status, engine, version, instance class, Multi-AZ, networking, storage, and backup config.
  • app/tools/RDSEventsTool/ — read-only tool that calls rds:DescribeEvents to surface failovers, maintenance, parameter changes, and backup events around an incident window.
  • app/types/evidence.py — adds "rds" to the EvidenceSource literal. Without this, the @tool(source="rds", ...) decorator on both new tools fails mypy (no matching overload).
  • Tests: tests/integrations/test_rds.py (4) and tests/tools/test_rds_tools.py (6); 14 new tests total, all passing.

Both tools route through the existing app/services/aws_sdk_client.py, which enforces a read-only boto3 allowlist — RDS gets the same safety treatment as the existing CloudWatch, EKS, and Lambda tools.
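The allowlist idea can be sketched as below. This is a hypothetical illustration only: the pair set and the check_allowed helper are made up for this sketch, not the actual aws_sdk_client internals (which are not part of this PR's diff).

```python
# Hypothetical sketch of a read-only allowlist gate. The (service, operation)
# pairs and the helper name are illustrative, not the real aws_sdk_client API.
READ_ONLY_ALLOWLIST = {
    ("rds", "describe_db_instances"),
    ("rds", "describe_events"),
    ("cloudwatch", "get_metric_data"),
}

def check_allowed(service: str, operation: str) -> None:
    """Raise before dispatching any boto3 call that is not explicitly read-only."""
    if (service, operation) not in READ_ONLY_ALLOWLIST:
        raise PermissionError(f"{service}:{operation} is not in the read-only allowlist")
```

Anything mutating (e.g. a hypothetical delete_db_instance) would be rejected before a boto3 client is ever touched, which is the safety property the PR relies on.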

Demo/Screenshot for feature changes and bug fixes -

$ .venv/bin/python -m pytest tests/integrations/test_rds.py tests/tools/test_rds_tools.py
...
============================== 14 passed in 0.77s ==============================

$ .venv/bin/python -m mypy app/
Success: no issues found in 460 source files

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

The problem: the agent had no way to investigate RDS during incidents — there was an app/integrations/rds.py skeleton with 0% coverage but no tools wired up.

I followed the pattern already established for AWS services in the repo (CloudWatch, EKS, Lambda):

  1. Integration layer first (app/integrations/rds.py) — defines what an RDS source looks like in the integration store, when it's "available", and how to extract tool params from the source dict. This is what the planner uses to decide whether to call RDS tools.
  2. Tools second (app/tools/RDSDescribeInstanceTool/, app/tools/RDSEventsTool/) — thin wrappers around execute_aws_sdk_call(...) so I get the read-only allowlist, error handling, and credential management for free. Each tool returns a flat dict matching the project's tool-output convention.
  3. Type literal — adding "rds" to EvidenceSource was needed because @tool(source=...) is typed against that literal. I considered using the existing "aws_sdk" source instead, but that would have broken source-based filtering downstream and mixed RDS evidence into a generic bucket.
  4. Tests cover the happy path, "no instance found" / "no events", and AWS SDK failures, plus availability detection and param extraction from both explicit configs and env-fallback.

Alternative considered: rolling a dedicated RDSClient under app/services/rds/ like the Datadog and Grafana clients. Rejected because RDS only needs two boto3 calls — adding a client class would be over-abstraction. The aws_sdk_client pattern is the right fit and matches what CloudWatch/EKS/Lambda do.
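As a rough sketch of the thin-wrapper pattern in step 2 (execute_aws_sdk_call is stubbed here, and the exact output keys are assumptions based on the description above, not the repo's actual code):

```python
def execute_aws_sdk_call(service: str, operation: str, params: dict, region: str) -> dict:
    # Stub standing in for app/services/aws_sdk_client.execute_aws_sdk_call,
    # which in the real repo enforces the read-only allowlist and credentials.
    return {"DBInstances": [{"DBInstanceStatus": "available", "Engine": "postgres"}]}

def rds_describe_instance(db_instance_identifier: str, region: str) -> dict:
    result = execute_aws_sdk_call(
        "rds", "describe_db_instances",
        {"DBInstanceIdentifier": db_instance_identifier}, region,
    )
    instances = result.get("DBInstances", [])
    if not instances:
        # Flat dict on the miss path too, per the tool-output convention.
        return {"source": "rds", "available": False,
                "error": "No RDS instance found with the given identifier."}
    instance = instances[0]
    return {"source": "rds", "available": True,
            "status": instance.get("DBInstanceStatus"),
            "engine": instance.get("Engine")}
```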


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

- Add app/integrations/rds.py for source detection and param extraction
- Add RDSDescribeInstanceTool for instance status and configuration
- Add RDSEventsTool for failover, maintenance, and parameter events
- Use shared aws_sdk_client read-only allowlist for safety
- Add unit tests for integration helpers and both tools
- Add "rds" to EvidenceSource literal in app/types/evidence.py

Closes Tracer-Cloud#125
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

  • Adds RDSDescribeInstanceTool and RDSEventsTool — two read-only boto3 wrappers that surface instance health and recent events, both correctly routed through the existing aws_sdk_client allowlist.
  • The integration layer (app/integrations/rds.py) is well-implemented, but three pipeline wiring steps are missing: load_env_integrations() never calls rds_config_from_env(), _classify_service_instance() has no "rds" handler (so remote-store credentials stay nested and unreadable), and detect_sources.py has no "rds" block to copy resolved_integrations["rds"] into the sources dict — meaning rds_is_available always returns False and neither tool is ever offered to the planner.
  • All 14 tests pass because they call tool functions directly; no test exercises the end-to-end path through load_env_integrations → classify_integrations → detect_sources → rds_is_available.

Confidence Score: 3/5

Not safe to merge — the integration is functionally dead-on-arrival; neither RDS tool will be offered to the planner in any deployment.

Multiple P1 findings that together make the entire integration non-functional: load_env_integrations is not updated, _classify_service_instance has no 'rds' handler, and detect_sources never populates sources['rds']. The tools themselves are correct and all 14 tests pass, but those tests bypass the pipeline. The score is pulled below the P1 ceiling of 4 due to three compounding gaps on the critical path.

app/integrations/_catalog_impl.py (load_env_integrations + _classify_service_instance) and app/nodes/plan_actions/detect_sources.py — neither is in the diff but both must be updated before the integration can work.

Important Files Changed

Filename Overview
app/integrations/rds.py Integration helpers are well-implemented but not wired into load_env_integrations(), _classify_service_instance(), or detect_sources.py — the three missing links that would let the planner discover and activate these tools.
app/tools/RDSDescribeInstanceTool/__init__.py Clean implementation: correct boto3 call, defensive result unpacking, proper logging, and a warning when the API returns multiple instances.
app/tools/RDSEventsTool/__init__.py Correct Duration passthrough, bounded input_schema (min 1, max 20160), and clean event flattening with safe defaults.
app/types/evidence.py Correct addition of 'rds' to the EvidenceSource Literal; no other changes.
tests/integrations/test_rds.py Good unit coverage of config helpers and param extraction; all tests pass, but there are no pipeline-integration tests covering detect_sources or load_env_integrations paths.
tests/tools/test_rds_tools.py Thorough tool-layer tests: happy path, not-found, AWS failure, multi-instance warning, and empty-events cases all covered.
docs/rds.mdx Well-structured documentation with correct env var table, IAM policy, and troubleshooting section.
docs/docs.json Adds 'rds' to the docs navigation — correct and complete.

Sequence Diagram

sequenceDiagram
    participant Env as Env Vars
    participant LEI as load_env_integrations()
    participant CI as classify_integrations()
    participant DS as detect_sources()
    participant IA as rds_is_available()
    participant Tool as RDS Tools

    Env->>LEI: RDS_DB_INSTANCE_IDENTIFIER set
    Note over LEI: ❌ rds_config_from_env() never called
    LEI-->>CI: (no "rds" record)
    CI-->>DS: resolved_integrations (no "rds" key)
    Note over DS: ❌ no "rds" block in detect_sources
    DS-->>IA: sources (no "rds" key)
    IA-->>Tool: returns False ❌
    Note over Tool: Tools never offered to planner


Comment thread app/integrations/rds.py Outdated
Greptile review caught two related issues in app/integrations/rds.py:

- rds_config_from_env: env_str("AWS_REGION", DEFAULT_RDS_REGION) always
  returned a non-empty string, making the RDS_REGION fallback unreachable
  (P1 dead code).
- rds_extract_params: only consulted AWS_REGION, never RDS_REGION,
  diverging from the config path (P2 inconsistency).

Both functions now resolve region as: source-supplied > AWS_REGION >
RDS_REGION > DEFAULT_RDS_REGION. Drops the now-unused os import.

Adds three tests covering the previously-uncovered fallback paths:
RDS_REGION-only in extract, RDS_REGION-only in config-from-env, and
default when neither env var is set.
- ruff check --fix reorganized imports in tests/tools/test_rds_tools.py
- ruff format reformatted app/integrations/rds.py
- New docs/rds.mdx covering setup, env vars, IAM permissions, the two
  tools, use cases, and troubleshooting. References docs/aws.mdx for
  shared AWS credential setup rather than duplicating it.
- Register the page under "Data and workflow systems" in docs/docs.json.

Documents the RDS_REGION fallback semantics so users understand the
region resolution order: source dict > AWS_REGION > RDS_REGION > default.
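A minimal sketch of that resolution order, assuming a standalone resolve_region helper and a placeholder default (the real code inlines the chain via env_str, and the actual DEFAULT_RDS_REGION value is not shown in this thread):

```python
import os

DEFAULT_RDS_REGION = "us-east-1"  # assumption: placeholder, not the repo's actual default

def resolve_region(source: dict) -> str:
    # Documented order: source-supplied > AWS_REGION > RDS_REGION > default.
    return (
        source.get("region")
        or os.environ.get("AWS_REGION")
        or os.environ.get("RDS_REGION")
        or DEFAULT_RDS_REGION
    )
```

This keeps both rds_config_from_env and rds_extract_params on the same chain, which is exactly the divergence the P2 finding flagged.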
"source": "rds",
"available": False,
"db_instance_identifier": db_instance_identifier,
"error": result.get("error"),
Collaborator


tools return result.get("error") directly in the response payload. AWS SDK errors can contain region names, account IDs, ARNs, or internal resource identifiers — all of which get forwarded to whoever reads the tool result. Sanitize or wrap errors before surfacing: return a generic message and log the full AWS error server-side only.
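One way to implement the suggested pattern, as a sketch (the helper name and generic message are hypothetical, not the repo's actual code):

```python
import logging

logger = logging.getLogger(__name__)

GENERIC_ERROR = "AWS SDK call failed; see server logs for details."

def sanitized_error(result: dict) -> "str | None":
    # Log the raw AWS error (which may contain ARNs, account IDs, or region
    # names) server-side only, and surface a generic message to the caller.
    raw = result.get("error")
    if raw is None:
        return None
    logger.error("AWS SDK error (not forwarded to tool output): %s", raw)
    return GENERIC_ERROR
```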

Contributor Author


Done. I've updated the code to sanitize AWS SDK errors by returning a generic message in the tool output while logging the full error server-side. I also added a warning log when multiple instances are returned, to address your second point.

"error": "No RDS instance found with the given identifier.",
}

instance = instances[0]
Collaborator


instances[0] silently discards any additional results. The AWS DescribeDBInstances API can theoretically return multiple records (e.g. in edge cases with cluster members). If more than one is returned, the caller gets no indication. At minimum, add a log warning when len(instances) > 1.
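The suggested guard could look roughly like this (first_instance is a hypothetical helper for illustration; the real fix lives inline in the tool):

```python
import logging

logger = logging.getLogger(__name__)

def first_instance(instances: "list[dict]", identifier: str) -> "dict | None":
    # Warn when DescribeDBInstances returns more than one record, since
    # only the first is used and the rest would otherwise vanish silently.
    if not instances:
        return None
    if len(instances) > 1:
        logger.warning(
            "DescribeDBInstances returned %d records for %s; using the first",
            len(instances), identifier,
        )
    return instances[0]
```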

Contributor Author


Done. I've updated the logic in commit 46a44a4 to address both points: I sanitized the AWS SDK error messages to prevent leaking sensitive info and added a warning log in case multiple instances are returned, as you suggested. Ready for another look!

- Replace raw AWS error forwarding with generic messages in both
  RDSDescribeInstanceTool and RDSEventsTool; full errors are logged
  server-side only to prevent leaking account IDs, ARNs, or regions.
- Add warning log when DescribeDBInstances returns more than one
  instance, since only the first is used.
- Update test assertions to match the new sanitized error messages.
Verifies that when DescribeDBInstances returns >1 record, the tool
uses the first instance and emits a warning log.
@mazenessam77
Contributor Author

mazenessam77 commented May 5, 2026 via email

@muddlebee
Collaborator

@cerencamkiran 👀 for reviews

@cerencamkiran
Collaborator

@greptileai review

Comment thread app/integrations/rds.py
Comment on lines +34 to +48

def rds_config_from_env() -> RDSConfig | None:
    """Load an RDS config from env vars."""
    db_id = env_str("RDS_DB_INSTANCE_IDENTIFIER")
    if not db_id:
        return None
    return build_rds_config(
        {
            "db_instance_identifier": db_id,
            "region": env_str("AWS_REGION") or env_str("RDS_REGION") or DEFAULT_RDS_REGION,
        }
    )


def rds_is_available(sources: dict[str, dict]) -> bool:
Contributor


P1 RDS integration never wired into the discovery pipeline

rds_config_from_env() is defined and tested but never called in app/integrations/_catalog_impl.py::load_env_integrations(). Every other env-based integration (airflow, rabbitmq, betterstack, etc.) has a corresponding block in that function; RDS has none. As a result, users who set RDS_DB_INSTANCE_IDENTIFIER + AWS_REGION (the documented setup path) will never have an "rds" key in resolved_integrations.

The second missing piece is app/nodes/plan_actions/detect_sources.py. That node transfers entries from resolved_integrations into the sources dict that rds_is_available reads — there is no "rds" block there either. Even if load_env_integrations were fixed, sources["rds"] would still never be set, so rds_is_available always returns False and neither tool is ever offered to the planner.

The third gap is app/integrations/_catalog_impl.py::_classify_service_instance. When an RDS integration is stored in the remote store, the function falls through to the generic handler (return {"credentials": credentials, "integration_id": record_id}, key), which nests fields under a "credentials" key. rds_is_available then looks for sources["rds"].get("db_instance_identifier") — which resolves to None — so the remote-store path is also broken.

All three pieces are needed:

# 1. In load_env_integrations() (_catalog_impl.py)
rds_config = rds_config_from_env()
if rds_config is not None and rds_config.is_configured:
    integrations.append(_active_env_record("rds", rds_config.model_dump(exclude={"integration_id"})))

# 2. In _classify_service_instance() (_catalog_impl.py)
if key == "rds":
    try:
        rds_config = build_rds_config({"db_instance_identifier": credentials.get("db_instance_identifier", ""), "region": credentials.get("region", DEFAULT_RDS_REGION)})
    except Exception:
        return None, None
    if rds_config.is_configured:
        return {**rds_config.model_dump(), "integration_id": record_id}, "rds"
    return None, None

# 3. In detect_sources.py
rds_int = (resolved_integrations or {}).get("rds")
if rds_int and str(rds_int.get("db_instance_identifier", "")).strip():
    sources["rds"] = {
        "db_instance_identifier": str(rds_int.get("db_instance_identifier", "")).strip(),
        "region": str(rds_int.get("region") or DEFAULT_RDS_REGION).strip(),
    }

Collaborator


This matches the main thing I wanted to double-check as well.

The tools and helper layer look fine, but if RDS is not registered in load_env_integrations() and not transferred into sources in detect_sources.py, the documented env setup path will not actually expose the RDS tools to the planner.

I think this should be addressed before merge, ideally with a small regression test covering env-backed RDS discovery → resolved integrations → sources containing rds.

Concretely, it seems we’re missing:

  • wiring in load_env_integrations() to register the env-backed RDS config
  • classification in _classify_service_instance() for remote-store configs
  • propagation into sources in detect_sources.py

Contributor Author


Done! I've wired RDS into the discovery pipeline.

Contributor Author


I have implemented the missing wiring in load_env_integrations, _classify_service_instance, and detect_sources.py. I also added a dedicated regression test test_rds_env_discovery_to_sources_pipeline that validates the entire discovery flow from environment variables to the planner's sources dictionary. Everything is verified and running as expected in my deployment. Ready for a final review!


Reviewer found three real wiring gaps that meant the documented setup
path never reached the planner. The integration helpers were correct but
nothing else in the pipeline knew about RDS:

- app/integrations/_catalog_impl.py::load_env_integrations now
  registers an "rds" record when rds_config_from_env returns a
  configured instance. Previously env-set RDS_DB_INSTANCE_IDENTIFIER
  was a no-op.
- app/integrations/_catalog_impl.py::_classify_service_instance gains
  an "rds" branch that returns a flat shape ({db_instance_identifier,
  region, integration_id}) instead of falling through to the generic
  handler that nests fields under "credentials" — which made
  rds_is_available always return False for store-backed configs.
- app/nodes/plan_actions/detect_sources.py now copies the resolved
  rds integration into sources["rds"], mirroring the rabbitmq/mariadb
  blocks. Without this, even a correctly-resolved integration would
  not be visible to rds_is_available.

Adds tests/integrations/test_rds.py::test_rds_env_discovery_to_sources_pipeline
which walks the full env -> load_env_integrations -> _classify_service_instance
-> detect_sources path end to end and asserts each stage produces the
expected shape, so this regression cannot reappear silently.
@mazenessam77
Copy link
Copy Markdown
Contributor Author

mazenessam77 commented May 5, 2026 via email

Adds six targeted tests that isolate each of the gaps the reviewer
flagged so any regression in one stage fails its own test rather than
bleeding into the end-to-end pipeline test:

Gap #1 — load_env_integrations:
  test_load_env_integrations_skips_rds_when_db_id_missing

Gap #2 — _classify_service_instance:
  test_classify_service_instance_rds_remote_store_returns_flat_shape
  test_classify_service_instance_rds_skips_when_db_id_missing

Gap #3 — detect_sources:
  test_detect_sources_propagates_rds_into_sources
  test_detect_sources_skips_rds_when_not_in_resolved_integrations
  test_detect_sources_skips_rds_when_db_id_blank

The remote-store classifier test asserts the absence of a "credentials"
key in the returned dict, which is the exact failure mode the reviewer
called out (generic fallback nests fields and silently breaks
rds_is_available).

25 RDS tests pass; mypy, lint, format clean.
@muddlebee muddlebee merged commit fa25cd9 into Tracer-Cloud:main May 5, 2026
10 checks passed
@github-actions
Contributor

github-actions Bot commented May 5, 2026

🎲 Researchers are baffled. @mazenessam77 opened a PR, got it reviewed without drama, and merged clean. This violates known laws of open source. 🔬


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

@mazenessam77
Contributor Author

here we goooo

Successfully merging this pull request may close these issues: RDS Integration

5 participants