fix(airflow): correct tool source, StrictConfigModel, surfaces, exception-recovery test #1077

yashksaini-coder merged 3 commits into main
Conversation
…s, test recovery path

- Fix P1: change source='tracer_web' → source='airflow' on all 3 Airflow @tool decorators so the prioritizer's +2 score bump fires correctly when 'airflow' appears in detected sources (prioritization.py:58)
- Add 'airflow' to EvidenceSource Literal in app/types/evidence.py
- Fix P2: switch AirflowConfig from BaseModel → StrictConfigModel (extra='forbid') to match the rest of the integration config models
- Add surfaces=('investigation', 'chat') to all 3 tools, matching the convention used by other integrations (PostgreSQL, MySQL, GitHub, etc.)
- Add P2 test: test_get_recent_airflow_failures_partial_run_error_preserves_evidence covers the exception-recovery loop in get_recent_airflow_failures — verifies evidence from a successful run is preserved when another run's task-instance fetch raises HTTPStatusError

Follows up on #570.

Co-authored-by: Copilot <[email protected]>
Greptile Summary

This PR fixes the root cause of Airflow tools never receiving the +2 prioritization score boost: all three tools were registered with `source="tracer_web"`.

Confidence Score: 5/5. Safe to merge — all changes are targeted bug fixes with no regressions. All four changes are correct and complete: the `source` field is fixed on all three tools, `EvidenceSource` is updated, the `StrictConfigModel` migration matches every other integration config, and the new test exercises a previously-untested code path. No P0 or P1 issues found. No files require special attention.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant P as Prioritizer
    participant R as ToolRegistry
    participant T as TracerAirflowDAGTool
    participant C as AirflowConfig (StrictConfigModel)
    participant A as Airflow API
    P->>R: get_available_actions()
    R->>T: load tools (source="airflow")
    P->>P: score += 2 (source "airflow" matches incident sources)
    P->>T: get_recent_airflow_failures(config, dag_id)
    T->>C: build_airflow_config(raw) — extra fields rejected early
    T->>A: GET /dags/{dag_id}/dagRuns
    A-->>T: dag_runs[]
    loop for each dag_run
        T->>A: GET /dagRuns/{run_id}/taskInstances
        alt HTTP 500
            A-->>T: HTTPStatusError
            T->>T: log warning, continue (evidence preserved)
        else OK
            A-->>T: task_instances[]
            T->>T: filter failed/up_for_retry states
        end
    end
    T-->>P: evidence[]
```
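The per-run recovery path in the diagram can be sketched as follows. This is a minimal sketch, not the real `get_recent_airflow_failures`: `HTTPStatusError` here is a local stand-in for `httpx.HTTPStatusError`, and `fetch_task_instances` is a hypothetical callable standing in for the Airflow REST call.

```python
# Sketch of the per-run recovery loop: a failing task-instance fetch for
# one run must not discard evidence already collected from other runs.
import logging

logger = logging.getLogger(__name__)


class HTTPStatusError(Exception):
    """Local stand-in for httpx.HTTPStatusError."""


def collect_failures(fetch_task_instances, run_ids):
    evidence = []
    for run_id in run_ids:
        try:
            task_instances = fetch_task_instances(run_id)
        except HTTPStatusError as exc:
            # Log a warning and continue -- evidence gathered so far survives.
            logger.warning("task-instance fetch failed for run %s: %s", run_id, exc)
            continue
        for ti in task_instances:
            if ti["state"] in {"failed", "up_for_retry"}:
                evidence.append({"run_id": run_id, "task_id": ti["task_id"]})
    return evidence
```

The new test exercises exactly this branch: one run's fetch raises, the loop continues, and the evidence list built from the successful run is returned intact.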
Pull request overview
Follow-up to prior Airflow integration work to ensure Airflow tools are correctly prioritized and consistently registered, while tightening config validation and adding missing exception-recovery coverage.
Changes:

- Corrected Airflow tool metadata (`source="airflow"`) so action prioritization properly scores these tools for Airflow incidents.
- Updated `AirflowConfig` to inherit from `StrictConfigModel` (fail-fast on unknown config fields).
- Added a unit test that exercises per-run exception recovery in `get_recent_airflow_failures` to ensure partial evidence is preserved.
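A minimal sketch of what the `StrictConfigModel` change buys, assuming `StrictConfigModel` is essentially a pydantic `BaseModel` with `extra="forbid"`; the field names (`base_url`, `api_token`) are illustrative, not the real schema:

```python
# Hypothetical sketch: a strict base model rejects unknown keys at
# validation time instead of silently dropping them.
from pydantic import BaseModel, ConfigDict


class StrictConfigModel(BaseModel):
    # Unknown fields raise ValidationError instead of being ignored.
    model_config = ConfigDict(extra="forbid")


class AirflowConfig(StrictConfigModel):
    base_url: str
    api_token: str


AirflowConfig(base_url="http://airflow:8080", api_token="t")  # ok
# AirflowConfig(base_url=..., api_token=..., tokne=...) raises ValidationError,
# surfacing typos from the config store early rather than at request time.
```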
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tests/integrations/test_airflow.py` | Adds a regression test covering partial API failure behavior while preserving previously collected evidence. |
| `app/types/evidence.py` | Extends the canonical `EvidenceSource` set with `"airflow"` so tooling/source logic can recognize it. |
| `app/tools/TracerAirflowDAGTool/__init__.py` | Fixes tool source to `"airflow"` and adds `surfaces=("investigation","chat")` for consistency and prioritization. |
| `app/integrations/airflow.py` | Switches `AirflowConfig` to `StrictConfigModel` to forbid unknown fields and normalize inputs consistently. |
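The `EvidenceSource` extension and the import-time check it feeds can be sketched like this; the names mirror the PR but the shapes are simplified, and the real Literal has more members:

```python
# Sketch of validating a tool's declared source against the EvidenceSource
# Literal at import time via __init_subclass__, as the PR description notes.
from typing import Literal, get_args

EvidenceSource = Literal["tracer_web", "airflow"]  # real Literal has more members


class BaseTool:
    source: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fail at import time if a subclass declares an unknown source.
        if cls.source not in get_args(EvidenceSource):
            raise TypeError(f"unknown evidence source: {cls.source!r}")


class TracerAirflowDAGTool(BaseTool):
    source = "airflow"  # accepted only because "airflow" is in the Literal
```

This is why adding `"airflow"` to the Literal had to ship with the source change: without it, the corrected `source="airflow"` would fail at import.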
search_logs() returns {success, logs, total} directly — not a nested
{logs: {success, logs}, monitors, events} shape like fetch_all() does.
Three tests were written against the wrong shape:
- test_search_logs_success: removed monitors/events assertions, fixed
result['logs'][0]['message'] (was result['logs']['logs'][0]['message'])
- test_search_logs_empty_data: fixed result['logs'] == [] (was
result['logs']['logs']), removed incorrect monitors/events checks
- test_search_logs_http_error: fixed result['error'] (was
result['logs']['error']), removed unnecessary mock_instance.get stub
All 19 tests in test_datadog_client.py now pass.
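The flat-versus-nested distinction the fix hinges on can be shown with hypothetical stubs; the payloads below are illustrative, not real Datadog data:

```python
# Illustrative stubs contrasting the two return shapes: search_logs() is
# flat, while fetch_all() nests per-category results one level deeper.
def search_logs(query: str) -> dict:
    logs = [{"message": "disk full"}]
    return {"success": True, "logs": logs, "total": len(logs)}


def fetch_all() -> dict:
    return {
        "logs": {"success": True, "logs": [{"message": "disk full"}]},
        "monitors": {"success": True, "monitors": []},
        "events": {"success": True, "events": []},
    }


result = search_logs("service:web")
assert result["logs"][0]["message"] == "disk full"  # correct, flat access
# result["logs"]["logs"][0] -- the old assertion -- would fail here,
# because result["logs"] is already the list of log entries.
```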
Co-authored-by: Copilot <[email protected]>
…t.py from main

The file was removed from main (cb37b6e). Accepting the upstream deletion.

Co-authored-by: Copilot <[email protected]>
Nice @yashksaini-coder, I was about to follow up on this as well. Thanks for taking care of it and adding the extra cleanup and tests!

💜 One more reason the project grows. Thanks @yashksaini-coder — your contribution just landed! 👋 Join us on Discord - OpenSRE: hang out, contribute, or hunt for features and issues. Everyone's welcome.

Follow-up to #570 + fixes pre-existing test failures
Addresses all remaining issues from the post-merge review comment: #570 (comment)
Changes
P1 — Fix `source="airflow"` on all 3 tools (was `"tracer_web"`)

Root cause: `prioritization.py:58` awards a `+2` score when `action.source in sources`. For an Airflow incident `sources` contains `"airflow"`, but all 3 tools had `source="tracer_web"` — so they scored 0 instead of 2 during prioritization.

Also added `"airflow"` to the `EvidenceSource` Literal (`app/types/evidence.py`), which is validated at import time by `BaseTool.__init_subclass__()`.

P2 — `AirflowConfig(BaseModel)` → `AirflowConfig(StrictConfigModel)`

Matches every other integration config model (`DatadogConfig`, `GrafanaConfig`, etc.). `StrictConfigModel` sets `extra="forbid"` to reject unexpected fields from the store early.

P2 — Exception-recovery test

New test `test_get_recent_airflow_failures_partial_run_error_preserves_evidence`:

- `run_ok` returns a failed task, `run_bad` raises `HTTPStatusError` (500)
- verifies evidence from `run_ok` is preserved and `run_bad` was actually attempted

Nit — `surfaces=("investigation", "chat")` on all 3 tools

Matches the convention used by `PostgreSQLTableStatsTool`, `GitHubCommitsTool`, `ClickHouseSystemHealthTool`, and others.

Bonus — Fix 3 pre-existing broken `test_datadog_client` tests

`search_logs()` returns `{"success": True, "logs": [...], "total": N}` — a flat dict. Three tests (`test_search_logs_success`, `test_search_logs_empty_data`, `test_search_logs_http_error`) were written as if it returns `fetch_all()` output (nested `{logs: {...}, monitors: {...}, events: {...}}`). Fixed assertions to match the actual return shape.

Test results

Files changed

- `app/types/evidence.py`: add `"airflow"` to `EvidenceSource` Literal
- `app/integrations/airflow.py`: `BaseModel` → `StrictConfigModel`
- `app/tools/TracerAirflowDAGTool/__init__.py`: `source="airflow"`, `surfaces=("investigation","chat")` on all 3 tools
- `tests/integrations/test_airflow.py`: new exception-recovery test
- `tests/services/test_datadog_client.py`: fixed `search_logs` assertions
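The P1 scoring rule can be sketched as follows. This is a minimal illustration of the check described at `prioritization.py:58`; the real prioritizer presumably works on richer action objects, and the `Action` shape here is hypothetical.

```python
# Minimal sketch of the +2 source bump: an action whose source appears in
# the incident's detected sources outranks one that does not.
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    source: str


def score(action: Action, sources: set[str]) -> int:
    points = 0
    if action.source in sources:  # the check described at prioritization.py:58
        points += 2
    return points


incident_sources = {"airflow"}
fixed = Action("get_recent_airflow_failures", source="airflow")
broken = Action("get_recent_airflow_failures", source="tracer_web")
assert score(fixed, incident_sources) == 2   # after the fix
assert score(broken, incident_sources) == 0  # why the bump never fired before
```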