feat: add correlation and deduplication with incident management #9011
Closed
Contributor
Failed to generate code suggestions for PR
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 371 | 345 | 0 | 22 | 4 | 93% | 8m 38s |
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| 2 tests failed | 371 | 344 | 2 | 22 | 3 | 93% | 8m 6s |
Test Failure Analysis
- pipeline-core.spec.js: Failures due to timeout waiting for button click
- Core Pipeline Tests should add source, condition & destination node and then delete the pipeline: Timeout waiting for 'Explore' button click.
- Core Pipeline Tests should add source & destination node and then delete the pipeline: Timeout waiting for 'Explore' button click.
Root Cause Analysis
- The failures are likely related to the recent changes in the UI that may have affected the button's visibility or availability.
Recommended Actions
1. Investigate the UI changes in pipeline-core.spec.js to ensure the 'Explore' button is present and interactable.
2. Increase the timeout duration for button clicks in the tests if the UI is slow to respond.
3. Add checks to confirm the button's visibility before attempting to click it.
ce94bd9 to
30d5562
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| 2 tests failed | 371 | 343 | 2 | 22 | 4 | 92% | 8m 5s |
Test Failure Analysis
- pipeline-core.spec.js: Tests failing due to timeout errors while clicking buttons
- Core Pipeline Tests should add source, condition & destination node and then delete the pipeline: Timeout while waiting for 'Explore' button click.
- Core Pipeline Tests should add source & destination node and then delete the pipeline: Timeout while waiting for 'Explore' button click.
Root Cause Analysis
- The failures are likely related to recent changes in the UI that may have affected element visibility or loading times.
Recommended Actions
1. Increase the timeout duration for button clicks in pipeline-core.spec.js.
2. Ensure the 'Explore' button is visible and enabled before the click action in pipeline-core.spec.js.
3. Add explicit wait conditions for the button to be present before attempting to click in pipeline-core.spec.js.
ad3194a to
1a516ad
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 371 | 345 | 0 | 24 | 2 | 93% | 5m 30s |
Subhra264
approved these changes
Nov 11, 2025
1a516ad to
379ca47
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| 1 test failed | 242 | 220 | 1 | 20 | 1 | 91% | 6m 56s |
Test Failure Analysis
- logsqueries.spec.js: Timeout issues while interacting with UI elements
- Logs Queries testcases should redirect to logs after clicking on stream explorer via stream page: Timeout waiting for locator '[data-test="logs-search-bar-delete-streamslogpagtxtg5-saved-view-btn"]'.
Root Cause Analysis
- The timeout errors are likely related to recent changes in the logs page interaction logic in logsPage.js.
Recommended Actions
1. Investigate the visibility and loading time of the element '[data-test="logs-search-bar-delete-streamslogpagtxtg5-saved-view-btn"]' in logsPage.js.
2. Increase the timeout duration for the click action in the clickDeleteSavedViewButton method.
3. Ensure that the element is present and visible before attempting to click.
oasisk
approved these changes
Nov 12, 2025
379ca47 to
bc6ded0
Compare
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| All tests passed | 371 | 344 | 0 | 24 | 3 | 93% | 5m 32s |
|
| Status | Total | Passed | Failed | Skipped | Flaky | Pass Rate | Duration |
|---|---|---|---|---|---|---|---|
| 1 test failed | 285 | 260 | 1 | 21 | 3 | 91% | 3m 26s |
Test Failure Analysis
- changeOrg.spec.js: Locator issues causing strict mode violations
- Change Organisation Alerts Page default validation: Locator resolved to multiple elements, causing click failure.
Root Cause Analysis
- The recent changes in AlertList.vue introduced new elements that conflict with existing locators.
Recommended Actions
- Update the locator in HomePage.clickDefaultOrg to be more specific to avoid ambiguity.
- Consider using a different selector method that targets the intended element directly.
- Review the changes in AlertList.vue to ensure no unintended interactions with existing dropdowns.
9c7a11a to
2f624cd
Compare
Implements alert correlation that groups related alerts into incidents based
on semantic field matching and temporal proximity, plus deduplication to
prevent alert storms.
Core Capabilities:
- Semantic field groups: Map field variations (hostname/host/node) to canonical
dimensions for consistent matching across different data sources
- Correlation strategies: Match alerts by dimensions (all/any), with temporal
fallback for proximity-based grouping
- Incident lifecycle: Track incidents (open/acknowledged/resolved) with
confidence scoring based on match quality
- Deduplication: Fingerprint-based suppression using alert name, query context,
and semantic dimensions
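The correlation strategies listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MatchStrategy`, `MatchKind`, `classify`, and the second-based timestamps are hypothetical names, not types from this PR, and the real code may combine dimension matching and the temporal window differently.

```rust
use std::collections::HashMap;

/// Canonical dimension name -> value (e.g. "host" -> "srv01"). Hypothetical alias.
pub type Dims = HashMap<String, String>;

/// Hypothetical strategy names; the PR only says alerts match by dimensions "all/any".
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MatchStrategy {
    All,
    Any,
}

/// Illustrative outcome of comparing one alert against one open incident.
#[derive(Debug, PartialEq)]
pub enum MatchKind {
    Dimensions,
    TemporalFallback,
    NoMatch,
}

/// True when the configured dimension keys agree between incident and alert.
/// A key missing on either side counts as a mismatch (an assumption).
fn dimensions_match(strategy: MatchStrategy, keys: &[&str], incident: &Dims, alert: &Dims) -> bool {
    let same = |k: &&str| match (incident.get(*k), alert.get(*k)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    };
    match strategy {
        // An empty key list never matches, so "all" cannot succeed vacuously.
        MatchStrategy::All => !keys.is_empty() && keys.iter().all(same),
        MatchStrategy::Any => keys.iter().any(same),
    }
}

/// Tries dimension matching first, then falls back to temporal proximity.
pub fn classify(
    strategy: MatchStrategy,
    keys: &[&str],
    incident: &Dims,
    alert: &Dims,
    incident_last_seen_secs: i64,
    alert_fired_secs: i64,
    temporal_window_secs: i64,
) -> MatchKind {
    if dimensions_match(strategy, keys, incident, alert) {
        MatchKind::Dimensions
    } else if (alert_fired_secs - incident_last_seen_secs).abs() <= temporal_window_secs {
        MatchKind::TemporalFallback
    } else {
        MatchKind::NoMatch
    }
}
```

A confidence score could then be derived from the returned `MatchKind`, for example by ranking a dimension match above a temporal fallback. The scoring scheme itself is not described in this PR.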
API Endpoints:
- GET/POST/DELETE `/{org_id}/alerts/correlation/config`
- GET/POST/DELETE `/{org_id}/alerts/deduplication/config`
- GET `/{org_id}/alerts/incidents` (list with status filter)
- GET `/{org_id}/alerts/incidents/{id}` (details with alert list)
- PUT `/{org_id}/alerts/incidents/{id}/status`
Database Schema:
- `alert_incidents`: Incident records with correlation metadata
- `alert_incident_alerts`: Many-to-many mapping of alerts to incidents
- `alert_dedup_state`: Deduplication fingerprint tracking
Implementation:
- Business logic: Pure algorithms for matching, classification, fingerprinting
- Service layer: Orchestrates DB operations with algorithm delegation
- HTTP handlers: Feature-gated dual implementations for OSS/enterprise builds
- Config types: Shared data structures with validation in `config` crate
…eduplication
This commit addresses CI failures from GitHub Actions runs:
- Fixed 5 clippy uninlined_format_args warnings in Rust code
- Added 140+ comprehensive unit tests for new Vue components
Backend fixes:
- correlation.rs: Inline format string variables (2 fixes)
- deduplication.rs: Inline format string variables (2 fixes)
- correlation.rs (service): Inline format string variable (1 fix)
Frontend tests added (71.96% coverage, up from 71.54%):
- TagInput.spec.ts: 19 tests for tag input component
- SemanticGroupItem.spec.ts: 19 tests for semantic group editing
- IncidentList.spec.ts: 30 tests for incident list display
- DeduplicationConfig.spec.ts: 8 tests for dedup configuration
- OrganizationDeduplicationSettings.spec.ts: 8 tests for org settings
- SemanticFieldGroupsConfig.spec.ts: 61 tests for field group management
The new tests provide solid coverage of core functionality including:
- Component rendering and structure
- User interactions (clicks, inputs, form submissions)
- Data validation and formatting
- State management and prop updates
- Preset loading and configuration
- Edge cases and error handling
Prevents TypeError when viewing SQL-based alerts where conditions field is null instead of an object with length property.
Separates deduplication configuration into two distinct levels:
- Organization-level: Semantic field groups + default time window (global)
- Per-alert level: Fingerprint fields + time window override (per-alert)
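One way to read the two levels above: the per-alert time window, when set, overrides the organization default. The sketch below assumes exactly that. The struct and field names (`OrgLevelSketch`, `AlertLevelSketch`, `default_window_minutes`, `time_window_minutes`) are hypothetical stand-ins, not the actual fields of `OrganizationDeduplicationConfig` or `DeduplicationConfig`.

```rust
/// Hypothetical, simplified stand-in for the organization-level config.
pub struct OrgLevelSketch {
    /// Default dedup window applied when an alert does not set its own.
    pub default_window_minutes: u32,
}

/// Hypothetical, simplified stand-in for the per-alert config.
pub struct AlertLevelSketch {
    /// Concrete field names used to build this alert's fingerprint.
    pub fingerprint_fields: Vec<String>,
    /// Optional override of the organization default.
    pub time_window_minutes: Option<u32>,
}

/// Resolves the window for one alert: per-alert override first, org default otherwise.
pub fn effective_window_minutes(org: &OrgLevelSketch, alert: &AlertLevelSketch) -> u32 {
    alert.time_window_minutes.unwrap_or(org.default_window_minutes)
}
```

Keeping the override as an `Option` lets "not configured" stay distinct from an explicit value, so a change to the org default reaches every alert that has no override.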
Backend Changes:
- Add OrganizationDeduplicationConfig for org-wide semantic groups
- Keep DeduplicationConfig (from main) for per-alert fingerprint fields
- Update API handlers to use OrganizationDeduplicationConfig
- Update service layer for proper type separation
- All type references updated across codebase
Frontend Changes:
- OrganizationDeduplicationSettings.vue: Remove fingerprint fields UI
- SemanticFieldGroupsConfig.vue: Add showFingerprintFields prop
- DeduplicationConfig.vue: Unchanged (per-alert, matches main branch)
- Updated descriptions to clarify config separation
Testing:
- Add test_dedup_correlation.sh for API integration testing
- Add TESTING_DEDUP_CORRELATION.md comprehensive test guide
- Cargo check passes ✅
How it works:
1. Org-level defines semantic groups: {"host": ["host", "hostname", "node"]}
2. Per-alert specifies actual fields: ["hostname", "service_name"]
3. Enterprise module uses semantic groups for reverse lookup/mapping
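The reverse lookup in step 3 happens inside the enterprise module, which is not part of this PR, so the following is only a guess at its shape. `build_reverse_index` and `canonical_dimensions` are hypothetical helper names, and passing unmapped fields through unchanged is an assumption.

```rust
use std::collections::{BTreeMap, HashMap};

/// Builds a reverse index from concrete field name to canonical group name,
/// e.g. "hostname" -> "host" for the group {"host": ["host", "hostname", "node"]}.
pub fn build_reverse_index(groups: &BTreeMap<String, Vec<String>>) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for (canonical, aliases) in groups {
        for alias in aliases {
            // If two groups list the same alias, the first group wins.
            // BTreeMap iteration order keeps that choice deterministic.
            index
                .entry(alias.clone())
                .or_insert_with(|| canonical.clone());
        }
    }
    index
}

/// Maps per-alert field names to canonical dimensions.
/// Fields that belong to no group are passed through unchanged (an assumption).
pub fn canonical_dimensions(fields: &[&str], index: &HashMap<String, String>) -> Vec<String> {
    fields
        .iter()
        .map(|f| index.get(*f).cloned().unwrap_or_else(|| f.to_string()))
        .collect()
}
```

With the group from step 1 and the fields from step 2, `canonical_dimensions(&["hostname", "service_name"], &index)` yields `["host", "service_name"]`.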
Implements cross-alert deduplication that suppresses alerts from different
alert rules when they share semantic dimensions with recently fired alerts.
Key Changes:
- Add cross_alert_dedup flag to OrganizationDeduplicationConfig
- Add semantic dimension extraction helpers to org config
- Update deduplication service to fetch and pass org config to enterprise
- Add find_matching_semantic_fingerprints() for cross-alert lookups
Behavior:
- cross_alert_dedup=false (default): Per-alert dedup only (backward compatible)
* Alert A: fingerprint="alert_A:srv01:api"
* Alert B: fingerprint="alert_B:srv01:region" → Both sent
- cross_alert_dedup=true (new): Cross-alert semantic dedup
* Alert A fires: semantic_dims={host:srv01, service:api}
* Alert B fires 30s later: semantic_dims={host:srv01, region:us-east}
* Result: Alert B suppressed (shares host=srv01 dimension)
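The two behaviors above can be sketched as a single suppression check. This is an illustration, not the service code. `RecentAlert`, `should_suppress`, and the in-memory slice of recent alerts are hypothetical. Treating a same-rule alert as a duplicate only when its dimensions are identical is an assumption standing in for the per-alert fingerprint comparison.

```rust
use std::collections::HashMap;

/// Canonical dimension name -> value. Hypothetical alias.
pub type Dims = HashMap<String, String>;

/// A recently fired alert kept for dedup lookups (illustrative shape).
pub struct RecentAlert {
    pub alert_id: String,
    pub dims: Dims,
    pub fired_at_secs: u64,
}

/// Decides whether a newly fired alert should be suppressed.
pub fn should_suppress(
    cross_alert_dedup: bool,
    recent: &[RecentAlert],
    alert_id: &str,
    dims: &Dims,
    now_secs: u64,
    window_secs: u64,
) -> bool {
    recent.iter().any(|r| {
        // Only alerts that fired inside the dedup window are considered.
        if now_secs.saturating_sub(r.fired_at_secs) > window_secs {
            return false;
        }
        if r.alert_id == alert_id {
            // Per-alert dedup: same rule and identical dimensions.
            r.dims == *dims
        } else {
            // Cross-alert dedup: any shared dimension with an equal value.
            cross_alert_dedup && dims.iter().any(|(k, v)| r.dims.get(k) == Some(v))
        }
    })
}
```

For the scenario above (Alert A with `host=srv01, service=api`, then Alert B 30 seconds later with `host=srv01, region=us-east`), this returns `false` for Alert B with the flag off and `true` with it on, provided the window is at least 30 seconds.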
Enterprise Module Requirements:
- Updated calculate_fingerprint() signature with org_config parameter
- Semantic fingerprint format: "dim1=val1,dim2=val2" (no alert ID)
- fingerprint_matches_dimensions() for overlap detection
- See CROSS_ALERT_DEDUP_SPEC.md for full specification
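The `"dim1=val1,dim2=val2"` format could be produced and checked along the following lines. The enterprise functions named above are not shown in this PR, so `semantic_fingerprint` and `fingerprint_overlaps` are hypothetical stand-ins. Sorting the dimensions by name is an assumption made so that equal dimension sets always yield the same string.

```rust
use std::collections::BTreeMap;

/// Renders canonical dimensions as "dim1=val1,dim2=val2" with no alert ID.
/// BTreeMap iterates in key order, so the output is stable for a given set.
/// Values containing ',' or '=' would need escaping, which this sketch omits.
pub fn semantic_fingerprint(dims: &BTreeMap<String, String>) -> String {
    dims.iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}

/// True when at least one "dim=val" pair in the fingerprint also appears in `dims`.
pub fn fingerprint_overlaps(fingerprint: &str, dims: &BTreeMap<String, String>) -> bool {
    fingerprint
        .split(',')
        .filter_map(|pair| pair.split_once('='))
        .any(|(k, v)| dims.get(k).map(String::as_str) == Some(v))
}
```

Leaving the alert ID out of the string is what lets two different alert rules produce comparable fingerprints.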
Benefits:
- Prevents alert storms from related issues across different monitors
- Semantic grouping allows flexible field name variations
- Opt-in feature with backward compatibility
Fixes tests broken by the separation of org-level and per-alert configs.
Changes:
- Rename test_deduplication_config_validation → test_organization_deduplication_config_validation
- Add test_per_alert_deduplication_config_validation for per-alert config
- Split test_deduplication_config_serialization into org and per-alert versions
- Update test_deduplication_config_default to test both config types separately
- All tests now use correct config types (OrganizationDeduplicationConfig vs DeduplicationConfig)
Test results:
- 11 tests in config crate: ✅ all passing
- Tests properly validate both config levels independently
Fixes correlation service to fetch semantic groups from org-level deduplication config instead of per-alert config.
Changes:
- Update scheduler/handlers.rs to fetch semantic groups from org config
- Remove duplicate claim_parser_function block (compilation error)
- Semantic groups now correctly sourced from OrganizationDeduplicationConfig
Behavior:
- Correlation uses org-wide semantic field groups
- Consistent with deduplication's use of org-level groups
- Per-alert config no longer has semantic_field_groups field
2f624cd to
eda9f6e
Compare
ByteBaker
added a commit
that referenced
this pull request
Nov 25, 2025
Implements alert correlation that groups related alerts into incidents based on semantic field matching and temporal proximity.
**Backend:**
- Add `correlation.rs` config with validation for correlation dimensions and matching strategies
- Add `alert_incidents` and `alert_incident_alerts` database entities with SeaORM
- Add database migration `m20251107_000003_create_alert_correlation_schema`
- Add `correlation.rs` service with transaction-safe incident creation and matching
- Add `incidents.rs` HTTP handlers for incident CRUD operations (6 endpoints)
- Integrate correlation into alert scheduler to auto-correlate on alert firing
- Add 7 correlation metrics for observability (incidents created, alerts matched, confidence distribution, processing duration, MTTR)
- Update `org_config.rs` with correlation config persistence functions
- Update organization settings to include deduplication config in response
**Frontend:**
- Add `IncidentList.vue` component with status filtering and sortable table
- Add `IncidentDetailsDrawer.vue` for viewing incident details and associated alerts
- Add Incidents tab to `AlertList.vue` for accessing incident management UI
- Add 6 incident API methods to `alerts.ts` service (list, get, update status, config CRUD)
**Fixes:**
- Fix `OrganizationSettingResponse` test to include `deduplication_config` field
- Fix metering init call signature (remove extra argument)
- Comment out data retention usage code pending enterprise module update
Migrated from PR #9011, separated from deduplication feature (PR #9209).
Contributor
Author
Closing in favour of separate dev track.