feat: add organization-level alert deduplication with semantic field groups #9209
Merged
Force-pushed from b8a5ee6 to c83de04, then from c83de04 to 8ff259a.
ByteBaker added a commit that referenced this pull request on Nov 25, 2025:

Implements alert correlation that groups related alerts into incidents based on semantic field matching and temporal proximity.

**Backend:**
- Add `correlation.rs` config with validation for correlation dimensions and matching strategies
- Add `alert_incidents` and `alert_incident_alerts` database entities with SeaORM
- Add database migration `m20251107_000003_create_alert_correlation_schema`
- Add `correlation.rs` service with transaction-safe incident creation and matching
- Add `incidents.rs` HTTP handlers for incident CRUD operations (6 endpoints)
- Integrate correlation into alert scheduler to auto-correlate on alert firing
- Add 7 correlation metrics for observability (incidents created, alerts matched, confidence distribution, processing duration, MTTR)
- Update `org_config.rs` with correlation config persistence functions
- Update organization settings to include deduplication config in response

**Frontend:**
- Add `IncidentList.vue` component with status filtering and sortable table
- Add `IncidentDetailsDrawer.vue` for viewing incident details and associated alerts
- Add Incidents tab to `AlertList.vue` for accessing incident management UI
- Add 6 incident API methods to `alerts.ts` service (list, get, update status, config CRUD)

**Fixes:**
- Fix `OrganizationSettingResponse` test to include `deduplication_config` field
- Fix metering init call signature (remove extra argument)
- Comment out data retention usage code pending enterprise module update

Migrated from PR #9011, separated from deduplication feature (PR #9209).
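A minimal sketch of the correlation idea above (semantic field matching plus temporal proximity), assuming an alert joins an open incident when it shares at least one canonical dimension value with it and fires within a fixed window of the incident's most recent alert. The `Alert`/`Incident` types, `correlate`, `make_alert`, and the window rule are hypothetical; the real `correlation.rs` service also persists incidents transactionally and computes confidence, both omitted here.

```rust
use std::collections::BTreeMap;

/// Hypothetical alert: canonical dimensions (e.g. "host" -> "web-1") and a firing time in seconds.
#[derive(Debug, Clone)]
struct Alert {
    name: String,
    dims: BTreeMap<String, String>,
    fired_at: u64,
}

/// Hypothetical incident: the dimensions it was opened with and the alerts attached to it.
#[derive(Debug)]
struct Incident {
    id: usize,
    dims: BTreeMap<String, String>,
    last_seen: u64,
    alerts: Vec<String>,
}

/// Attach `alert` to the first incident that shares a dimension value and was active
/// within `window_secs`; otherwise open a new incident. Returns the incident id.
fn correlate(incidents: &mut Vec<Incident>, alert: &Alert, window_secs: u64) -> usize {
    for inc in incidents.iter_mut() {
        let recent = alert.fired_at.saturating_sub(inc.last_seen) <= window_secs;
        let shares_dim = alert.dims.iter().any(|(k, v)| inc.dims.get(k) == Some(v));
        if recent && shares_dim {
            inc.last_seen = inc.last_seen.max(alert.fired_at);
            inc.alerts.push(alert.name.clone());
            return inc.id;
        }
    }
    // No match: start a new incident seeded with this alert's dimensions.
    let id = incidents.len();
    incidents.push(Incident {
        id,
        dims: alert.dims.clone(),
        last_seen: alert.fired_at,
        alerts: vec![alert.name.clone()],
    });
    id
}

/// Helper for the example: an alert with a single "host" dimension.
fn make_alert(name: &str, host: &str, fired_at: u64) -> Alert {
    let mut dims = BTreeMap::new();
    dims.insert("host".to_string(), host.to_string());
    Alert { name: name.to_string(), dims, fired_at }
}

fn main() {
    let mut incidents = Vec::new();
    let a = correlate(&mut incidents, &make_alert("cpu_high", "web-1", 100), 300);
    let b = correlate(&mut incidents, &make_alert("disk_full", "web-1", 250), 300);
    let c = correlate(&mut incidents, &make_alert("cpu_high", "web-2", 260), 300);
    println!("{} {} {}", a, b, c); // 0 0 1
}
```

Two alerts on `web-1` within five minutes land in one incident, while `web-2` opens its own; a production matcher would likely index incidents by dimension instead of scanning linearly.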
Force-pushed from 8ff259a to 1656980.
oasisk approved these changes on Nov 25, 2025.
Force-pushed from f187518 to 42824bf, and later from 4058140 to 89224cb.
…groups
Implements org-level deduplication configuration with semantic field groups
for intelligent alert suppression and batched notifications.
Core capabilities:
- Semantic field groups: Map field name variations (`hostname`/`host`/`node`) to
canonical dimensions for consistent deduplication across data sources
- Org-level dedup config: Global settings with cross-alert deduplication
support to suppress alerts sharing semantic dimensions
- Alert grouping: Wait-and-collect batching with three send strategies
(`FirstWithCount`, `Summary`, `All`)
- HTTP API: Endpoints for org-level deduplication configuration management
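To make the semantic field group idea concrete, here is a minimal sketch, assuming a group is a canonical name plus a list of alias field names and that "normalization" means trimming and lowercasing values. `FieldGroup`, `canonicalize`, and `cross_alert_key` are hypothetical names, not the PR's `SemanticFieldGroup` API. Two alerts from sources that label the host and service differently end up with the same canonical key, which is what lets cross-alert deduplication treat them as sharing a dimension.

```rust
use std::collections::BTreeMap;

/// Hypothetical semantic group: a canonical dimension and the raw field names that map to it.
struct FieldGroup {
    canonical: &'static str,
    aliases: &'static [&'static str],
    normalize: bool,
}

/// Map raw record fields onto canonical dimensions; fields that belong to no group are dropped.
fn canonicalize(groups: &[FieldGroup], record: &BTreeMap<&str, &str>) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for g in groups {
        // The first alias present in the record wins.
        if let Some(v) = g.aliases.iter().find_map(|a| record.get(*a)) {
            let value = if g.normalize { v.trim().to_lowercase() } else { v.to_string() };
            out.insert(g.canonical.to_string(), value);
        }
    }
    out
}

/// Deterministic key over canonical dimensions (BTreeMap iterates in sorted key order).
fn cross_alert_key(dims: &BTreeMap<String, String>) -> String {
    dims.iter()
        .map(|(k, v)| format!("{}={}", k, v))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let groups = [
        FieldGroup { canonical: "host", aliases: &["hostname", "host", "node"], normalize: true },
        FieldGroup { canonical: "service", aliases: &["service", "app", "svc"], normalize: true },
    ];
    // Two sources describing the same machine and service with different field names.
    let a = BTreeMap::from([("hostname", "Web-1 "), ("app", "api")]);
    let b = BTreeMap::from([("node", "web-1"), ("service", "API")]);
    let ka = cross_alert_key(&canonicalize(&groups, &a));
    let kb = cross_alert_key(&canonicalize(&groups, &b));
    println!("{} | same: {}", ka, ka == kb); // host=web-1,service=api | same: true
}
```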
Features:
- Per-alert fingerprint-based deduplication with TTL expiration
- Cross-alert semantic matching using org-defined dimension groups
- Configurable time windows (per-alert override or org default)
- Background job processes expired batches every 1 second
- Three notification strategies for grouped alerts
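Per-alert, fingerprint-based deduplication with TTL expiration can be sketched as follows, assuming a fingerprint is a hash of the alert id plus the values of its configured fingerprint fields, and that timestamps are plain seconds. `fingerprint` and `DedupCache` are hypothetical; the PR's storage and hashing scheme may differ. `DefaultHasher` output is not guaranteed stable across Rust releases, so a fingerprint that is persisted or shared between nodes would need a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

/// Hash the alert id plus the selected fields, visiting fields in sorted order
/// so the result does not depend on how the field list was written.
fn fingerprint(alert_id: &str, fields: &[&str], row: &BTreeMap<&str, &str>) -> u64 {
    let mut sorted: Vec<&str> = fields.to_vec();
    sorted.sort_unstable();
    let mut h = DefaultHasher::new();
    alert_id.hash(&mut h);
    for f in sorted {
        f.hash(&mut h);
        // A missing field hashes like an empty value.
        row.get(f).copied().unwrap_or("").hash(&mut h);
    }
    h.finish()
}

/// Remembers fingerprints for `ttl_secs`; an alert is suppressed while its fingerprint is live.
struct DedupCache {
    ttl_secs: u64,
    seen: HashMap<u64, (u64, u32)>, // fingerprint -> (window start, occurrence count)
}

impl DedupCache {
    fn new(ttl_secs: u64) -> Self {
        Self { ttl_secs, seen: HashMap::new() }
    }

    /// Returns true if the alert should be sent, false if it is a duplicate within the TTL.
    fn check(&mut self, fp: u64, now: u64) -> bool {
        if let Some((first_seen, count)) = self.seen.get_mut(&fp) {
            if now < *first_seen + self.ttl_secs {
                *count += 1;
                return false;
            }
        }
        // New fingerprint, or the previous window expired: start a fresh window.
        self.seen.insert(fp, (now, 1));
        true
    }
}

fn main() {
    let row = BTreeMap::from([("host", "web-1"), ("level", "error")]);
    let fp = fingerprint("cpu_high", &["host"], &row);
    let mut cache = DedupCache::new(300);
    let sent = [cache.check(fp, 0), cache.check(fp, 120), cache.check(fp, 301)];
    println!("{:?}", sent); // [true, false, true]
}
```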
Configuration:
- `SemanticFieldGroup`: Define field equivalences with optional normalization
- `GlobalDeduplicationConfig`: Org-level settings stored at `/alert_config/{org_id}/deduplication`
- `DeduplicationConfig`: Per-alert settings with fingerprint fields and grouping options
- Default presets for common semantic groups (host, IP, service, K8s resources)
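A rough shape for the three configuration types and the "per-alert override or org default" rule for time windows, assuming plain structs with an `Option` for the per-alert override. Only the type names come from the PR; every field name (`time_window_minutes`, `cross_alert`, `fingerprint_fields`, ...), the alias lists in `default_groups`, and the `effective_window_minutes` helper are hypothetical.

```rust
/// Hypothetical field layouts; only the type names come from the PR description.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct SemanticFieldGroup {
    id: String,
    fields: Vec<String>,
    normalize: bool,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct GlobalDeduplicationConfig {
    enabled: bool,
    cross_alert: bool,
    semantic_field_groups: Vec<SemanticFieldGroup>,
    time_window_minutes: u64, // org default
}

#[allow(dead_code)]
#[derive(Debug, Clone, Default)]
struct DeduplicationConfig {
    enabled: bool,
    fingerprint_fields: Vec<String>,
    time_window_minutes: Option<u64>, // per-alert override
}

fn group(id: &str, fields: &[&str]) -> SemanticFieldGroup {
    SemanticFieldGroup {
        id: id.to_string(),
        fields: fields.iter().map(|f| f.to_string()).collect(),
        normalize: true,
    }
}

/// Hypothetical presets mirroring the "host, IP, service, K8s resources" defaults.
fn default_groups() -> Vec<SemanticFieldGroup> {
    vec![
        group("host", &["host", "hostname", "node"]),
        group("ip", &["ip", "ip_address", "client_ip"]),
        group("service", &["service", "service_name", "app"]),
        group("k8s_pod", &["k8s_pod_name", "pod", "pod_name"]),
    ]
}

/// Per-alert window if set, otherwise the org-level default.
fn effective_window_minutes(alert: &DeduplicationConfig, org: &GlobalDeduplicationConfig) -> u64 {
    alert.time_window_minutes.unwrap_or(org.time_window_minutes)
}

fn main() {
    let org = GlobalDeduplicationConfig {
        enabled: true,
        cross_alert: false,
        semantic_field_groups: default_groups(),
        time_window_minutes: 10,
    };
    let inherit = DeduplicationConfig {
        enabled: true,
        fingerprint_fields: vec!["host".to_string()],
        ..Default::default()
    };
    let custom = DeduplicationConfig { time_window_minutes: Some(30), ..inherit.clone() };
    println!("{} {}", effective_window_minutes(&inherit, &org), effective_window_minutes(&custom, &org)); // 10 30
}
```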
API Endpoints:
- `GET/POST/DELETE /{org_id}/alerts/deduplication/config`
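To illustrate how that single path fans out to three operations by HTTP method, here is a hypothetical, framework-free dispatcher. It only parses the method and path; `DedupConfigOp` and `route` are illustrative names, not the PR's handler functions or its web framework's routing API.

```rust
/// Hypothetical operations on the org-level dedup config.
#[derive(Debug, PartialEq)]
enum DedupConfigOp<'a> {
    Get { org_id: &'a str },
    Set { org_id: &'a str },
    Delete { org_id: &'a str },
}

/// Match `/{org_id}/alerts/deduplication/config` and map the method to an operation.
fn route<'a>(method: &str, path: &'a str) -> Option<DedupConfigOp<'a>> {
    let segments: Vec<&'a str> = path.trim_matches('/').split('/').collect();
    match segments.as_slice() {
        &[org_id, "alerts", "deduplication", "config"] if !org_id.is_empty() => match method {
            "GET" => Some(DedupConfigOp::Get { org_id }),
            "POST" => Some(DedupConfigOp::Set { org_id }),
            "DELETE" => Some(DedupConfigOp::Delete { org_id }),
            _ => None, // other methods are not part of this API
        },
        _ => None,
    }
}

fn main() {
    println!("{:?}", route("POST", "/acme/alerts/deduplication/config"));
    // Some(Set { org_id: "acme" })
}
```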
Implementation:
- Business logic: Pure algorithms in enterprise layer for fingerprinting and matching
- Service layer: Orchestrates DB operations with algorithm delegation
- HTTP handlers: Feature-gated dual implementations for OSS/enterprise builds
- Background jobs: Batch processor for grouped notification delivery
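The layering above (pure algorithms, a service layer that delegates to them, and feature-gated OSS/enterprise handlers) maps onto Rust's `cfg` attributes. This is a hypothetical sketch of the pattern only: `suppression_rate`, `dedup_summary`, and the stubbed counts are illustrative, not code from the PR. Cargo passes `--cfg feature="enterprise"` only when that feature is enabled, so a default build compiles the OSS stub.

```rust
/// Pure algorithm: no I/O, so it can be unit-tested in isolation.
fn suppression_rate(suppressed: u64, total: u64) -> f64 {
    if total == 0 { 0.0 } else { suppressed as f64 / total as f64 }
}

/// Enterprise build: the service gathers counts (stubbed here) and delegates to the algorithm.
#[cfg(feature = "enterprise")]
fn dedup_summary(org_id: &str) -> Result<String, String> {
    let (suppressed, total) = (25, 100); // a real service would read these from storage
    Ok(format!("{}: suppression rate {:.2}", org_id, suppression_rate(suppressed, total)))
}

/// OSS build: same signature, but the feature is reported as unavailable.
#[cfg(not(feature = "enterprise"))]
fn dedup_summary(_org_id: &str) -> Result<String, String> {
    Err("alert deduplication requires the enterprise build".to_string())
}

fn main() {
    match dedup_summary("acme") {
        Ok(s) => println!("{}", s),
        Err(e) => println!("error: {}", e),
    }
}
```

Keeping one signature for both builds lets callers (route registration, the scheduler) stay free of `cfg` checks.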
Implemented wait-and-batch mechanism to group multiple alerts with the same fingerprint before sending a single notification, reducing alert fatigue and improving visibility.

**Alert Grouping/Batching:**
- Added `grouping.rs` module with in-memory batch storage using `DashMap`
- Implemented background worker in `alert_grouping.rs` polling every 1s for expired batches
- Integrated grouping logic in `scheduler/handlers.rs` after deduplication
- Supported all three `SendStrategy` variants: `FirstWithCount`, `Summary`, `All`
- Auto-send when `max_group_size` is reached or when the timer expires after `group_wait_seconds`
- Registered background worker in `job/mod.rs`

**Observability - Prometheus Metrics:**
- Added 8 metrics to `metrics.rs`: dedup suppressions/passed/errors, grouping batches pending/sent/size/wait-time/errors
- Instrumented `deduplication.rs` to track suppressions and passed alerts by type (same-alert vs cross-alert)
- Instrumented `grouping.rs` and `alert_grouping.rs` to track batch lifecycle
- All metrics registered in Prometheus registry and exposed at `/metrics` endpoint

**UI Visibility:**
- Added dedup badges to alert names in `AlertList.vue` showing configuration status
- Added dedup column in `AlertHistory.vue` with visual indicators for sent/suppressed/grouped alerts
- Created `DedupSummaryCards.vue` component displaying org-wide stats (total alerts, dedup enabled count, suppression rate, pending batches)
- Added backend API `dedup_stats.rs` with `/alerts/dedup/summary` endpoint
- Removed legacy View History button from alert list page
- Extended `AlertHistoryEntry` with dedup fields: `dedup_enabled`, `dedup_suppressed`, `dedup_count`, `grouped`, `group_size`

**Logging & Debugging:**
- Comprehensive logging throughout grouping flow with `[grouping]` and `[alert_grouping_worker]` prefixes
- Enhanced deduplication logging with `[dedup]` prefix showing fingerprints and occurrence counts
- Added `get_pending_batch_count()` helper for API consumption

**Technical Details:**
- All features properly gated behind `#[cfg(feature = "enterprise")]`
- Backward compatible: grouping disabled by default
- In-memory batches cleared on restart (acceptable for 30s window)
- Thread-safe implementation using `DashMap` and atomic operations
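The wait-and-batch flow can be sketched with a plain `HashMap` behind `&mut self` instead of `DashMap`, assuming batches are keyed by fingerprint, flushed immediately once `max_group_size` is reached, and otherwise flushed by a periodic `flush_expired(now)` call (the role the 1s background worker plays). The `Batcher` type, the notification text, and the exact output chosen for each `SendStrategy` variant are assumptions for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum SendStrategy {
    FirstWithCount, // first alert plus a count of the rest
    Summary,        // one summary line for the whole batch
    All,            // every alert in the batch
}

struct Batch {
    opened_at: u64,
    alerts: Vec<String>,
}

/// Turn a non-empty batch into outgoing notifications according to the strategy.
fn render(strategy: SendStrategy, alerts: &[String]) -> Vec<String> {
    match strategy {
        SendStrategy::FirstWithCount => vec![format!("{} (+{} more)", alerts[0], alerts.len() - 1)],
        SendStrategy::Summary => vec![format!("{} alerts: {}", alerts.len(), alerts.join(", "))],
        SendStrategy::All => alerts.to_vec(),
    }
}

struct Batcher {
    strategy: SendStrategy,
    group_wait_seconds: u64,
    max_group_size: usize,
    batches: HashMap<u64, Batch>, // fingerprint -> pending batch
}

impl Batcher {
    fn new(strategy: SendStrategy, group_wait_seconds: u64, max_group_size: usize) -> Self {
        Self { strategy, group_wait_seconds, max_group_size, batches: HashMap::new() }
    }

    /// Add an alert; returns notifications if its batch just reached `max_group_size`.
    fn push(&mut self, fp: u64, alert: &str, now: u64) -> Option<Vec<String>> {
        let full = {
            let batch = self.batches.entry(fp).or_insert_with(|| Batch { opened_at: now, alerts: Vec::new() });
            batch.alerts.push(alert.to_string());
            batch.alerts.len() >= self.max_group_size
        };
        if full {
            let batch = self.batches.remove(&fp)?;
            return Some(render(self.strategy, &batch.alerts));
        }
        None
    }

    /// Called periodically; flushes every batch open for at least `group_wait_seconds`.
    fn flush_expired(&mut self, now: u64) -> Vec<Vec<String>> {
        let wait = self.group_wait_seconds;
        let expired: Vec<u64> = self
            .batches
            .iter()
            .filter(|(_, b)| now.saturating_sub(b.opened_at) >= wait)
            .map(|(fp, _)| *fp)
            .collect();
        let mut out = Vec::new();
        for fp in expired {
            if let Some(b) = self.batches.remove(&fp) {
                out.push(render(self.strategy, &b.alerts));
            }
        }
        out
    }
}

fn main() {
    let mut b = Batcher::new(SendStrategy::FirstWithCount, 30, 3);
    b.push(42, "cpu_high web-1", 0);
    b.push(42, "cpu_high web-1", 5);
    println!("{:?}", b.push(42, "cpu_high web-1", 9)); // Some(["cpu_high web-1 (+2 more)"])
    b.push(7, "disk_full web-2", 10);
    println!("{:?}", b.flush_expired(40)); // [["disk_full web-2 (+0 more)"]]
}
```

Because batches live only in memory, anything pending at shutdown is lost, which matches the "cleared on restart" trade-off noted above.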
Extended `TriggerData` with dedup fields for per-execution visibility and added a compact dedup column to the alert table.

Alert History Tracking:
- Added dedup fields to `TriggerData`: `dedup_enabled`, `dedup_suppressed`, `dedup_count`, `grouped`, `group_size`
- Implemented the `Default` trait for clean initialization with `..Default::default()`
- Set tracking fields in `handlers.rs` when alerts are suppressed, grouped, or sent
- History UI now shows actual dedup activity with icons

Alert List UI:
- Added compact Dedup column in alert table (80px width)
- Shows check icon if dedup is enabled, dash if not
- Tooltip displays fingerprint fields and grouping config
- Clean inline status per alert without UI clutter
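Implementing (here, deriving) `Default` is what makes the `..Default::default()` initialization mentioned above work: each call site sets only the fields relevant to that execution and the rest fall back to defaults. A minimal sketch with a trimmed-down `TriggerData`; the five dedup field names come from the PR, but their types and the two stand-in fields `org` and `alert_name` are hypothetical.

```rust
#[allow(dead_code)]
#[derive(Debug, Default, Clone, PartialEq)]
struct TriggerData {
    // Hypothetical stand-ins for the struct's pre-existing fields.
    org: String,
    alert_name: String,
    // Dedup/grouping fields named in the PR; the types are assumptions.
    dedup_enabled: bool,
    dedup_suppressed: bool,
    dedup_count: Option<u32>,
    grouped: bool,
    group_size: Option<u32>,
}

fn main() {
    // A suppressed execution: only the relevant fields are spelled out.
    let suppressed = TriggerData {
        org: "acme".to_string(),
        alert_name: "cpu_high".to_string(),
        dedup_enabled: true,
        dedup_suppressed: true,
        dedup_count: Some(3),
        ..Default::default()
    };
    println!("{:?}", suppressed);
}
```

A side benefit is that adding more fields later does not break call sites that already use `..Default::default()`.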
Fixed four failing unit tests:
1. `test_organization_config_minimal_serialization`: updated to check the correct field name `alert_dedup_enabled` instead of `cross_alert_dedup`
2. `test_flatten_json_complex`: made JSON comparison order-independent by parsing embedded JSON strings
3. `test_flatten_with_level`: added a helper function to structurally compare JSON values regardless of key ordering
4. `test_trigger_data_field_names`: updated field count from 21 to 26 to reflect the new dedup/grouping fields

The tests were failing due to non-deterministic JSON key ordering and outdated field counts.
Force-pushed from a649335 to 46cf892.
Summary
Implements org-level alert deduplication with semantic field groups for intelligent alert suppression and batched notifications. This feature prevents alert storms by deduplicating alerts based on configurable fingerprints and grouping related alerts together.
Core Capabilities
- … (`hostname`, `host`, `node`) to canonical dimensions for consistent deduplication across different data sources
- … (`FirstWithCount`, `Summary`, `All`) for batched notifications

Features
Deduplication Modes:
Alert Grouping:
Semantic Field Groups:
Configuration Types
- `SemanticFieldGroup`: Field name equivalence definitions with normalization
- `GlobalDeduplicationConfig`: Org-level settings with semantic groups and cross-alert dedup
- `DeduplicationConfig`: Per-alert fingerprint fields and grouping options

API Endpoints
- `GET /{org_id}/alerts/deduplication/config`: Retrieve org-level dedup config
- `POST /{org_id}/alerts/deduplication/config`: Set org-level dedup config
- `DELETE /{org_id}/alerts/deduplication/config`: Delete org-level dedup config

Implementation Details
- `#[cfg(feature = "enterprise")]` for enterprise-only functionality
- … `/alert_config/{org_id}/deduplication`

Architecture