refactor(snmp_zabbix): migrate scheduler to Min-Heap with consistent hashing #1379
Conversation
Pull request overview
This PR refactors the SNMP Zabbix input plugin's scheduler from an interval-based ticker system to a Min-Heap-based event-loop architecture with consistent hashing, for better task distribution and resource efficiency.
Key Changes:
- Replaces multiple interval-based goroutines with a single event loop processing a priority queue (a rough sketch follows this list)
- Implements consistent hashing to scatter initial task execution times across intervals, preventing thundering herd issues
- Adds cycle detection and improved error handling in dependency tree construction
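As a rough, self-contained sketch of how those pieces fit together (illustrative type names, not the plugin's actual code): tasks live in a min-heap keyed by NextRun, a single loop pops the earliest one, and each agent's first run is offset by a deterministic hash of its name.

```go
package scheduler

import (
	"container/heap"
	"hash/fnv"
	"time"
)

// ScheduledTask and taskPQ are illustrative stand-ins for the plugin's types.
type ScheduledTask struct {
	Agent    string
	Interval time.Duration
	NextRun  time.Time
}

// taskPQ implements heap.Interface, ordered by NextRun (earliest first).
type taskPQ []*ScheduledTask

func (q taskPQ) Len() int           { return len(q) }
func (q taskPQ) Less(i, j int) bool { return q[i].NextRun.Before(q[j].NextRun) }
func (q taskPQ) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *taskPQ) Push(x any)        { *q = append(*q, x.(*ScheduledTask)) }
func (q *taskPQ) Pop() any {
	old := *q
	n := len(old)
	t := old[n-1]
	*q = old[:n-1]
	return t
}

// scatteredFirstRun hashes the agent name to a stable offset in [0, interval),
// so the same agent always lands on the same slot within its interval window.
func scatteredFirstRun(agent string, interval time.Duration) time.Time {
	h := fnv.New64a()
	h.Write([]byte(agent))
	offset := time.Duration(h.Sum64() % uint64(interval))
	return time.Now().Add(offset)
}

// runLoop is the single event loop: wait until the earliest task is due,
// execute it, then push it back with NextRun advanced by one interval.
// (Stop handling and locking are omitted for brevity.)
func runLoop(pq *taskPQ, execute func(*ScheduledTask)) {
	for pq.Len() > 0 {
		task := heap.Pop(pq).(*ScheduledTask)
		if d := time.Until(task.NextRun); d > 0 {
			time.Sleep(d)
		}
		execute(task)
		task.NextRun = task.NextRun.Add(task.Interval)
		heap.Push(pq, task)
	}
}
```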
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| inputs/snmp_zabbix/priority_queue.go | New file implementing heap.Interface for a ScheduledTask priority queue with NextRun as the priority key |
| inputs/snmp_zabbix/utils.go | Extracted jitter calculation utilities from scheduler.go into a separate file |
| inputs/snmp_zabbix/scheduler.go | Major refactoring: replaced interval-based scheduling with a heap-based event loop, added consistent hashing, simplified task grouping by (Agent, Interval) |
| inputs/snmp_zabbix/scheduler_test.go | New test file covering consistent hashing determinism, heap ordering, and differential LLD update logic |
| inputs/snmp_zabbix/snmp.go | Enhanced buildTree function with cycle detection and nil pointer checks for dependency resolution |
| inputs/snmp_zabbix/template.go | Replaced Chinese comments with English documentation; no functional changes |
| inputs/snmp_zabbix/template_macro_test.go | New test file for macro expansion robustness with various key formats |
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
// Hash is deterministic.
// Offset is in [0, Interval)
// We cast to int64 (Duration) which is safe as interval is usually < 290 years
offset := time.Duration(h.Sum64() % uint64(interval))

now := time.Now()
// ...
```
Copilot AI · Dec 23, 2025
Incomplete comment. The comment on lines 215-216 ends with an ellipsis ("...") suggesting unfinished documentation. Complete this comment to explain the full logic, particularly around how the offset is calculated and why returning now.Add(offset) achieves the desired scattered scheduling.
Suggested change:
```diff
-// Hash is deterministic.
-// Offset is in [0, Interval)
-// We cast to int64 (Duration) which is safe as interval is usually < 290 years
-offset := time.Duration(h.Sum64() % uint64(interval))
-now := time.Now()
-// ...
+// Hash is deterministic for a given agent string, so the same agent will always
+// produce the same offset value.
+// The modulus with interval keeps the offset in [0, interval), effectively placing
+// the agent at a stable position within the interval window.
+// We cast to time.Duration (backed by int64 nanoseconds), which is safe as
+// interval is expected to be well within the valid duration range.
+offset := time.Duration(h.Sum64() % uint64(interval))
+now := time.Now()
+// By adding the deterministic offset to "now", each agent's first execution is
+// delayed by a consistent amount within the interval, scattering start times
+// across agents and avoiding a thundering herd while still being reproducible
+// across restarts for the same agent and interval.
```
```go
// 2. Scattering check
// Generate 100 agents, check if their offsets are distributed
// agents := make([]string, 100) (unused)
```
Copilot AI · Dec 23, 2025
The test comment references "unused" for the agents variable on line 35, but this is misleading: the variable was never declared in the first place. It appears to be leftover from refactoring. Consider removing the comment entirely, as it adds no value and creates confusion.
Suggested change:
```diff
-// agents := make([]string, 100) (unused)
```
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
k := s.taskKey(newItem.Agent, newItem.Delay)
s.addItemToTask(k, newItem)
```
Copilot AI · Dec 23, 2025
Similar redundancy on line 318: `k := s.taskKey(newItem.Agent, newItem.Delay)` recalculates the same value as `targetKey` from line 279. Use the existing `targetKey` variable instead.
```go
// But we should use >> 1 unique slots.
if len(uniqueOffsets) < 10 {
	t.Errorf("Expected distributed offsets, but got only %d unique seconds slots out of 100 agents", len(uniqueOffsets))
```
Copilot AI · Dec 23, 2025
Magic number without explanation. The test expects at least 10 unique second slots out of 100 agents (line 60), but this threshold appears arbitrary. With consistent hashing over a 60-second interval and 100 agents, the expected distribution should yield approximately 60 unique slots (with some collisions due to birthday paradox). A threshold of 10 is too lenient and might not catch real distribution problems. Consider using a more meaningful threshold (e.g., at least 40 unique slots) or documenting why 10 is appropriate.
Suggested change:
```diff
-// But we should use >> 1 unique slots.
-if len(uniqueOffsets) < 10 {
-	t.Errorf("Expected distributed offsets, but got only %d unique seconds slots out of 100 agents", len(uniqueOffsets))
+// We conservatively require a majority of slots to be used to catch bad distributions without being too strict.
+const minUniqueOffsetSlots = 40
+if len(uniqueOffsets) < minUniqueOffsetSlots {
+	t.Errorf("Expected distributed offsets, but got only %d unique seconds slots out of 100 agents (minimum expected: %d)", len(uniqueOffsets), minUniqueOffsetSlots)
```
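For calibration: hashing 100 agents uniformly into 60 one-second slots occupies about 60 · (1 − (59/60)^100) ≈ 49 slots in expectation, so the suggested threshold of 40 sits safely below the mean while still catching badly clustered distributions.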
```go
now := time.Now()

// Add 3 tasks with different delays (which result in diff NextRun)
// We manually inject for test
t1 := &ScheduledTask{Agent: "A", NextRun: now.Add(10 * time.Second)}
t2 := &ScheduledTask{Agent: "B", NextRun: now.Add(5 * time.Second)}
t3 := &ScheduledTask{Agent: "C", NextRun: now.Add(15 * time.Second)}

// Use heap.Push to maintain invariant
heap.Push(&s.pq, t1)
heap.Push(&s.pq, t2)
heap.Push(&s.pq, t3)

// Pop order should be: B (5s), A (10s), C (15s)

p1 := heap.Pop(&s.pq).(*ScheduledTask)
if p1.Agent != "B" {
	t.Errorf("Expected first pop to be B, got %s", p1.Agent)
}

p2 := heap.Pop(&s.pq).(*ScheduledTask)
if p2.Agent != "A" {
	t.Errorf("Expected second pop to be A, got %s", p2.Agent)
}
```
Copilot AI · Dec 23, 2025
Potential data race in the test. The test accesses s.pq directly (lines 77-79) without acquiring the scheduler's mutex. While this is a test environment, if the scheduler design expects concurrent access protection, the test should either respect that pattern or document why it's safe to skip locking here. This could also serve as documentation for whether the heap operations themselves are thread-safe or require external synchronization.
Suggested change:
```diff
 now := time.Now()
 // Add 3 tasks with different delays (which result in diff NextRun)
 // We manually inject for test
 t1 := &ScheduledTask{Agent: "A", NextRun: now.Add(10 * time.Second)}
 t2 := &ScheduledTask{Agent: "B", NextRun: now.Add(5 * time.Second)}
 t3 := &ScheduledTask{Agent: "C", NextRun: now.Add(15 * time.Second)}
+// All direct access to s.pq is done under the scheduler's mutex to
+// respect the same concurrency guarantees as production code.
+s.mu.Lock()
+defer s.mu.Unlock()
 // Use heap.Push to maintain invariant
 heap.Push(&s.pq, t1)
 heap.Push(&s.pq, t2)
 heap.Push(&s.pq, t3)
 // Pop order should be: B (5s), A (10s), C (15s)
 p1 := heap.Pop(&s.pq).(*ScheduledTask)
 if p1.Agent != "B" {
 	t.Errorf("Expected first pop to be B, got %s", p1.Agent)
 }
 p2 := heap.Pop(&s.pq).(*ScheduledTask)
 if p2.Agent != "A" {
 	t.Errorf("Expected second pop to be A, got %s", p2.Agent)
 }
```
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
}

func NewItemScheduler(collector *SNMPCollector, labelCache *LabelCache) *ItemScheduler {
	rand.Seed(time.Now().UnixNano())
```
Copilot AI · Dec 23, 2025
The `rand.Seed()` call on line 52 uses the deprecated `math/rand` global seed pattern. In Go 1.20+, the global random generator is automatically seeded. If supporting older Go versions, this is fine, but if the minimum version is 1.20 or higher, this line can be removed. Additionally, using `math/rand` in the jitter function (utils.go line 32) without seeding per-goroutine could lead to contention on the global lock. Consider using `rand.New(rand.NewSource(time.Now().UnixNano()))` for per-instance random generators if concurrent performance matters.
Suggested change:
```diff
-rand.Seed(time.Now().UnixNano())
```
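If per-instance generators are adopted, a minimal sketch could look like the following. The `rng` field and `jitter` helper are illustrative names, not the plugin's actual API; note that a `*rand.Rand` is not safe for concurrent use and must be confined to one goroutine or guarded by a lock.

```go
package scheduler

import (
	"math/rand"
	"time"
)

// ItemScheduler sketch: owns a private RNG instead of the global math/rand
// source, avoiding contention on the global generator's lock.
type ItemScheduler struct {
	rng *rand.Rand // confined to the scheduler's event-loop goroutine
}

func NewItemScheduler() *ItemScheduler {
	return &ItemScheduler{
		// Per-instance source; no rand.Seed needed (and the global generator
		// is auto-seeded since Go 1.20 anyway).
		rng: rand.New(rand.NewSource(time.Now().UnixNano())),
	}
}

// jitter returns a random duration in [0, max).
func (s *ItemScheduler) jitter(max time.Duration) time.Duration {
	if max <= 0 {
		return 0
	}
	return time.Duration(s.rng.Int63n(int64(max)))
}
```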
inputs/snmp_zabbix/scheduler.go (Outdated)
```diff
 // Actually it was local in this file.
 func copyTags(src map[string]string) map[string]string {
-	dst := make(map[string]string, len(src)+1)
+	dst := make(map[string]string, len(src))
```
Copilot AI · Dec 23, 2025
The copyTags function allocates a map with len(src) capacity, but the comment on lines 482-483 suggests it might need to accommodate an additional key. If tags are frequently copied and then modified (as seen on lines 467-469 and 474-476, where action is added), consider using len(src)+1 for the capacity to avoid potential reallocation.
Suggested change:
```diff
-dst := make(map[string]string, len(src))
+dst := make(map[string]string, len(src)+1)
```
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
now := time.Now()
// ...
return now.Add(offset)
```
Copilot AI · Dec 23, 2025
The consistent hashing implementation has a logic flaw. On line 217, the function returns now.Add(offset), which means the next run time is always in the future by offset amount. However, for proper scheduling, when a task group is first created, it should check if adding the offset results in a time still in the past (e.g., if offset is 5 seconds but initialization happens at second 50 of a minute with 60s interval). The scheduler should advance to the next interval cycle if needed. This could cause tasks to run immediately instead of at their intended scattered time slots.
Suggested change:
```diff
-now := time.Now()
-// ...
-return now.Add(offset)
+now := time.Now()
+// Align to the start of the current interval, then add the offset.
+// If that time is already in the past, advance to the next interval.
+base := now.Truncate(interval)
+nextRun := base.Add(offset)
+if nextRun.Before(now) {
+	nextRun = nextRun.Add(interval)
+}
+return nextRun
```
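To make the suggested alignment concrete with made-up numbers: for interval = 60s and offset = 5s, initializing at 12:00:50 truncates base to 12:00:00, so base.Add(offset) = 12:00:05 is already in the past and the task is scheduled for 12:01:05, landing on the agent's stable second-5 slot; the original now.Add(offset) would have fired at 12:00:55, drifting off the intended slot on every restart.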
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 12 comments.
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
}

func NewItemScheduler(collector *SNMPCollector, labelCache *LabelCache) *ItemScheduler {
	rand.Seed(time.Now().UnixNano())
```
Copilot AI · Dec 24, 2025
The rand.Seed() call is deprecated in Go 1.20+ and no longer necessary. The global random number generator is automatically seeded. Additionally, using math/rand without proper synchronization can lead to race conditions when accessed concurrently from multiple goroutines.
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
// Catchup logic: if mostly lagging, don't run immediately multiple times, just skip intervals?
// For SNMP, skipping intervals is better than burst.
// If NextRun is still in past after adding interval, add more until it is in future or close to now.
if task.NextRun.Before(now) {
	// Calculate how many intervals we missed
	// (Now - NextRun) / Interval
	diff := now.Sub(task.NextRun)
	cycles := diff / task.Interval
	if cycles > 0 {
		task.NextRun = task.NextRun.Add(cycles * task.Interval)
	}
	// One more check
	if task.NextRun.Before(now) {
		task.NextRun = task.NextRun.Add(task.Interval)
	}
}
```
Copilot AI · Dec 24, 2025
The `heap.Fix()` call is used after manually updating `task.NextRun`, but the task was not popped from the heap first. This is the correct usage of `heap.Fix()` when updating an element in place. However, there's a subtle issue: if the scheduler is significantly lagging and multiple intervals are skipped (lines 152-164), the catchup logic might not work correctly in all edge cases. Specifically, if the `diff / task.Interval` calculation results in a very large number of cycles, adding `cycles * task.Interval` could overflow or result in incorrect scheduling.
Suggested change:
```diff
-// Catchup logic: if mostly lagging, don't run immediately multiple times, just skip intervals?
-// For SNMP, skipping intervals is better than burst.
-// If NextRun is still in past after adding interval, add more until it is in future or close to now.
-if task.NextRun.Before(now) {
-	// Calculate how many intervals we missed
-	// (Now - NextRun) / Interval
-	diff := now.Sub(task.NextRun)
-	cycles := diff / task.Interval
-	if cycles > 0 {
-		task.NextRun = task.NextRun.Add(cycles * task.Interval)
-	}
-	// One more check
-	if task.NextRun.Before(now) {
-		task.NextRun = task.NextRun.Add(task.Interval)
-	}
+// Catchup logic: if heavily lagging, skip missed intervals rather than executing in a burst.
+// Use a bounded loop to avoid overflowing time.Duration when many intervals were missed.
+if task.Interval > 0 {
+	const maxCatchupCycles = 1000
+	// If NextRun is still in the past after adding one interval, advance in steps.
+	for i := 0; i < maxCatchupCycles && task.NextRun.Before(now); i++ {
+		task.NextRun = task.NextRun.Add(task.Interval)
+	}
+	// If we're still behind after maxCatchupCycles, schedule relative to "now"
+	// to avoid extreme catchup loops and potential overflow.
+	if task.NextRun.Before(now) {
+		task.NextRun = now.Add(task.Interval)
+	}
+} else {
+	// Non-positive interval: fall back to scheduling at current time.
+	task.NextRun = now
```
inputs/snmp_zabbix/scheduler.go (Outdated)
```go
offset := time.Duration(h.Sum64() % uint64(interval))

now := time.Now()
// ...
```
Copilot AI · Dec 24, 2025
The ellipsis "..." at line 216 suggests an incomplete implementation. This appears to be leftover debug or placeholder code that should either be completed or removed.
Suggested change:
```diff
-// ...
```
```go
type MockCollector struct{}

func (m *MockCollector) CollectItems(ctx context.Context, items []MonitorItem, slist *types.SampleList) error {
	return nil
}
```
Copilot AI · Dec 24, 2025
The MockCollector struct and its method are defined at the package level but are only used in tests. This is unconventional - typically test helpers should be defined within the test file or in a separate _test.go file. While Go allows this, it exports test infrastructure unnecessarily. Consider moving this to a test helper function or using an interface type assertion within the test instead.
```go
// Let's verify the OFFSET is consistent. (We can't easily access private method output directly without exposing or copy-paste logic).
// But we can check stability.

if t1.Sub(t2).Abs() > time.Second {
```
Copilot AI · Dec 24, 2025
The test uses t1.Sub(t2).Abs() to check time difference, but time.Duration doesn't have an Abs() method in standard Go (it was added in Go 1.19+). If this code needs to support Go versions before 1.19, this will cause a compilation error. Consider using a manual absolute value check or documenting the minimum Go version requirement.
Suggested change:
```diff
-if t1.Sub(t2).Abs() > time.Second {
+diff := t1.Sub(t2)
+if diff < 0 {
+	diff = -diff
+}
+if diff > time.Second {
```
```go
if s.pq.Len() == 0 {
	s.mu.Unlock()
	time.Sleep(idleWait)
	continue
```
Copilot AI · Dec 24, 2025
The busy-wait loop with time.Sleep(idleWait) when the queue is empty is inefficient. Consider using a condition variable or channel-based signaling to wake up the goroutine when new tasks are added, rather than polling every second. This would reduce CPU usage and improve responsiveness when new tasks are added.
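A minimal sketch of such channel-based signaling, with illustrative names (the `wakeCh` field, `Add` method, and loop structure are assumptions, not the plugin's actual code): the loop sleeps on a timer armed for the head task's due time, and adding a task nudges it awake through a buffered channel.

```go
package scheduler

import (
	"container/heap"
	"context"
	"sync"
	"time"
)

type ScheduledTask struct{ NextRun time.Time }

// taskPQ is a min-heap on NextRun.
type taskPQ []*ScheduledTask

func (q taskPQ) Len() int           { return len(q) }
func (q taskPQ) Less(i, j int) bool { return q[i].NextRun.Before(q[j].NextRun) }
func (q taskPQ) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *taskPQ) Push(x any)        { *q = append(*q, x.(*ScheduledTask)) }
func (q *taskPQ) Pop() any          { old := *q; n := len(old); t := old[n-1]; *q = old[:n-1]; return t }

type ItemScheduler struct {
	mu     sync.Mutex
	pq     taskPQ
	wakeCh chan struct{} // buffered with capacity 1
}

// Add pushes a task and nudges the run loop instead of relying on polling.
func (s *ItemScheduler) Add(t *ScheduledTask) {
	s.mu.Lock()
	heap.Push(&s.pq, t)
	s.mu.Unlock()
	select {
	case s.wakeCh <- struct{}{}: // wake the loop
	default: // a wake-up is already pending; dropping is fine
	}
}

func (s *ItemScheduler) runLoop(ctx context.Context) {
	for {
		s.mu.Lock()
		wait := time.Minute // idle fallback; interrupted by wakeCh on Add
		if s.pq.Len() > 0 {
			wait = time.Until(s.pq[0].NextRun)
		}
		s.mu.Unlock()

		timer := time.NewTimer(wait)
		select {
		case <-ctx.Done():
			timer.Stop()
			return
		case <-s.wakeCh: // a task was added; recompute the wait
			timer.Stop()
		case <-timer.C:
			// head task is due: pop under the lock and dispatch (elided)
		}
	}
}
```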
```go
if len(uniqueOffsets) < 10 {
	t.Errorf("Expected distributed offsets, but got only %d unique seconds slots out of 100 agents", len(uniqueOffsets))
}
```
Copilot AI · Dec 24, 2025
The test assertions for task distribution are weak. The test only checks if at least 10 unique second-slots exist out of 100 agents with a 60-second interval, which is a very low bar. With proper hashing, you should expect much better distribution. Consider strengthening the assertion or adding statistical tests for distribution quality (e.g., checking if the distribution is reasonably uniform).
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
```go
if task.NextRun.Before(now) {
	diff := now.Sub(task.NextRun)
	cycles := diff / task.Interval
	if cycles > 0 {
		task.NextRun = task.NextRun.Add(cycles * task.Interval)
	}
	if task.NextRun.Before(now) {
		task.NextRun = task.NextRun.Add(task.Interval)
	}
```
Copilot AI · Dec 24, 2025
The logic for catching up after a missed execution window has a potential issue. When calculating cycles, if task.NextRun equals now, the diff is 0 and no cycles are added. However, the subsequent check if task.NextRun.Before(now) will still be false, so the task won't be advanced forward at all. This means the task will execute immediately again on the next iteration. Consider handling the equal case explicitly or ensuring NextRun is always advanced by at least one interval after execution.
Suggested change:
```diff
-if task.NextRun.Before(now) {
-	diff := now.Sub(task.NextRun)
-	cycles := diff / task.Interval
-	if cycles > 0 {
-		task.NextRun = task.NextRun.Add(cycles * task.Interval)
-	}
-	if task.NextRun.Before(now) {
-		task.NextRun = task.NextRun.Add(task.Interval)
-	}
+// Ensure NextRun is always moved forward by at least one interval
+// when the task is due or overdue (including the equal-to-now case).
+if !task.NextRun.After(now) {
+	diff := now.Sub(task.NextRun)
+	// Number of full intervals elapsed since NextRun, plus one to
+	// ensure the new NextRun is strictly after now.
+	cycles := diff/task.Interval + 1
+	task.NextRun = task.NextRun.Add(cycles * task.Interval)
```
```go
// Copy visited map to allow shared dependencies (DAG) while preventing cycles in the current path.
childVisited := make(map[string]bool, len(visited))
for k, v := range visited {
	childVisited[k] = v
}

childItem := buildTree(childKey, childVisited)
if childItem.Key != "" {
	item.DependentItems = append(item.DependentItems, childItem)
}
}
```
Copilot AI · Dec 24, 2025
The variable 'childVisited' is created by copying the parent's visited map to allow shared dependencies (DAG structure). However, this copy is made for every child iteration, which is inefficient. Since the visited map is meant to track the current path for cycle detection, a better approach would be to add the key before recursion and remove it after, using a single map. The current approach creates O(n²) map copies in the worst case for a deep tree with n nodes.
Suggested change:
```diff
-// Copy visited map to allow shared dependencies (DAG) while preventing cycles in the current path.
-childVisited := make(map[string]bool, len(visited))
-for k, v := range visited {
-	childVisited[k] = v
-}
-childItem := buildTree(childKey, childVisited)
-if childItem.Key != "" {
-	item.DependentItems = append(item.DependentItems, childItem)
-}
-}
+// Use a single visited map per root; mark/unmark keys to track the current path.
+childItem := buildTree(childKey, visited)
+if childItem.Key != "" {
+	item.DependentItems = append(item.DependentItems, childItem)
+}
+}
+// Remove key from current path to allow shared dependencies (DAG) without false cycles.
+delete(visited, key)
```
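For illustration, here is a self-contained version of that mark/unmark pattern over a toy dependency map (the `deps` variable and the error-returning signature are hypothetical, not the plugin's types):

```go
package deps

import "fmt"

// deps maps an item key to the keys it depends on (toy data for the sketch).
var deps = map[string][]string{
	"a": {"b", "c"},
	"b": {"c"}, // "c" is shared by "a" and "b": a DAG, not a cycle
	"c": {},
}

// buildTree walks dependencies depth-first. visited holds only the keys on
// the current path: mark before recursing, unmark after. Shared dependencies
// are visited more than once without triggering the cycle check, but a key
// that reappears while still on the path is a true cycle.
func buildTree(key string, visited map[string]bool) error {
	if visited[key] {
		return fmt.Errorf("cycle detected at %q", key)
	}
	visited[key] = true
	for _, child := range deps[key] {
		if err := buildTree(child, visited); err != nil {
			return err
		}
	}
	delete(visited, key) // backtrack: key leaves the current path
	return nil
}
```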
```go
log.Printf("E! cycle detected in item dependency: %s", key)
}
return *ptr
```
Copilot AI · Dec 24, 2025
When a cycle is detected, the function returns the pointer value without traversing its children. This means the returned item will have an empty DependentItems slice, which might not reflect the actual structure. If the cycle is part of a larger graph, this could lead to incomplete dependency trees. Consider logging more details about the cycle path or returning a sentinel value to indicate the cycle was detected and handled.
Suggested change:
```diff
-	log.Printf("E! cycle detected in item dependency: %s", key)
-}
-return *ptr
+	// Log detailed cycle path for easier debugging.
+	cyclePath := make([]string, 0, len(visited)+1)
+	for k := range visited {
+		cyclePath = append(cyclePath, k)
+	}
+	sort.Strings(cyclePath)
+	cyclePath = append(cyclePath, key)
+	log.Printf("E! cycle detected in item dependency path: %s", strings.Join(cyclePath, " -> "))
+}
+// Return a sentinel value so callers can detect and skip this branch.
+return MonitorItem{}
```
| "flashcat.cloud/categraf/types" | ||
| ) | ||
|
|
||
| func TestScheduler_CalcScatteredNextRun(t *testing.T) { |
Copilot AI · Dec 24, 2025
The test name 'TestScheduler_CalcScatteredNextRun' doesn't match the standard Go naming convention. Since 'calcScatteredNextRun' is a method on ItemScheduler, the test should be named 'TestItemScheduler_CalcScatteredNextRun' or 'TestItemScheduler_calcScatteredNextRun' for consistency with Go testing conventions.
Suggested change:
```diff
-func TestScheduler_CalcScatteredNextRun(t *testing.T) {
+func TestItemScheduler_CalcScatteredNextRun(t *testing.T) {
```
```go
}
}

func TestScheduler_HeapOrdering(t *testing.T) {
```
Copilot AI · Dec 24, 2025
The test name 'TestScheduler_HeapOrdering' should be 'TestItemScheduler_HeapOrdering' to match Go testing conventions, similar to the previous test naming issue.
Suggested change:
```diff
-func TestScheduler_HeapOrdering(t *testing.T) {
+func TestItemScheduler_HeapOrdering(t *testing.T) {
```
```go
s.labelCache.DeleteLabel(sch.Item.Agent, sch.Item.DiscoveryRuleKey, sch.Item.DiscoveryIndex, sch.Item.LabelKey)
if _, ok := newIdx[id]; !ok {
	// Lost Logic
	if deleteTTL == 0 { // Immediately delete
```
Copilot AI · Dec 24, 2025
The comparison using 'DurationImmediately' has been changed to compare against '0'. This assumes that DurationImmediately was previously defined as 0, but there's no visible definition in the diff. If DurationImmediately was a named constant with semantic meaning (e.g., a special sentinel value), replacing it with the magic number 0 reduces code clarity. Consider keeping the named constant or adding a comment explaining that 0 means immediate action.
```go
}
}

func TestScheduler_UpdateDiscoveredDiff_Logic(t *testing.T) {
```
Copilot AI · Dec 24, 2025
The test name 'TestScheduler_UpdateDiscoveredDiff_Logic' should be 'TestItemScheduler_UpdateDiscoveredDiff' or 'TestItemScheduler_UpdateDiscoveredDiffLogic' to match Go testing conventions.
Suggested change:
```diff
-func TestScheduler_UpdateDiscoveredDiff_Logic(t *testing.T) {
+func TestItemScheduler_UpdateDiscoveredDiffLogic(t *testing.T) {
```
```go
for i, it := range task.Items {
	if s.itemID(it) == s.itemID(sch.Item) {
		task.Items[i] = newItem
		return
```
Copilot AI · Dec 24, 2025
In updateItemInTask, after finding and updating the matching item, the function returns immediately. However, if there could be duplicate items with the same ID in the task.Items slice, only the first one would be updated. Consider adding documentation to clarify that items are unique within a task, or implement logic to handle potential duplicates if they're possible.
Suggested change:
```diff
-		return
+		// Do not return here so that all items with the same ID are updated.
```
```go
// Util function defined again to avoid dependency cycle if moved?
// Actually it was local in this file.
```
Copilot AI · Dec 24, 2025
The comment states this is a "Util function defined again to avoid dependency cycle if moved" but then questions itself with "Actually it was local in this file." This uncertain comment should be clarified or removed. If the function was always local to this file, the comment is confusing. If it's duplicated from elsewhere, that should be stated clearly and ideally the duplication should be resolved by placing the function in a shared utility file.
Suggested change:
```diff
-// Util function defined again to avoid dependency cycle if moved?
-// Actually it was local in this file.
+// copyTags returns a shallow copy of the given tags map.
```
```go
go s.runLoop(ctx)
go s.runMaintainLoop(ctx)
```
Copilot AI · Dec 24, 2025
The Start method launches goroutines (runLoop and runMaintainLoop) after releasing the mutex, but these goroutines immediately try to acquire the mutex. While this works, there's a subtle race: if Stop() is called immediately after Start() but before the goroutines begin, the goroutines might not see the stopCh closure. A safer pattern would be to check s.running within the goroutines after acquiring the lock for the first time, or to pass stopCh as a parameter to the goroutine functions to capture its value at the time of Start().
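One way to close that window, sketched here with illustrative names (the `running` flag and loop bodies are assumptions, not the plugin's actual code), is to capture the stop channel under the lock in Start() and hand that value to the goroutines, so they never observe a stale field:

```go
package scheduler

import "sync"

type ItemScheduler struct {
	mu      sync.Mutex
	running bool
	stopCh  chan struct{}
}

func (s *ItemScheduler) Start() {
	s.mu.Lock()
	if s.running {
		s.mu.Unlock()
		return
	}
	s.running = true
	stopCh := make(chan struct{})
	s.stopCh = stopCh
	s.mu.Unlock()

	// Pass the channel value captured under the lock: even if Stop() runs
	// before these goroutines are scheduled, they watch the right channel.
	go s.runLoop(stopCh)
	go s.runMaintainLoop(stopCh)
}

func (s *ItemScheduler) Stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.running {
		s.running = false
		close(s.stopCh)
	}
}

func (s *ItemScheduler) runLoop(stop <-chan struct{}) {
	// Event loop elided; exits when the captured stop channel closes.
	<-stop
}

func (s *ItemScheduler) runMaintainLoop(stop <-chan struct{}) {
	<-stop
}
```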