
Conversation

@kongfei605
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings November 20, 2025 03:57
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request stabilizes SNMP collection ordering by removing concurrent operations that could cause Request ID conflicts when reusing SNMP clients.

  • Converted multi-OID walk discovery from concurrent to sequential execution to prevent Request ID collisions
  • Made batch size configurable via MaxRepetitions with fallback to individual collection when batch size is 1
  • Improved logging by making error logs conditional on DebugMode to reduce log spam

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Files reviewed:

  • inputs/snmp_zabbix/discovery.go: Refactored performMultiOidWalkDiscovery to execute SNMP walks sequentially instead of concurrently, preventing Request ID conflicts from client reuse.
  • inputs/snmp_zabbix/collector.go: Made batch size configurable via MaxRepetitions, extracted a processSingleResult method for code reuse, and made error logging conditional on debug mode to reduce verbosity.
Comments suppressed due to low confidence (1)

inputs/snmp_zabbix/collector.go:302

  • [nitpick] The error message includes the original oid parameter which may have a leading dot, while the error is about a normalized OID lookup. For consistency and clarity in debugging, consider using oidNoDot in the error messages on lines 297 and 302, or indicate that the OID has been normalized.
				results[i].Error = fmt.Errorf("OID not found on device: %s", oid)
			} else {
				results[i].Value = pdu
			}
		} else {
			results[i].Error = fmt.Errorf("no response for OID: %s", oid)


// performMultiOidWalkDiscovery handles the walk[...] syntax.
// Executes walks sequentially to prevent Request ID corruption caused by reusing the client.
func (d *DiscoveryEngine) performMultiOidWalkDiscovery(ctx context.Context, client *gosnmp.GoSNMP, discoveryOID string) ([][]gosnmp.SnmpPDU, error) {

Copilot AI Nov 20, 2025


The context parameter ctx is accepted but never used in this function. The function performs sequential SNMP walks that could potentially be long-running operations, but there's no mechanism to respect context cancellation. Consider checking ctx.Done() between iterations or passing the context to the SNMP operations if supported.

return nil, fmt.Errorf("no OIDs found in walk[] directive: %s", discoveryOID)
}

var allPdus [][]gosnmp.SnmpPDU

Copilot AI Nov 20, 2025


The variable allPdus is initialized on line 209 but never used before being reassigned in the loop starting at line 229. Consider removing the initialization on line 209 and directly assigning allPdus in the loop, or clarify the intended logic.

	}
-	results[res.index] = append(results[res.index], res.pdus)
+	results[i] = append(results[i], pdus)

Copilot AI Nov 20, 2025


The logic of wrapping pdus in another slice with append(results[i], pdus) and then unwrapping it later (line 229-233) is unnecessarily complex. Since each iteration produces one []gosnmp.SnmpPDU, you could simplify by storing pdus directly: results[i] = pdus on line 226, and then allPdus = append(allPdus, pduGroup) on line 231.

	var wg sync.WaitGroup
-	resultChan := make(chan CollectionResult, len(items))
+	// use a buffered channel to prevent blocking
+	resultChan := make(chan CollectionResult, len(items)+10)

Copilot AI Nov 20, 2025


[nitpick] The buffer size calculation len(items)+10 appears arbitrary. If this is meant to prevent blocking, consider using a more justified value or adding a comment explaining why "+10" is appropriate for this use case. Alternatively, consider making this configurable or calculating based on the number of agents.

Copilot AI review requested due to automatic review settings November 20, 2025 04:26
@kongfei605 kongfei605 merged commit 19636ca into flashcatcloud:main Nov 20, 2025
6 of 7 checks passed
@kongfei605 kongfei605 deleted the snmp_zbx_fix branch November 20, 2025 04:30
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.



Comment on lines +123 to +127
// batch size
batchSize := c.config.MaxRepetitions
if batchSize <= 0 {
	batchSize = 10 // default
}

Copilot AI Nov 20, 2025


MaxRepetitions is an SNMP protocol parameter that specifies how many variable bindings should be returned per GetBulk request, not how many OIDs should be batched together. Using it as the batch size for collector batching is a semantic mismatch.

The original hardcoded value of 60 may have been chosen specifically for batching operations. If configurability is desired, consider:

  1. Adding a separate BatchSize configuration parameter for collector batching
  2. Using a different default value if MaxRepetitions was intended for a different purpose
  3. Documenting why MaxRepetitions is being repurposed for batching if this is intentional

Note: The default value of 10 when MaxRepetitions <= 0 is significantly smaller than the original 60, which could impact collection performance.
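If a dedicated setting were introduced as the comment suggests, the resolution logic might look like the following sketch. The `BatchSize` field and the `resolveBatchSize` helper are hypothetical, not part of the plugin's actual configuration.

```go
package main

import "fmt"

// Config mirrors only the fields relevant here; the names are illustrative.
type Config struct {
	MaxRepetitions int // SNMP GetBulk max-repetitions (a protocol parameter)
	BatchSize      int // how many OIDs to group per collector batch
}

// resolveBatchSize keeps the two concerns separate: BatchSize controls
// collector batching, while MaxRepetitions remains a pure SNMP parameter.
// When BatchSize is unset, it falls back to the original hardcoded value.
func resolveBatchSize(c Config) int {
	if c.BatchSize > 0 {
		return c.BatchSize
	}
	return 60 // the original hardcoded batch size, kept as the default
}

func main() {
	fmt.Println(resolveBatchSize(Config{}))              // unset: falls back to the default
	fmt.Println(resolveBatchSize(Config{BatchSize: 25})) // explicit value wins
}
```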

-	if res.err != nil {
-		// if any single walk fails, the whole discovery fails
-		return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
+	// no need to acquire the lock here; the caller performSNMPDiscovery already holds it

Copilot AI Nov 20, 2025


[nitpick] The comment states "这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了" (no need to acquire lock here because the upper layer performSNMPDiscovery already holds the lock).

While this is factually correct and explains why no lock acquisition is needed here, this comment is documenting the absence of code that was never present in the first place (even in the concurrent version that was removed). The comment might be more confusing than helpful, as it could make readers wonder if a lock was considered or removed.

Consider removing this comment or rephrasing it to focus on the sequential execution model, e.g., "顺序执行,复用上层的锁" (sequential execution, reusing the upper-layer lock).

Suggested change
// 这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了
// 顺序执行,复用上层的锁

Comment on lines 214 to 227
	for i, oidStr := range oidStrings {
		oid := strings.TrimSpace(oidStr)
		if err := d.validateOID(oid); err != nil {
			return nil, fmt.Errorf("invalid OID '%s' in walk[]: %w", oid, err)
		}

-		wg.Add(1)
-		go func(oidToWalk string, idx int) {
-			defer wg.Done()
-			pdus, err := client.BulkWalkAll(oidToWalk)
-			pduChan <- struct {
-				pdus  []gosnmp.SnmpPDU
-				index int
-				err   error
-			}{pdus, idx, err}
-		}(oid, i)
-	}
-
-	wg.Wait()
-	close(pduChan)
-
-	// collect results in order to match the preprocessing parameters
-	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))
-	for res := range pduChan {
-		if res.err != nil {
-			// if any single walk fails, the whole discovery fails
-			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
-		}
-		results[res.index] = append(results[res.index], res.pdus)
+		// no need to acquire the lock here; the caller performSNMPDiscovery already holds it
+		pdus, err := client.BulkWalkAll(oid)
+		if err != nil {
+			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oid, err)
+		}
+		results[i] = append(results[i], pdus)
	}

Copilot AI Nov 20, 2025


The ctx context.Context parameter is not used in this function. Since the function performs sequential SNMP operations that could take a significant amount of time, it should respect the context for cancellation.

Consider adding context cancellation checks in the loop, e.g.:

for i, oidStr := range oidStrings {
    select {
    case <-ctx.Done():
        return nil, fmt.Errorf("SNMP walk was canceled or timed out: %w", ctx.Err())
    default:
    }
    // ... existing code
}

Note that performZabbixDependentDiscovery already implements proper context handling (lines 260-262), so this function should follow the same pattern for consistency.

Comment on lines +288 to +291
// try multiple forms of OID matching (with and without the leading dot)
oidNoDot := strings.TrimPrefix(oid, ".")

if pdu, ok := valueMap[oidNoDot]; ok {

Copilot AI Nov 20, 2025


The comment states "尝试多种形式的OID匹配(有没有前导点)" (try multiple forms of OID matching with/without leading dot), but the implementation only tries one form - the OID without a leading dot.

The code normalizes both the response OIDs (line 280) and request OIDs (line 289) by removing leading dots, then only checks the normalized form. While this approach should work in most cases, it doesn't actually try "multiple forms" as the comment suggests.

Consider either:

  1. Updating the comment to reflect that both sides are normalized, or
  2. Implementing a fallback to try the original oid if oidNoDot doesn't match (though this may not be necessary if normalization is consistent)
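The normalize-both-sides approach the comment describes can be reduced to a small helper. This is a sketch with illustrative names (`lookupPDU`, a plain `map[string]string` standing in for the PDU value map), assuming the map keys were already stripped of leading dots when built.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPDU normalizes the requested OID by stripping any leading dot,
// so ".1.3.6.1" and "1.3.6.1" resolve to the same entry. Only one form
// needs to be checked because both sides use the same normalization.
func lookupPDU(valueMap map[string]string, oid string) (string, bool) {
	v, ok := valueMap[strings.TrimPrefix(oid, ".")]
	return v, ok
}

func main() {
	// keys are stored without the leading dot
	m := map[string]string{"1.3.6.1.2.1.1.1.0": "sysDescr"}
	fmt.Println(lookupPDU(m, ".1.3.6.1.2.1.1.1.0")) // dotted request still matches
	fmt.Println(lookupPDU(m, "1.3.6.1.2.1.1.1.0"))  // undotted request matches too
}
```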

Comment on lines +212 to +226
+	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))
+
	for i, oidStr := range oidStrings {
		oid := strings.TrimSpace(oidStr)
		if err := d.validateOID(oid); err != nil {
			return nil, fmt.Errorf("invalid OID '%s' in walk[]: %w", oid, err)
		}

-		wg.Add(1)
-		go func(oidToWalk string, idx int) {
-			defer wg.Done()
-			pdus, err := client.BulkWalkAll(oidToWalk)
-			pduChan <- struct {
-				pdus  []gosnmp.SnmpPDU
-				index int
-				err   error
-			}{pdus, idx, err}
-		}(oid, i)
-	}
-
-	wg.Wait()
-	close(pduChan)
-
-	// collect results in order to match the preprocessing parameters
-	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))
-	for res := range pduChan {
-		if res.err != nil {
-			// if any single walk fails, the whole discovery fails
-			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
-		}
-		results[res.index] = append(results[res.index], res.pdus)
+		// no need to acquire the lock here; the caller performSNMPDiscovery already holds it
+		pdus, err := client.BulkWalkAll(oid)
+		if err != nil {
+			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oid, err)
+		}
+		results[i] = append(results[i], pdus)

Copilot AI Nov 20, 2025


The intermediate array structure results := make([][][]gosnmp.SnmpPDU, len(oidStrings)) combined with results[i] = append(results[i], pdus) creates a three-dimensional array where each element is a slice containing exactly one []gosnmp.SnmpPDU.

This can be simplified to:

results := make([][]gosnmp.SnmpPDU, len(oidStrings))
// ...
results[i] = pdus

Then the final loop becomes:

allPdus = results
return allPdus, nil

This simplification makes the code more maintainable and easier to understand, since the extra nesting level serves no purpose in the sequential execution model.

var wg sync.WaitGroup
resultChan := make(chan CollectionResult, len(items))
// use a buffered channel to prevent blocking
resultChan := make(chan CollectionResult, len(items)+10)

Copilot AI Nov 20, 2025


[nitpick] The buffer size is set to len(items)+10, but the rationale for adding exactly 10 extra slots is unclear.

Since the number of results that will be sent to the channel should equal the number of items (one CollectionResult per item), the buffer size could simply be len(items). The additional 10 slots don't provide a clear benefit and make the buffer sizing logic arbitrary.

Consider either:

  1. Using exactly len(items) as the buffer size, or
  2. Adding a comment explaining why the extra 10 slots are needed (e.g., if there's a specific edge case being handled)
Suggested change
resultChan := make(chan CollectionResult, len(items)+10)
resultChan := make(chan CollectionResult, len(items))
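The reasoning behind the suggested buffer size can be demonstrated in isolation. In this sketch (a simplified `CollectionResult` and a toy `collect` function, not the plugin's code), each goroutine sends exactly one result, so a buffer of `len(items)` is enough for every sender to complete without blocking, regardless of when the receiver drains.

```go
package main

import "fmt"

// CollectionResult is a simplified stand-in for the plugin's result type.
type CollectionResult struct {
	Item  string
	Value int
}

// collect fans out one goroutine per item. Each goroutine sends exactly
// one result, so a buffer of len(items) guarantees no sender ever blocks;
// extra slack beyond that serves no purpose.
func collect(items []string) []CollectionResult {
	resultChan := make(chan CollectionResult, len(items))
	for _, it := range items {
		go func(item string) {
			resultChan <- CollectionResult{Item: item, Value: len(item)}
		}(it)
	}
	out := make([]CollectionResult, 0, len(items))
	for range items {
		out = append(out, <-resultChan)
	}
	return out
}

func main() {
	fmt.Println(len(collect([]string{"ifInOctets", "ifOutOctets"})))
}
```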
