fix(snmp_zabbix): stabilize SNMP collection order #1337
Conversation
Pull Request Overview
This pull request stabilizes SNMP collection ordering by removing concurrent operations that could cause Request ID conflicts when reusing SNMP clients.
- Converted multi-OID walk discovery from concurrent to sequential execution to prevent Request ID collisions
- Made batch size configurable via MaxRepetitions with fallback to individual collection when batch size is 1
- Improved logging by making error logs conditional on DebugMode to reduce log spam
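As a concrete illustration of the last bullet, here is a minimal sketch of DebugMode-gated error logging. The `Config` and `Collector` shapes and the `logCollectError` helper are hypothetical stand-ins for this review, not the plugin's actual types.

```go
package sketch

import "log"

// Hypothetical config and collector shapes; the plugin's real types may differ.
type Config struct {
	DebugMode bool
}

type Collector struct {
	config Config
}

// logCollectError emits per-OID error logs only when DebugMode is enabled,
// matching the "error logs conditional on DebugMode" behavior described above.
func (c *Collector) logCollectError(oid string, err error) {
	if !c.config.DebugMode {
		return // keep normal runs quiet to reduce log spam
	}
	log.Printf("snmp_zabbix: collection failed for OID %s: %v", oid, err)
}
```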
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| inputs/snmp_zabbix/discovery.go | Refactored performMultiOidWalkDiscovery to execute SNMP walks sequentially instead of concurrently, preventing Request ID conflicts from client reuse |
| inputs/snmp_zabbix/collector.go | Made batch size configurable via MaxRepetitions, extracted processSingleResult method for code reuse, and made error logging conditional on debug mode to reduce verbosity |
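The collector.go row mentions an extracted `processSingleResult` helper for code reuse. Its real signature is not shown in this review, so the following is only a rough sketch of what such an extraction might look like; the `CollectionResult` fields and the standalone function shape are assumptions.

```go
package sketch

import (
	"fmt"

	"github.com/gosnmp/gosnmp"
)

// CollectionResult here is a hypothetical shape; the plugin's real struct may differ.
type CollectionResult struct {
	OID   string
	Value interface{}
	Error error
}

// processSingleResult converts one PDU into a CollectionResult, so the batched path
// and the batch-size-1 fallback can share the same conversion logic.
func processSingleResult(oid string, pdu gosnmp.SnmpPDU) CollectionResult {
	res := CollectionResult{OID: oid}
	switch pdu.Type {
	case gosnmp.NoSuchObject, gosnmp.NoSuchInstance:
		res.Error = fmt.Errorf("OID not found on device: %s", oid)
	default:
		res.Value = pdu
	}
	return res
}
```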
Comments suppressed due to low confidence (1)
inputs/snmp_zabbix/collector.go:302
- [nitpick] The error message includes the original `oid` parameter, which may have a leading dot, while the error is about a normalized OID lookup. For consistency and clarity in debugging, consider using `oidNoDot` in the error messages on lines 297 and 302, or indicate that the OID has been normalized.
results[i].Error = fmt.Errorf("OID not found on device: %s", oid)
} else {
results[i].Value = pdu
}
} else {
results[i].Error = fmt.Errorf("no response for OID: %s", oid)
	// New function handling the walk[...] syntax
	// performMultiOidWalkDiscovery handles the walk[...] syntax
	// Changed to sequential execution to prevent Request ID confusion caused by reusing the client
	func (d *DiscoveryEngine) performMultiOidWalkDiscovery(ctx context.Context, client *gosnmp.GoSNMP, discoveryOID string) ([][]gosnmp.SnmpPDU, error) {
Copilot AI (Nov 20, 2025)
The context parameter ctx is accepted but never used in this function. The function performs sequential SNMP walks that could potentially be long-running operations, but there's no mechanism to respect context cancellation. Consider checking ctx.Done() between iterations or passing the context to the SNMP operations if supported.
		return nil, fmt.Errorf("no OIDs found in walk[] directive: %s", discoveryOID)
	}

	var allPdus [][]gosnmp.SnmpPDU
Copilot AI (Nov 20, 2025)
The variable allPdus is initialized on line 209 but never used before being reassigned in the loop starting at line 229. Consider removing the initialization on line 209 and directly assigning allPdus in the loop, or clarify the intended logic.
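A hedged sketch of the suggested cleanup: declare the flattened slice where it is actually filled, rather than initializing `allPdus` early and leaving it unused. The `flattenWalkResults` helper is illustrative only; the real function does this in place.

```go
package sketch

import "github.com/gosnmp/gosnmp"

// flattenWalkResults declares the flattened slice right where it is populated,
// instead of leaving an unused allPdus declaration further up. Hypothetical helper.
func flattenWalkResults(results [][][]gosnmp.SnmpPDU) [][]gosnmp.SnmpPDU {
	allPdus := make([][]gosnmp.SnmpPDU, 0, len(results))
	for _, perOID := range results {
		// each perOID holds the PDU groups walked for one discovery OID
		allPdus = append(allPdus, perOID...)
	}
	return allPdus
}
```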
	}
	results[res.index] = append(results[res.index], res.pdus)

	results[i] = append(results[i], pdus)
Copilot AI (Nov 20, 2025)
The logic of wrapping `pdus` in another slice with `append(results[i], pdus)` and then unwrapping it later (lines 229-233) is unnecessarily complex. Since each iteration produces one `[]gosnmp.SnmpPDU`, you could simplify by storing `pdus` directly: `results[i] = pdus` on line 226, and then `allPdus = append(allPdus, pduGroup)` on line 231.
	var wg sync.WaitGroup
	resultChan := make(chan CollectionResult, len(items))
	// Use a buffered channel to prevent blocking
	resultChan := make(chan CollectionResult, len(items)+10)
Copilot AI (Nov 20, 2025)
[nitpick] The buffer size calculation len(items)+10 appears arbitrary. If this is meant to prevent blocking, consider using a more justified value or adding a comment explaining why "+10" is appropriate for this use case. Alternatively, consider making this configurable or calculating based on the number of agents.
Co-authored-by: Copilot <[email protected]>
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
	// Batch size
	batchSize := c.config.MaxRepetitions
	if batchSize <= 0 {
		batchSize = 10 // default value
	}
Copilot AI (Nov 20, 2025)
MaxRepetitions is an SNMP protocol parameter that specifies how many variable bindings should be returned per GetBulk request, not how many OIDs should be batched together. Using it as the batch size for collector batching is a semantic mismatch.
The original hardcoded value of 60 may have been chosen specifically for batching operations. If configurability is desired, consider:
- Adding a separate `BatchSize` configuration parameter for collector batching
- Using a different default value if `MaxRepetitions` was intended for a different purpose
- Documenting why `MaxRepetitions` is being repurposed for batching, if this is intentional
Note: The default value of 10 when MaxRepetitions <= 0 is significantly smaller than the original 60, which could impact collection performance.
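A hedged sketch of the separate-BatchSize option, keeping `MaxRepetitions` strictly as the SNMP GetBulk parameter. The field names and the fallback default of 60 follow the discussion above but are assumptions, not the plugin's actual config.

```go
package sketch

// CollectorConfig is a hypothetical config shape for illustration only.
type CollectorConfig struct {
	MaxRepetitions int // SNMP GetBulk max-repetitions, protocol-level setting
	BatchSize      int // number of OIDs grouped per collector batch
}

// effectiveBatchSize falls back to the previously hardcoded batch size of 60
// when BatchSize is unset, leaving MaxRepetitions untouched.
func (c CollectorConfig) effectiveBatchSize() int {
	if c.BatchSize > 0 {
		return c.BatchSize
	}
	return 60
}
```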
	if res.err != nil {
		// If any single walk fails, the whole discovery fails
		return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
	// 这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了
Copilot AI (Nov 20, 2025)
[nitpick] The comment states "这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了" (no need to acquire lock here because the upper layer performSNMPDiscovery already holds the lock).
While this is factually correct and explains why no lock acquisition is needed here, this comment is documenting the absence of code that was never present in the first place (even in the concurrent version that was removed). The comment might be more confusing than helpful, as it could make readers wonder if a lock was considered or removed.
Consider removing this comment or rephrasing it to focus on the sequential execution model, e.g., "顺序执行,复用上层的锁" (sequential execution, reusing the upper-layer lock).
Suggested change:
-	// 这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了
+	// 顺序执行,复用上层的锁
	for i, oidStr := range oidStrings {
		oid := strings.TrimSpace(oidStr)
		if err := d.validateOID(oid); err != nil {
			return nil, fmt.Errorf("invalid OID '%s' in walk[]: %w", oid, err)
		}

-		wg.Add(1)
-		go func(oidToWalk string, idx int) {
-			defer wg.Done()
-			pdus, err := client.BulkWalkAll(oidToWalk)
-			pduChan <- struct {
-				pdus  []gosnmp.SnmpPDU
-				index int
-				err   error
-			}{pdus, idx, err}
-		}(oid, i)
-	}
-
-	wg.Wait()
-	close(pduChan)
-
-	// Collect results in order to match the preprocessing parameters
-	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))
-	for res := range pduChan {
-		if res.err != nil {
-			// If any single walk fails, the whole discovery fails
-			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
+		// 这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了
+		pdus, err := client.BulkWalkAll(oid)
+		if err != nil {
+			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oid, err)
		}
-		results[res.index] = append(results[res.index], res.pdus)
-
+		results[i] = append(results[i], pdus)
	}
Copilot AI (Nov 20, 2025)
The ctx context.Context parameter is not used in this function. Since the function performs sequential SNMP operations that could take a significant amount of time, it should respect the context for cancellation.
Consider adding context cancellation checks in the loop, e.g.:
for i, oidStr := range oidStrings {
select {
case <-ctx.Done():
return nil, fmt.Errorf("SNMP walk was canceled or timed out: %w", ctx.Err())
default:
}
// ... existing code
}

Note that performZabbixDependentDiscovery already implements proper context handling (lines 260-262), so this function should follow the same pattern for consistency.
	// 尝试多种形式的OID匹配(有没有前导点)
	oidNoDot := strings.TrimPrefix(oid, ".")

	if pdu, ok := valueMap[oidNoDot]; ok {
Copilot AI (Nov 20, 2025)
The comment states "尝试多种形式的OID匹配(有没有前导点)" (try multiple forms of OID matching with/without leading dot), but the implementation only tries one form - the OID without a leading dot.
The code normalizes both the response OIDs (line 280) and request OIDs (line 289) by removing leading dots, then only checks the normalized form. While this approach should work in most cases, it doesn't actually try "multiple forms" as the comment suggests.
Consider either:
- Updating the comment to reflect that both sides are normalized, or
- Implementing a fallback to try the original `oid` if `oidNoDot` doesn't match (though this may not be necessary if normalization is consistent)
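A hedged sketch of the fallback option above: check the normalized key first, then the original spelling. The `lookupPDU` helper and the assumption that `valueMap` may be keyed either way are illustrative only.

```go
package sketch

import (
	"strings"

	"github.com/gosnmp/gosnmp"
)

// lookupPDU tries the dot-stripped OID first and falls back to the original form,
// in case valueMap was keyed with a leading dot. Illustrative helper only.
func lookupPDU(valueMap map[string]gosnmp.SnmpPDU, oid string) (gosnmp.SnmpPDU, bool) {
	oidNoDot := strings.TrimPrefix(oid, ".")
	if pdu, ok := valueMap[oidNoDot]; ok {
		return pdu, true
	}
	pdu, ok := valueMap[oid]
	return pdu, ok
}
```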
+	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))

	for i, oidStr := range oidStrings {
		oid := strings.TrimSpace(oidStr)
		if err := d.validateOID(oid); err != nil {
			return nil, fmt.Errorf("invalid OID '%s' in walk[]: %w", oid, err)
		}

-		wg.Add(1)
-		go func(oidToWalk string, idx int) {
-			defer wg.Done()
-			pdus, err := client.BulkWalkAll(oidToWalk)
-			pduChan <- struct {
-				pdus  []gosnmp.SnmpPDU
-				index int
-				err   error
-			}{pdus, idx, err}
-		}(oid, i)
-	}
-
-	wg.Wait()
-	close(pduChan)
-
-	// Collect results in order to match the preprocessing parameters
-	results := make([][][]gosnmp.SnmpPDU, len(oidStrings))
-	for res := range pduChan {
-		if res.err != nil {
-			// If any single walk fails, the whole discovery fails
-			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oidStrings[res.index], res.err)
+		// 这里不需要 acquire 锁,因为上层 performSNMPDiscovery 已经持有锁了
+		pdus, err := client.BulkWalkAll(oid)
+		if err != nil {
+			return nil, fmt.Errorf("SNMP walk failed for OID %s: %w", oid, err)
		}
-		results[res.index] = append(results[res.index], res.pdus)
-
+		results[i] = append(results[i], pdus)
	}
Copilot AI (Nov 20, 2025)
The intermediate array structure `results := make([][][]gosnmp.SnmpPDU, len(oidStrings))` combined with `results[i] = append(results[i], pdus)` creates a three-dimensional array where each element is a slice containing exactly one `[]gosnmp.SnmpPDU`.
This can be simplified to:

	results := make([][]gosnmp.SnmpPDU, len(oidStrings))
	// ...
	results[i] = pdus

Then the final loop becomes:

	allPdus = results
	return allPdus, nil

This simplification makes the code more maintainable and easier to understand, since the extra nesting level serves no purpose in the sequential execution model.
	var wg sync.WaitGroup
	resultChan := make(chan CollectionResult, len(items))
	// Use a buffered channel to prevent blocking
	resultChan := make(chan CollectionResult, len(items)+10)
Copilot AI (Nov 20, 2025)
[nitpick] The buffer size is set to len(items)+10, but the rationale for adding exactly 10 extra slots is unclear.
Since the number of results that will be sent to the channel should equal the number of items (one CollectionResult per item), the buffer size could simply be len(items). The additional 10 slots don't provide a clear benefit and make the buffer sizing logic arbitrary.
Consider either:
- Using exactly `len(items)` as the buffer size, or
- Adding a comment explaining why the extra 10 slots are needed (e.g., if there's a specific edge case being handled)
Suggested change:
-	resultChan := make(chan CollectionResult, len(items)+10)
+	resultChan := make(chan CollectionResult, len(items))
No description provided.