Commit 98231a0
Detect and correct worker node over-provisioning (#13220)
## Worker Node Over-Provisioning Detection
Implements detection and correction of worker node over-provisioning
when multiple MSBuild instances run concurrently.
### Status: Ready for Review ✅
Successfully rebased on the **latest version** of PR #13256 as
requested.
### Recent Updates
**Latest Rebase (commit 420cfb9):**
- Rebased all work on updated PR #13256
- Integration verified with latest improvements:
  - Enhanced macOS/BSD sysctl implementation with ArrayPool
  - Updated `TryGetCommandLine` API
  - `NodeModeHelper.ExtractFromCommandLine` integration
  - Improved tracing and diagnostics
- Build succeeds ✅
- All 8 unit tests pass ✅
### Implementation Details
**Core Implementation:**
- `src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs` (+114 lines)
  - `ShutdownConnectedNodes`: Modified to use per-node reuse decisions
  - `DetermineNodesForReuse`: Core logic for selective node reuse
  - `GetNodeReuseThreshold`: Virtual method (default: NUM_PROCS/2)
  - `CountSystemWideActiveNodes`: Uses improved detection from PR #13256
    - Calls `GetPossibleRunningNodes(expectedNodeMode: NodeMode.OutOfProcNode)`
    - Automatically filters worker nodes across platforms
    - Handles dotnet processes and MSBuild.dll filtering
    - Leverages cross-platform process command line retrieval
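
As a rough illustration of NodeMode-based counting, the sketch below filters process command lines by node mode. It is written in Python for brevity and is not the C# code in this change. The `/nodemode:<n>` switch format and the value `1` for worker nodes are assumptions made for the example, and `extract_node_mode` and `count_worker_nodes` are hypothetical names, not the `NodeModeHelper.ExtractFromCommandLine` or `GetPossibleRunningNodes` APIs.

```python
import re

# Assumption for this sketch: nodes are started with a "/nodemode:<n>" switch
# and worker (out-of-proc) nodes use the value 1.
_NODE_MODE_SWITCH = re.compile(r"[/-]nodemode:(\d+)", re.IGNORECASE)
WORKER_NODE_MODE = 1


def extract_node_mode(command_line):
    """Return the node mode found in a command line, or None if there is none."""
    match = _NODE_MODE_SWITCH.search(command_line)
    return int(match.group(1)) if match else None


def count_worker_nodes(command_lines):
    """Count only the processes whose command line declares the worker node mode."""
    return sum(
        1 for line in command_lines
        if extract_node_mode(line) == WORKER_NODE_MODE
    )


sample_command_lines = [
    "MSBuild.exe /nologo /nodemode:1 /nodeReuse:true",  # worker node
    "dotnet MSBuild.dll /nologo /nodemode:1",           # worker node hosted by dotnet
    "dotnet MSBuild.dll /nologo /nodemode:2",           # some other node type
    "dotnet build MyApp.sln",                           # main build process, no node mode
]
worker_count = count_worker_nodes(sample_command_lines)
```

Filtering on the declared mode rather than the process name is what keeps the main build process and other node types out of the count in this sketch.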
**Tests:**
- `src/Build.UnitTests/BackEnd/NodeProviderOutOfProc_Tests.cs` (+169 lines, new file)
  - 8 comprehensive unit tests covering all scenarios
  - All tests passing ✅
### How It Works
When a build completes and `ShutdownConnectedNodes(enableReuse: true)` is
called:
1. **Improved node detection** - Counts active nodes system-wide, using NodeMode filtering so that only worker nodes are included
2. **Cross-platform support** - Recognizes worker nodes hosted by either MSBuild.exe or dotnet processes
3. **Threshold comparison** - Compares the worker node count against the threshold (NUM_PROCS/2)
4. **Selective termination** - If the system is over-provisioned, calculates which nodes to keep and which to terminate
5. **Per-node notification** - Sends the appropriate `NodeBuildComplete` packet to each node
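
The threshold comparison and selective termination steps can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the C# in `NodeProviderOutOfProcBase.cs`: the function names only mirror the method names listed above, the floor of 1 on the threshold is an assumption, and the rule "terminate only the excess, capped at the nodes this build owns" is one plausible reading of the description.

```python
import os


def node_reuse_threshold(num_procs=None):
    """Default threshold from the description: NUM_PROCS/2.

    The lower bound of 1 is an assumption of this sketch.
    """
    if num_procs is None:
        num_procs = os.cpu_count() or 1
    return max(1, num_procs // 2)


def determine_nodes_for_reuse(owned_node_count, system_wide_count, threshold):
    """Return one flag per owned node: True = keep for reuse, False = terminate."""
    # How many worker nodes exist beyond the threshold, machine-wide.
    excess = max(0, system_wide_count - threshold)
    # A build can only terminate nodes it is connected to.
    to_terminate = min(owned_node_count, excess)
    keep = owned_node_count - to_terminate
    return [index < keep for index in range(owned_node_count)]


# Example: 16 logical processors, this build owns 4 nodes, 10 worker nodes machine-wide.
threshold = node_reuse_threshold(16)
decisions = determine_nodes_for_reuse(4, 10, threshold)
```

Each flag would then select the `NodeBuildComplete` packet sent to the corresponding node: reuse enabled for kept nodes, disabled for the rest.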
### Benefits of PR #13256 Integration
The improved node detection from PR #13256 (latest version) provides:
- ✅ **Accurate worker node counting** - Filters by NodeMode instead of
process name alone
- ✅ **Dotnet process handling** - Correctly identifies dotnet processes
running MSBuild.dll
- ✅ **Cross-platform reliability** - Enhanced process command line
retrieval (Windows/macOS/Linux/BSD)
- ✅ **Better performance** - Uses ArrayPool and span-based operations
for macOS/BSD
- ✅ **Improved diagnostics** - Detailed tracing for troubleshooting
### Customer Impact
Reduces resource consumption in scenarios with concurrent builds
(DevKit, LLM agents, CI/CD pipelines). The system maintains a reasonable
node count (NUM_PROCS/2) instead of accumulating 25+ lingering nodes.
NodeMode-based filtering ensures that only actual worker nodes are
counted, so main MSBuild processes and task hosts do not inflate the
count.
### Thresholds
- ✅ **Worker nodes:** NUM_PROCS/2 (implemented with improved detection)
- 🔜 **Server nodes:** > 0 (future work - separate lifecycle)
- 🔜 **RAR nodes:** > 0 (future work - separate lifecycle)
<!-- START COPILOT ORIGINAL PROMPT -->
<details>
<summary>Original prompt</summary>
>
> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>Worker Nodes should attempt to detect and correct
over-provisioning</issue_title>
> <issue_description>### Summary
>
> When multiple instances of MSBuild are run concurrently from the
command line, each instance tries to create and claim a number of worker
nodes up to the `/m` limit (which is NUMPROCS for `dotnet` by default).
>
> This can lead to entirely too many nodes existing on the host machine,
lingering for the configured timeout, which is ~15m by default.
>
> These nodes suck up machine resources and look bad - the engine should
work to minimize the number of active nodes past an expected threshold
to preserve system resources.
>
> ### Background and Motivation
>
> As more tools, like DevKit, LLM Agents, and human users, delegate
their build and inner-loop experiences to the `dotnet` CLI, it becomes
more and more likely that collisions like the above will occur. I was
looking at node trace logs this morning where more than 25 worker nodes
were lingering on the machine. Managing these nodes is a pain for users
of all kinds - you either have to carefully sequence your build
operations, apply `/nodereuse:false` which slows down subsequent builds,
or dangerously and likely incompletely terminate processes.
>
> ### Proposed Feature
>
> Instead, the nodes themselves should clean themselves up.
>
> When a build completes and an out of proc node marks itself idle, it
should attempt to handshake with its siblings. If there are a number of
_active_ nodes of the same type equal to a threshold, the idling node
should terminate.
>
> Proposed thresholds:
> * Worker nodes: NUM_PROCS/2
> * Server Nodes: > 0
> * RAR-as-a-service-node: >0
>
> ### Alternative Designs
>
> _No response_</issue_description>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> </comments>
>
</details>
<!-- START COPILOT CODING AGENT SUFFIX -->
- Fixes #13218
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: Chet Husk <[email protected]>
Co-authored-by: Chet Husk <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent ac21192 commit 98231a0
5 files changed
Lines changed: 351 additions & 7 deletions
File tree
- src
  - Build.UnitTests/BackEnd
  - Build/BackEnd
    - Components/Communications
    - Node
  - Utilities.UnitTests
Lines changed: 150 additions & 0 deletions
Lines changed: 183 additions & 4 deletions