Add quality metrics for `dotnetup` by nagilson · Pull Request #52792 · dotnet/sdk

nagilson · 2026-02-02T22:30:44Z

Resolves #50609
(Allow #52717 to be merged first. This PR needed code from that branch, so we need to merge it back in.)

Add OpenTel Data for dotnetup

Summary

This PR introduces telemetry for the dotnetup CLI tool using OpenTelemetry with Azure Monitor. The telemetry provides actionable insights into installation success rates, user behavior patterns, and error analysis while complying with PII rules.

The current connection key points to my own local Application Insights resource; we'll switch to the SDK CLI one once someone else on the team has finished that work. My key shouldn't be used in production, but since this is a dev branch we are not yet releasing, that seems OK. I've filed an issue to track this: #52785

https://aka.ms/dotnetup-telemetry points to the doc in the release/dnup branch for now, but eventually the doc should move to Microsoft Learn.
#52784

Key Features

Telemetry Infrastructure

OpenTelemetry + Azure Monitor: Integrated Azure.Monitor.OpenTelemetry.Exporter for telemetry collection
Activity-based tracing: Commands and operations tracked as OpenTelemetry activities with structured tags
Configurable: Telemetry can be disabled via DOTNET_CLI_TELEMETRY_OPTOUT environment variable
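
As a rough illustration of the activity-based approach, the sketch below wraps a command in a `System.Diagnostics.Activity` and skips it when the opt-out variable is set. The class name, source name, and tag keys are hypothetical and are not taken from the PR's `DotnetupTelemetry.cs`. The Azure Monitor exporter wiring is omitted, and accepting both `1` and `true` is an assumption modeled on the documented dotnet CLI behavior.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch; not the PR's DotnetupTelemetry class.
public static class TelemetrySketch
{
    // Assumed source name, for illustration only.
    private static readonly ActivitySource Source = new("dotnetup.sketch");

    // Assumption: "1" and "true" both opt out, as documented for the dotnet CLI.
    public static bool IsOptedOut(string? value) =>
        value is not null &&
        (value == "1" || value.Equals("true", StringComparison.OrdinalIgnoreCase));

    public static bool Enabled =>
        !IsOptedOut(Environment.GetEnvironmentVariable("DOTNET_CLI_TELEMETRY_OPTOUT"));

    // Runs a command body inside an activity and returns its exit code.
    public static int RunCommand(string commandName, Func<int> body)
    {
        // StartActivity returns null when no listener or exporter is attached.
        using Activity? activity = Enabled ? Source.StartActivity(commandName) : null;
        activity?.SetTag("command.name", commandName);
        try
        {
            int exitCode = body();
            activity?.SetTag("exit.code", exitCode);
            activity?.SetStatus(exitCode == 0 ? ActivityStatusCode.Ok : ActivityStatusCode.Error);
            return exitCode;
        }
        catch (Exception ex)
        {
            // Record only the exception type, never ex.Message, which may contain paths.
            activity?.SetStatus(ActivityStatusCode.Error, ex.GetType().Name);
            throw;
        }
    }
}
```

With no listener attached, `RunCommand` simply runs the body, so the wrapper stays cheap when telemetry is disabled.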

Error Categorization System

Product Errors: Bugs, crashes, server issues - count against quality metrics
User Errors: Invalid input, permissions, disk full, network issues - tracked separately for UX improvement
17 specific error codes including VersionNotFound, ManifestFetchFailed, HashMismatch, ArchiveCorrupted, etc.
Errors are actually thrown with proper codes throughout the codebase (not just defined)
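
The product/user split could be sketched roughly as below. This is not the PR's `ErrorCodeMapper`: the type names are hypothetical, the exception-to-category choices only follow the examples listed above (invalid input, permissions, disk full, network), and the disk-full check assumes the Windows `ERROR_DISK_FULL` code (112) in the low bits of `HResult`.

```csharp
using System;
using System.IO;
using System.Net.Http;

// Hypothetical names; the real mapping lives in the PR's telemetry code.
public enum ErrorCategory { Product, User }

public static class ErrorCategorySketch
{
    public static ErrorCategory Classify(Exception ex) => ex switch
    {
        UnauthorizedAccessException => ErrorCategory.User,      // permissions
        ArgumentException => ErrorCategory.User,                // invalid input
        HttpRequestException => ErrorCategory.User,             // network issues
        IOException io when IsDiskFull(io) => ErrorCategory.User,
        _ => ErrorCategory.Product                              // everything else counts against quality
    };

    // Assumption: Windows-style HRESULT whose low 16 bits are ERROR_DISK_FULL (112).
    private static bool IsDiskFull(IOException ex) => (ex.HResult & 0xFFFF) == 0x70;
}
```

Defaulting unknown exceptions to `Product` is a deliberate choice in this sketch: an unclassified failure then lowers the quality metric instead of being hidden as a user error.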

Success Rate Metrics

Success rate calculation excludes user errors to measure true product quality
Tracks install.result (installed vs already_installed) for accurate installation counts
Version comparison between latest and prior releases
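
To make the exclusion concrete, here is a small sketch of the arithmetic, assuming each command run is reduced to a success flag and a user-error flag. The record and method names are hypothetical; the PR computes this in the Azure Workbook queries, not in C#.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one command run.
public record CommandOutcome(bool Succeeded, bool IsUserError);

public static class SuccessRateSketch
{
    // Success rate over runs that either succeeded or failed with a product error.
    // Returns null when no runs are left after user errors are excluded.
    public static double? ProductSuccessRate(IEnumerable<CommandOutcome> outcomes)
    {
        var counted = outcomes.Where(o => o.Succeeded || !o.IsUserError).ToList();
        if (counted.Count == 0)
        {
            return null;
        }
        return (double)counted.Count(o => o.Succeeded) / counted.Count;
    }
}
```

For example, 8 successes, 1 product error, and 1 user error give 8 / 9 under this definition, where counting every failure would give 8 / 10.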

User Behavior Tracking (PII-Safe)

install.path_source: Where install path came from (explicit, global_json, default, etc.)
install.path_type: Classification of path (system_programfiles, user_profile, local_appdata) - not actual paths
install.has_global_json: Whether project has global.json
install.existing_install_type: Admin/User/none for existing installations
sdk.request_source: How SDK version was specified (explicit, default-latest, default-globaljson)
sdk.requested: Sanitized version string
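
A path classifier along these lines might look like the sketch below. It is an assumption about how `install.path_type` could be derived, not the PR's code: the class name and the `other` fallback label are hypothetical, and only the three labels named above come from the description.

```csharp
using System;
using System.IO;

// Hypothetical sketch: maps an install path to a coarse label so the path itself is never sent.
public static class PathTypeSketch
{
    public static string Classify(string installPath)
    {
        string full = Path.GetFullPath(installPath);

        // local_appdata is checked before user_profile because it usually sits inside the profile.
        (Environment.SpecialFolder Folder, string Label)[] known =
        {
            (Environment.SpecialFolder.ProgramFiles, "system_programfiles"),
            (Environment.SpecialFolder.LocalApplicationData, "local_appdata"),
            (Environment.SpecialFolder.UserProfile, "user_profile"),
        };

        foreach (var (folder, label) in known)
        {
            // GetFolderPath returns an empty string when the folder does not exist on this OS.
            string root = Environment.GetFolderPath(folder);
            if (root.Length > 0 && IsUnder(full, root))
            {
                return label;
            }
        }
        return "other"; // hypothetical fallback label
    }

    private static bool IsUnder(string path, string root)
    {
        var comparison = OperatingSystem.IsWindows()
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;
        string trimmedRoot = Path.TrimEndingDirectorySeparator(root);
        return path.Equals(trimmedRoot, comparison)
            || path.StartsWith(trimmedRoot + Path.DirectorySeparatorChar, comparison);
    }
}
```

Only the returned label would be attached to telemetry; the resolved path stays local.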

PII Protection (Critical)

- VersionSanitizer: All user-provided version strings are sanitized before telemetry
  - Known safe patterns pass through (e.g., "9.0", "latest", "9.0.100-preview.1")
  - Unknown patterns replaced with "invalid"
- No raw exception messages: SetStatus() uses error type, not ex.Message
- No RecordException(): Full exception objects not recorded (contain paths/PII)
- Win32 errors: Use error codes (win32_error_5) instead of messages that may contain paths
- Install paths: Classified by type, actual paths never recorded
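
The allow-list idea behind the version sanitizer can be sketched as follows. This is not the PR's `VersionSanitizer.cs`: the regular expression, the channel names other than `latest`, and the handling of null input are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of allow-list sanitization for user-provided version strings.
public static class VersionSanitizerSketch
{
    // Matches forms such as "9.0", "9.0.100", and "9.0.100-preview.1" (assumed pattern).
    private static readonly Regex VersionPattern = new(
        @"^[0-9]{1,3}\.[0-9]{1,3}(\.[0-9]{1,4})?(-[a-z]+(\.[0-9]{1,5}){1,3})?\z",
        RegexOptions.CultureInvariant);

    // Only "latest" appears in the PR description; the other channel names are assumptions.
    private static readonly string[] Channels = { "latest", "lts", "sts", "preview" };

    public static string Sanitize(string? requested)
    {
        if (string.IsNullOrWhiteSpace(requested))
        {
            return "invalid"; // assumption: missing input is reported like any unknown pattern
        }

        string value = requested.Trim().ToLowerInvariant();
        if (Array.IndexOf(Channels, value) >= 0 || VersionPattern.IsMatch(value))
        {
            return value;
        }

        // Anything unrecognized (paths, typos, arbitrary text) is replaced, never forwarded.
        return "invalid";
    }
}
```

Because the check is an allow-list, a string such as `C:\Users\alice\sdk` can never reach telemetry; it is reported as `invalid`.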

Azure Workbook Dashboard

Success rate overview with version comparison
Command usage metrics and trends
Platform/environment breakdown (OS, architecture, CI vs interactive)
SDK installation analytics (most installed versions, request sources)
Separate sections for Product Errors vs User Errors (UX opportunities)
Performance percentiles (P50/P90/P99)
Daily active users tracking

Example Data:

(Note: some of this data is wrong, from when the code was incorrect.)

Files Changed

New Files

Telemetry/DotnetupTelemetry.cs - Main telemetry singleton
Telemetry/ErrorCodeMapper.cs - Exception to error info mapping with categorization
Telemetry/VersionSanitizer.cs - PII-safe version string sanitization
Telemetry/dotnetup-workbook.json - Azure Workbook dashboard definition

Modified Files

CommandBase.cs - Template method for command telemetry
SdkInstallCommand.cs - Comprehensive install behavior tracking
InstallerOrchestratorSingleton.cs - Returns InstallResult, proper error throwing
NonUpdatingProgressTarget.cs / SpectreProgressTarget.cs - Operation-level telemetry
DotnetInstallException.cs - Extended error codes
DotnetArchiveExtractor.cs, DotnetArchiveDownloader.cs, ReleaseManifest.cs - Error code usage

Follow-up Items

Adapt telemetry notice for logging: The telemetry notice displayed to users should also be written to logs for visibility and debugging purposes
Create aka.ms URL: Need to create aka.ms/dotnetup-telemetry (or similar) pointing to the telemetry documentation in the release/dnup branch of dotnet/sdk repository
Documentation: Add telemetry documentation explaining what data is collected and how to opt out

Testing

Unit tests updated for error code categorization (17 error codes tested)
Manual testing of telemetry data in Azure Application Insights
Verified PII sanitization with various user inputs

Telemetry Notice

Users will see a telemetry notice on first run. The notice should link to documentation (via aka.ms redirect) that explains:

What data is collected
How data is used (improving dotnetup reliability and UX)
How to opt out (DOTNET_CLI_TELEMETRY_OPTOUT=1)

I also initially included the SHA, but I want to be able to sort by error without having to parse out the SHA, which should be mappable to/from the version. Still, I kept the build SHA in a separate file because I liked that isolated, shareable pattern.

nagilson · 2026-02-20T18:07:42Z

The Mac failures seem to be a network issue. I don't think this is related, as it's happening on other branches too.

dsplaisted

I've made it through reviewing the dotnetup project. I'll go ahead and submit my comments so far.

Remaining to review are the installation library and the tests.

src/Installer/dotnetup/Telemetry/ErrorCodeMapper.cs

src/Installer/dotnetup/CommandBase.cs

src/Installer/dotnetup/Telemetry/BuildInfo.cs

src/Installer/dotnetup/Telemetry/TelemetryEventData.cs

src/Installer/dotnetup/DotnetupSharedManifest.cs

src/Installer/dotnetup/InstallerOrchestratorSingleton.cs

src/Installer/Microsoft.Dotnet.Installation/IProgressTarget.cs

dsplaisted · 2026-02-22T12:10:37Z

@nagilson It looks like the comments that seemed to have gotten lost re-appeared when I submitted the review. So you got a bunch of more or less duplicate feedback. Sorry about that! :-)

nagilson · 2026-02-23T18:42:44Z

> @nagilson It looks like the comments that seemed to have gotten lost re-appeared when I submitted the review. So you got a bunch of more or less duplicate feedback. Sorry about that! :-)

That's ok, thanks 😁 Sometimes feedback can be good enough that it is worth repeating

nagilson · 2026-02-23T19:34:02Z

I've addressed the feedback, thanks!

nagilson · 2026-02-23T21:28:53Z

Failures are in the SDK repo, due to the lack of the new 'humanizer' package

src/Installer/dotnetup/docs/dotnetup-telemetry.md

src/Installer/dotnetup/DotnetupPaths.cs

src/Installer/dotnetup/Telemetry/ErrorCategoryClassifier.cs

src/Installer/dotnetup/Telemetry/ErrorCodeMapper.cs

src/Layout/redist/targets/Crossgen.targets

test/dotnetup.Tests/ChannelVersionResolverTests.cs

test/dotnetup.Tests/ErrorCodeMapperTests.cs

test/dotnetup.Tests/InfoCommandTests.cs

src/Installer/Microsoft.Dotnet.Installation/IProgressTarget.cs

nagilson added 28 commits January 29, 2026 16:21

add telemetry - initial phase

309aea6

progress reporters also report telemetry for larger tasks

b1db5c0

--info has telemetry and use custom App I for now

6a1e7c4

failures should be properly recorded

b62dd81

catch more detail than just 'exception' for failure

91e7ad1

share accepted channel values

4335250

specific error for invalid versions

19942d9

add version sanitization tests

dc384ec

use slnf so tests are found by test explorer in code

b95c06f

include more specific error details + dev tag for telem

fb9e749

we should investigate if the error mapping can be outsourced as it seems silly we need to implement this ourselves

first run notice + library hook guidance + tests

6689644

base implementation - user error/uncontrolled failure vs our product…

dc49cd3

… is wrong failures some of the categories may be incorrect, but this is a good starting point

consider more failures product failures

28d30a1

initial telemetry dashboard

ff0c333

throw some more specific error types

1d88b21

collect further insights that will drive decisions

d47f984

error categories should be presentt

1105401

filter metric should be correct

c92b39c

don't track data we don't need or want

6d0ec37

Consolidate logic which generates paths for local dotnetup storage.

1977c58

Align with CLI existing code for telemetry

0941794

- Add llm detection
- first run disable env var
- stderr over stdout

Add telemetry notice document

a237d49

Based on https://learn.microsoft.com/en-us/dotnet/core/tools/telemetry?tabs=dotnet10

PR Feedback round 1

60ac35e

consolidate error logic code

Demo project for libraries to attach to dotnetup

765cc87

url sanitization

7d825d6

Don't show entire stack trace + better version err

3acca2f

Don't fail with lock error

378bb8e

Read more at dotnet#52789

nagilson marked this pull request as ready for review February 2, 2026 22:32

Copilot AI review requested due to automatic review settings February 2, 2026 22:32

nagilson requested a review from dsplaisted February 20, 2026 18:07

dsplaisted reviewed Feb 20, 2026

Merge remote-tracking branch 'origin/release/dnup' into nagilson-dotn…

04b4dbf

…etup-telem-otl

nagilson mentioned this pull request Feb 21, 2026

Add design for dotnetup installation tracking #52834

Open

nagilson added 3 commits February 20, 2026 16:14

pr feedback round 1 - simpler changes

f9078bf

hard code tags, consolidate mapping

964f167

simplify stack trace collection

9018884

nagilson added 2 commits February 23, 2026 11:26

PR Feedback - Reduce exception parsing, fix workbook, error telemetry

a857a84

restore-toolset merge fix

760c689

This reverts the bad change in commit a857a84.

nagilson requested a review from dsplaisted February 23, 2026 19:33

convert to else if chain for clearer code

705bea0

nagilson added 2 commits February 23, 2026 13:39

Instruct on how to run dotnetup for telemetry testing

1f3d599

workbook improvements

a2dd601

- avg sum query correction
- split runtime and sdk versions into 2 graphs
- add graph for command success rate over time
- remove unuseful graph

dsplaisted reviewed Feb 23, 2026

nagilson and others added 3 commits February 23, 2026 14:20

catch unauthorized exceptions and handle them differently as product …

2d3273b

…vs user errors

improve telemetry notice

0a545aa

Co-authored-by: Daniel Plaisted <[email protected]>

Simplify prerelease version chk

9e280f5

Co-authored-by: Daniel Plaisted <[email protected]>

dsplaisted approved these changes Feb 23, 2026

nagilson added 4 commits February 23, 2026 15:33

PR Feedback - clean up unused, shared code

290ffb6

fix merge

bc12fe4

bug fix for preview version with extra '.'

dba6bff

Prerelease version validation fix

db052de

nagilson enabled auto-merge February 24, 2026 00:07

nagilson disabled auto-merge February 24, 2026 00:47

nagilson merged commit 9b24fc9 into dotnet:release/dnup Feb 24, 2026
23 of 32 checks passed


3 participants