

Add quality metrics for dotnetup #52792

Merged
nagilson merged 64 commits into dotnet:release/dnup from
nagilson:nagilson-dotnetup-telem-otl
Feb 24, 2026


nagilson (Member) commented Feb 2, 2026

Resolves #50609
(Allow #52717 to be merged first: this PR needed code from that branch, so it needs to be merged back in.)

Add OpenTelemetry data for dotnetup

Summary

This PR introduces telemetry for the dotnetup CLI tool using OpenTelemetry with Azure Monitor. The telemetry provides actionable insights into installation success rates, user behavior patterns, and errors while complying with PII rules.

The current connection key points to my own Application Insights resource, because we'll switch to the SDK CLI one once someone else on the team has done that work. It shouldn't be used in production, but since this is a dev branch we are not yet releasing, that seems OK. I've filed an issue to track this: #52785

https://aka.ms/dotnetup-telemetry currently points to the doc in the release/dnup branch; eventually the doc should move to Microsoft Learn (#52784).

Key Features

Telemetry Infrastructure

  • OpenTelemetry + Azure Monitor: Integrated Azure.Monitor.OpenTelemetry.Exporter for telemetry collection
  • Activity-based tracing: Commands and operations tracked as OpenTelemetry activities with structured tags
  • Configurable: Telemetry can be disabled via DOTNET_CLI_TELEMETRY_OPTOUT environment variable
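As a rough illustration of the opt-out switch, here is a minimal Python sketch (the PR itself is C#, and its actual check may differ). It assumes that, as with the .NET CLI, values such as `1`, `true`, or `yes` (case-insensitive) opt out; the function name `telemetry_enabled` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Values treated as "opt out" in this sketch (assumption mirroring the .NET CLI convention).
_OPT_OUT_VALUES = {"1", "true", "yes"}


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when DOTNET_CLI_TELEMETRY_OPTOUT requests opting out."""
    env = os.environ if env is None else env
    value = env.get("DOTNET_CLI_TELEMETRY_OPTOUT", "")
    # Unset or unrecognized values leave telemetry on.
    return value.strip().lower() not in _OPT_OUT_VALUES
```

Passing the environment as a parameter keeps the check easy to unit test without mutating the process environment.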

Error Categorization System

  • Product Errors: Bugs, crashes, server issues - count against quality metrics
  • User Errors: Invalid input, permissions, disk full, network issues - tracked separately for UX improvement
  • 17 specific error codes including VersionNotFound, ManifestFetchFailed, HashMismatch, ArchiveCorrupted, etc.
  • Errors are actually thrown with proper codes throughout the codebase (not just defined)
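A minimal sketch of exception-to-category mapping, written in Python for illustration only. The exception types, the `error_type` strings, and the names `ErrorCategory`, `ErrorInfo`, and `map_exception` are hypothetical; the PR's ErrorCodeMapper is C# and its 17 codes are not reproduced here. The property being illustrated is that classification looks at the exception type and errno, never at the message text.

```python
import errno
from dataclasses import dataclass
from enum import Enum


class ErrorCategory(Enum):
    PRODUCT = "product"  # counts against quality metrics
    USER = "user"        # tracked separately for UX improvement


@dataclass(frozen=True)
class ErrorInfo:
    error_type: str
    category: ErrorCategory


def map_exception(ex: BaseException) -> ErrorInfo:
    """Classify an exception without reading its message (which may hold paths)."""
    # PermissionError and ConnectionError are OSError subclasses, so test them first.
    if isinstance(ex, PermissionError):
        return ErrorInfo("permission_denied", ErrorCategory.USER)
    if isinstance(ex, ConnectionError):
        return ErrorInfo("network_error", ErrorCategory.USER)
    if isinstance(ex, OSError) and ex.errno == errno.ENOSPC:
        return ErrorInfo("disk_full", ErrorCategory.USER)
    # Anything unrecognized is treated as a product bug.
    return ErrorInfo(type(ex).__name__, ErrorCategory.PRODUCT)
```

Defaulting unknown exceptions to the product category errs on the side of counting them against quality, so new failure modes show up in the metrics rather than being hidden as user errors.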

Success Rate Metrics

  • Success rate calculation excludes user errors to measure true product quality
  • Tracks install.result (installed vs already_installed) for accurate installation counts
  • Version comparison between latest and prior releases
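Excluding user errors means they are removed from the denominator: rate = successes / (successes + product errors). A small sketch, assuming each command run has already been labeled with one of three hypothetical outcome strings (`success`, `user_error`, `product_error`):

```python
from typing import Iterable, Optional


def product_success_rate(outcomes: Iterable[str]) -> Optional[float]:
    """Success rate with user errors excluded; None when there is nothing to measure."""
    counts = {"success": 0, "user_error": 0, "product_error": 0}
    for outcome in outcomes:
        counts[outcome] += 1  # unknown labels raise KeyError on purpose
    # User errors are left out of the denominator entirely.
    denominator = counts["success"] + counts["product_error"]
    if denominator == 0:
        return None
    return counts["success"] / denominator
```

With 8 successes, 5 user errors, and 2 product errors, this yields 8 / 10 = 0.8 rather than 8 / 15; the user errors are still reported, just in their own section.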

User Behavior Tracking (PII-Safe)

  • install.path_source: Where install path came from (explicit, global_json, default, etc.)
  • install.path_type: Classification of path (system_programfiles, user_profile, local_appdata) - not actual paths
  • install.has_global_json: Whether the project has a global.json
  • install.existing_install_type: Admin/User/none for existing installations
  • sdk.request_source: How SDK version was specified (explicit, default-latest, default-globaljson)
  • sdk.requested: Sanitized version string
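One way `install.path_type` could be derived without ever emitting the path itself is sketched below. The root directories are passed in as parameters (an assumption; a real implementation would ask the OS for them), `classify_install_path` is a hypothetical name, and `other` is an invented fallback label that does not come from the PR.

```python
from pathlib import PureWindowsPath


def _is_under(path: PureWindowsPath, root: PureWindowsPath) -> bool:
    # Windows paths are case-insensitive, so compare lower-cased components.
    p = [part.lower() for part in path.parts]
    r = [part.lower() for part in root.parts]
    return p[: len(r)] == r


def classify_install_path(path: str, program_files: str,
                          local_appdata: str, user_profile: str) -> str:
    """Return a coarse category for the install path; the raw path is never returned."""
    p = PureWindowsPath(path)
    # LocalAppData normally sits inside the user profile, so check it first.
    for category, root in (
        ("system_programfiles", program_files),
        ("local_appdata", local_appdata),
        ("user_profile", user_profile),
    ):
        if _is_under(p, PureWindowsPath(root)):
            return category
    return "other"
```

Comparing whole path components, rather than string prefixes, avoids misclassifying a sibling such as `C:\Program Files (x86)` as being under `C:\Program Files`.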

PII Protection (Critical)

  • VersionSanitizer: All user-provided version strings are sanitized before telemetry
    • Known safe patterns pass through (e.g., "9.0", "latest", "9.0.100-preview.1")
    • Unknown patterns replaced with "invalid"
  • No raw exception messages: SetStatus() uses error type, not ex.Message
  • No RecordException(): Full exception objects not recorded (contain paths/PII)
  • Win32 errors: Use error codes (win32_error_5) instead of messages that may contain paths
  • Install paths: Classified by type, actual paths never recorded
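The allowlist idea behind VersionSanitizer, plus the Win32 error-code tagging, can be sketched as follows. This assumes the only safe shapes are the keyword `latest` and dotted versions with an optional `preview`/`rc` label; the real VersionSanitizer may accept more patterns. `sanitize_version` and `win32_error_tag` are hypothetical names.

```python
import re

# Allowlisted shapes (assumption): "latest", "9.0", "9.0.100", "9.0.100-preview.1", "9.0.100-rc.2".
_KEYWORDS = {"latest"}
_VERSION_RE = re.compile(r"\d{1,3}\.\d{1,3}(?:\.\d{1,5}(?:-(?:preview|rc)\.\d{1,3})?)?")


def sanitize_version(raw: str) -> str:
    """Return the normalized input if it matches a known-safe shape, else 'invalid'."""
    candidate = raw.strip().lower()
    if candidate in _KEYWORDS or _VERSION_RE.fullmatch(candidate):
        return candidate
    # Anything else (paths, typos, free text) is replaced wholesale.
    return "invalid"


def win32_error_tag(code: int) -> str:
    """Report a Win32 error by number only; its message text can embed file paths."""
    return f"win32_error_{code}"
```

An allowlist is safer than trying to strip PII from arbitrary input: anything the patterns do not anticipate is dropped rather than passed through.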

Azure Workbook Dashboard

  • Success rate overview with version comparison
  • Command usage metrics and trends
  • Platform/environment breakdown (OS, architecture, CI vs interactive)
  • SDK installation analytics (most installed versions, request sources)
  • Separate sections for Product Errors vs User Errors (UX opportunities)
  • Performance percentiles (P50/P90/P99)
  • Daily active users tracking
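For readers unfamiliar with P50/P90/P99: a percentile is the duration below which a given share of runs fall. The workbook computes these in its own queries; the Python function below only illustrates the nearest-rank definition, which may differ slightly from the estimator the workbook uses. `nearest_rank_percentile` is a hypothetical name.

```python
import math
from typing import Sequence


def nearest_rank_percentile(durations_ms: Sequence[float], p: float) -> float:
    """Smallest sample such that at least p percent of samples are <= it."""
    if not durations_ms:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(durations_ms)
    # 1-based rank; multiply before dividing to limit floating-point error.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For 100 runs taking 1 ms through 100 ms, P50, P90, and P99 come out as 50, 90, and 99 ms.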

Example data: [two dashboard screenshots]

Note: some of this data is wrong, as it was collected while the code was still incorrect.

Files Changed

New Files

  • Telemetry/DotnetupTelemetry.cs - Main telemetry singleton
  • Telemetry/ErrorCodeMapper.cs - Exception to error info mapping with categorization
  • Telemetry/VersionSanitizer.cs - PII-safe version string sanitization
  • Telemetry/dotnetup-workbook.json - Azure Workbook dashboard definition

Modified Files

  • CommandBase.cs - Template method for command telemetry
  • SdkInstallCommand.cs - Comprehensive install behavior tracking
  • InstallerOrchestratorSingleton.cs - Returns InstallResult, proper error throwing
  • NonUpdatingProgressTarget.cs / SpectreProgressTarget.cs - Operation-level telemetry
  • DotnetInstallException.cs - Extended error codes
  • DotnetArchiveExtractor.cs, DotnetArchiveDownloader.cs, ReleaseManifest.cs - Error code usage

Follow-up Items

  • Adapt telemetry notice for logging: The telemetry notice displayed to users should also be written to logs for visibility and debugging purposes
  • Create aka.ms URL: Need to create aka.ms/dotnetup-telemetry (or similar) pointing to the telemetry documentation in the release/dnup branch of dotnet/sdk repository
  • Documentation: Add telemetry documentation explaining what data is collected and how to opt out

Testing

  • Unit tests updated for error code categorization (17 error codes tested)
  • Manual testing of telemetry data in Azure Application Insights
  • Verified PII sanitization with various user inputs

Telemetry Notice

Users will see a telemetry notice on first run. The notice should link to documentation (via aka.ms redirect) that explains:

  • What data is collected
  • How data is used (improving dotnetup reliability and UX)
  • How to opt out (DOTNET_CLI_TELEMETRY_OPTOUT=1)

We should investigate whether the error mapping can be outsourced, as it seems silly that we need to implement this ourselves.
… is wrong failures

some of the categories may be incorrect, but this is a good starting point
I also initially included the SHA, but I want to be able to sort by error and not have to parse out the SHA, which should be mappable to/from the version. Still, I kept the build SHA factored out into a separate file because I liked that isolated, shareable pattern.
- Add llm detection
- first run disable env var
- stderr over stdout
consolidate error logic code
@nagilson nagilson marked this pull request as ready for review February 2, 2026 22:32
Copilot AI review requested due to automatic review settings February 2, 2026 22:32
@nagilson nagilson requested a review from dsplaisted February 20, 2026 18:07
nagilson (Member Author) commented:

The Mac failures seem to be a network issue. I don't think they are related, as they're happening on other branches too.

dsplaisted (Member) left a comment:


I've made it through reviewing the dotnetup project. I'll go ahead and submit my comments so far.

Remaining to review are the installation library and the tests.

dsplaisted (Member) commented:

@nagilson It looks like the comments that seemed to have gotten lost re-appeared when I submitted the review. So you got a bunch of more or less duplicate feedback. Sorry about that! :-)

nagilson (Member Author) commented:


That's OK, thanks 😁 Sometimes feedback is good enough that it's worth repeating.

@nagilson nagilson requested a review from dsplaisted February 23, 2026 19:33
nagilson (Member Author) commented:

I've addressed the feedback, thanks!

nagilson (Member Author) commented:

The failures are in the SDK repo and relate to the missing new 'humanizer' package.

avg sum query correction
split runtime and sdk versions into 2 graphs
add graph for command success rate over time
remove unuseful graph
@nagilson nagilson enabled auto-merge February 24, 2026 00:07
@nagilson nagilson disabled auto-merge February 24, 2026 00:47
@nagilson nagilson merged commit 9b24fc9 into dotnet:release/dnup Feb 24, 2026
23 of 32 checks passed