
Conversation


@eric-wang-1990 eric-wang-1990 commented Nov 1, 2025

Summary

Adds a comprehensive end-to-end (E2E) benchmark for Databricks CloudFetch that measures real-world performance against an actual cluster with configurable queries.

Changes

  • CloudFetchRealE2EBenchmark: Real E2E benchmark against an actual Databricks cluster

    • Configurable via a JSON file (DATABRICKS_TEST_CONFIG_FILE environment variable)
    • Power BI consumption simulation with batch-size-proportional delays (5 ms per 10K rows)
    • Peak memory tracking via Process.WorkingSet64 (the delay and memory tracking are sketched just after this list)
    • Custom peak memory column in the results table that refers back to the console output
  • CloudFetchBenchmarkRunner: Standalone runner for CloudFetch benchmarks

    • Simplified to run only the real E2E benchmark
    • Reduced iteration counts (1 warmup + 3 measured) for faster execution
    • Hides the confusing Error/StdDev columns from the summary table (see the config sketch under Run Command)
  • README.md: Documentation for running and understanding the benchmarks
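
The delay simulation and peak-memory tracking mentioned above could look roughly like the sketch below (hypothetical helper names; the PR's actual implementation may differ):

// Sketch only: illustrates the 5 ms per 10K rows delay and Process.WorkingSet64 tracking.
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ConsumerSimulation
{
    // 5 ms of simulated Power BI processing per 10,000 rows in a batch.
    private const double DelayMsPer10KRows = 5.0;

    public static async Task SimulateConsumptionAsync(int batchRowCount)
    {
        // Delay is proportional to the batch size, so larger batches "cost" more.
        int delayMs = (int)Math.Ceiling(batchRowCount / 10_000.0 * DelayMsPer10KRows);
        if (delayMs > 0)
        {
            await Task.Delay(delayMs);
        }
    }

    // Sample the process working set after each batch and keep the maximum seen so far.
    public static long UpdatePeakMemory(long currentPeakBytes)
    {
        long workingSetBytes = Process.GetCurrentProcess().WorkingSet64;
        return Math.Max(currentPeakBytes, workingSetBytes);
    }
}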

Configuration

The benchmark requires the DATABRICKS_TEST_CONFIG_FILE environment variable to point to a JSON config file:

{
  "uri": "https://your-workspace.cloud.databricks.com/sql/1.0/warehouses/xxx",
  "token": "dapi...",
  "query": "select * from main.tpcds_sf1_delta.catalog_sales"
}
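
A sketch of how such a config file might be loaded (the DatabricksConfig type and helper below are illustrative, not the PR's exact code):

using System;
using System.IO;
using System.Text.Json;

// Mirrors the JSON shape shown above.
public sealed class DatabricksConfig
{
    public string? Uri { get; set; }
    public string? Token { get; set; }
    public string? Query { get; set; }
}

public static class BenchmarkConfigLoader
{
    public static DatabricksConfig Load()
    {
        string? path = Environment.GetEnvironmentVariable("DATABRICKS_TEST_CONFIG_FILE");
        if (string.IsNullOrEmpty(path) || !File.Exists(path))
        {
            throw new InvalidOperationException(
                "Set DATABRICKS_TEST_CONFIG_FILE to a JSON file containing uri, token, and query.");
        }

        // Case-insensitive binding so lowercase JSON keys map onto the C# properties.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<DatabricksConfig>(File.ReadAllText(path), options)
            ?? throw new InvalidOperationException($"Could not parse config file at {path}.");
    }
}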

Run Command

export DATABRICKS_TEST_CONFIG_FILE=/path/to/config.json
cd csharp
dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 CloudFetchBenchmarkRunner -- --filter "*"
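
Internally, the reduced iteration counts and hidden columns could be wired up with BenchmarkDotNet's fluent config API along these lines (a sketch under assumed API usage, not the PR's exact runner):

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main(string[] args)
    {
        var config = DefaultConfig.Instance
            // 1 warmup + 3 measured iterations keeps real-cluster runs short.
            .AddJob(Job.Default.WithWarmupCount(1).WithIterationCount(3))
            // Error/StdDev are noisy with only 3 iterations, so hide them; a custom
            // peak-memory IColumn would be registered via AddColumn(...).
            .HideColumns("Error", "StdDev");

        // CloudFetchRealE2EBenchmark is the benchmark class added by this PR.
        BenchmarkRunner.Run<CloudFetchRealE2EBenchmark>(config, args);
    }
}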

Example Output

Console output during benchmark execution:

Loaded config from: /path/to/databricks-config.json
Hostname: adb-6436897454825492.12.azuredatabricks.net
HTTP Path: /sql/1.0/warehouses/2f03dd43e35e2aa0
Query: select * from main.tpcds_sf1_delta.catalog_sales
Benchmark will test CloudFetch with 5ms per 10K rows read delay

// Warmup
CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 272.97 MB
WorkloadWarmup   1: 1 op, 11566591709.00 ns, 11.5666 s/op

// Actual iterations
CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 249.11 MB
WorkloadResult   1: 1 op, 8752445353.00 ns, 8.7524 s/op

CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 261.95 MB
WorkloadResult   2: 1 op, 9794630771.00 ns, 9.7946 s/op

CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 258.39 MB
WorkloadResult   3: 1 op, 9017280271.00 ns, 9.0173 s/op

Summary table:

BenchmarkDotNet v0.15.4, macOS Sequoia 15.7.1 (24G231) [Darwin 24.6.0]
Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
.NET SDK 8.0.407
  [Host] : .NET 8.0.19 (8.0.19, 8.0.1925.36514), Arm64 RyuJIT armv8.0-a

| Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)          | Gen0       | Gen1       | Gen2       | Allocated |
|------------------ |------------ |--------:|--------:|--------:|--------:|--------------------------:|-----------:|-----------:|-----------:|----------:|
| ExecuteLargeQuery | 5           | 9.19 s  | 8.75 s  | 9.79 s  | 9.02 s  | See previous console output | 28000.0000 | 28000.0000 | 28000.0000 |   1.78 GB |

Key Metrics:

  • E2E Time: 8.75-9.79 seconds (includes query execution, CloudFetch downloads, LZ4 decompression, batch consumption)
  • Peak Memory: 249-262 MB (tracked via Process.WorkingSet64, printed in console)
  • Total Allocated: 1.78 GB managed memory
  • GC Collections: ~28K collections each for Gen0, Gen1, and Gen2

Test Plan

  • Built successfully
  • Verified benchmark runs with real Databricks cluster
  • Confirmed peak memory tracking works
  • Validated Power BI simulation delays are proportional to batch size
  • Checked results table formatting

🤖 Generated with Claude Code

sreekanth-db and others added 4 commits October 27, 2025 12:56
Signed-off-by: Sreekanth Vadigi <[email protected]>
Add comprehensive E2E benchmark for Databricks CloudFetch with real cluster testing.

Features:
- Real E2E benchmark against actual Databricks cluster with configurable query via JSON config file
- Power BI consumption simulation with batch-size proportional delays (5ms per 10K rows)
- Peak memory tracking using Process.WorkingSet64
- Optimized iteration counts (1 warmup + 3 actual) for faster execution
- Custom peak memory column in results table with console output reference
- Hides confusing Error/StdDev columns from summary

Configuration via DATABRICKS_TEST_CONFIG_FILE environment variable pointing to JSON with:
- uri: Databricks warehouse URI
- token: Access token
- query: SQL query to execute

Run with:
  export DATABRICKS_TEST_CONFIG_FILE=/path/to/config.json
  dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 CloudFetchBenchmarkRunner -- --filter "*"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eric-wang-1990 and others added 2 commits November 1, 2025 01:44
…ated Telemetry files

- Add example console output and summary table to CloudFetch benchmark README
- Remove Telemetry design documents that are not relevant to this PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…EBenchmark

- Remove references to deleted synthetic benchmarks
- Update running instructions to reflect only E2E benchmark
- Clean up key metrics section to remove synthetic benchmark comparisons
- Fix Gen2 and Allocated values in example output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@eric-wang-1990 eric-wang-1990 changed the title feat(csharp/benchmarks): Add CloudFetch E2E performance benchmark feat(csharp/Benchmarks): Add CloudFetch E2E performance benchmark Nov 1, 2025

@CurtHagenlocher CurtHagenlocher left a comment

Thanks! Can you please add a comment to README.md with the Apache license so it passes the checkin test? You can see examples in the existing .md files of how to format the comment.
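
For reference, the existing .md files in the repository carry the standard ASF license notice wrapped in an HTML comment at the top of the file, along these lines:

<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->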

Add Apache license comment to CloudFetch benchmark README to pass
checkin test as requested by reviewer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@CurtHagenlocher CurtHagenlocher merged commit 68a2d61 into apache:main Nov 3, 2025
6 checks passed