
Conversation


@eric-wang-1990 eric-wang-1990 commented Nov 1, 2025

Summary

Adds a comprehensive end-to-end (E2E) benchmark for Databricks CloudFetch that measures real-world performance against an actual cluster with configurable queries.

Changes

  • CloudFetchRealE2EBenchmark: Real E2E benchmark against an actual Databricks cluster

    • Configurable via a JSON file (DATABRICKS_TEST_CONFIG_FILE environment variable)
    • Power BI consumption simulation with batch-size-proportional delays (5 ms per 10K rows)
    • Peak memory tracking via Process.WorkingSet64 (the delay and memory tracking are sketched just after this list)
    • Custom peak memory column in the results table that refers back to the console output
  • CloudFetchBenchmarkRunner: Standalone runner for CloudFetch benchmarks

    • Simplified to run only the real E2E benchmark
    • Reduced iteration counts (1 warmup + 3 measured) for faster execution
    • Hides the confusing Error/StdDev columns from the summary table (see the config sketch under Run Command)
  • README.md: Documentation for running and understanding the benchmarks
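
The delay simulation and peak-memory tracking mentioned above could look roughly like the sketch below (hypothetical helper names; the PR's actual implementation may differ):

// Sketch only: illustrates the 5 ms per 10K rows delay and Process.WorkingSet64 tracking.
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ConsumerSimulation
{
    // 5 ms of simulated Power BI processing per 10,000 rows in a batch.
    private const double DelayMsPer10KRows = 5.0;

    public static async Task SimulateConsumptionAsync(int batchRowCount)
    {
        // Delay is proportional to the batch size, so larger batches "cost" more.
        int delayMs = (int)Math.Ceiling(batchRowCount / 10_000.0 * DelayMsPer10KRows);
        if (delayMs > 0)
        {
            await Task.Delay(delayMs);
        }
    }

    // Sample the process working set after each batch and keep the maximum seen so far.
    public static long UpdatePeakMemory(long currentPeakBytes)
    {
        long workingSetBytes = Process.GetCurrentProcess().WorkingSet64;
        return Math.Max(currentPeakBytes, workingSetBytes);
    }
}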

Configuration

The benchmark requires the DATABRICKS_TEST_CONFIG_FILE environment variable to point to a JSON config file:

{
  "uri": "https://your-workspace.cloud.databricks.com/sql/1.0/warehouses/xxx",
  "token": "dapi...",
  "query": "select * from main.tpcds_sf1_delta.catalog_sales"
}
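
A sketch of how such a config file might be loaded (the DatabricksConfig type and helper below are illustrative, not the PR's exact code):

using System;
using System.IO;
using System.Text.Json;

// Mirrors the JSON shape shown above.
public sealed class DatabricksConfig
{
    public string? Uri { get; set; }
    public string? Token { get; set; }
    public string? Query { get; set; }
}

public static class BenchmarkConfigLoader
{
    public static DatabricksConfig Load()
    {
        string? path = Environment.GetEnvironmentVariable("DATABRICKS_TEST_CONFIG_FILE");
        if (string.IsNullOrEmpty(path) || !File.Exists(path))
        {
            throw new InvalidOperationException(
                "Set DATABRICKS_TEST_CONFIG_FILE to a JSON file containing uri, token, and query.");
        }

        // Case-insensitive binding so lowercase JSON keys map onto the C# properties.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<DatabricksConfig>(File.ReadAllText(path), options)
            ?? throw new InvalidOperationException($"Could not parse config file at {path}.");
    }
}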

Run Command

export DATABRICKS_TEST_CONFIG_FILE=/path/to/config.json
cd csharp
dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 CloudFetchBenchmarkRunner -- --filter "*"
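
Internally, the reduced iteration counts and hidden columns could be wired up with BenchmarkDotNet's fluent config API along these lines (a sketch under assumed API usage, not the PR's exact runner):

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public static class Program
{
    public static void Main(string[] args)
    {
        var config = DefaultConfig.Instance
            // 1 warmup + 3 measured iterations keeps real-cluster runs short.
            .AddJob(Job.Default.WithWarmupCount(1).WithIterationCount(3))
            // Error/StdDev are noisy with only 3 iterations, so hide them; a custom
            // peak-memory IColumn would be registered via AddColumn(...).
            .HideColumns("Error", "StdDev");

        // CloudFetchRealE2EBenchmark is the benchmark class added by this PR.
        BenchmarkRunner.Run<CloudFetchRealE2EBenchmark>(config, args);
    }
}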

Example Output

Console output during benchmark execution:

Loaded config from: /path/to/databricks-config.json
Hostname: adb-6436897454825492.12.azuredatabricks.net
HTTP Path: /sql/1.0/warehouses/2f03dd43e35e2aa0
Query: select * from main.tpcds_sf1_delta.catalog_sales
Benchmark will test CloudFetch with 5ms per 10K rows read delay

// Warmup
CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 272.97 MB
WorkloadWarmup   1: 1 op, 11566591709.00 ns, 11.5666 s/op

// Actual iterations
CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 249.11 MB
WorkloadResult   1: 1 op, 8752445353.00 ns, 8.7524 s/op

CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 261.95 MB
WorkloadResult   2: 1 op, 9794630771.00 ns, 9.7946 s/op

CloudFetch E2E [Delay=5ms/10K rows] - Peak memory: 258.39 MB
WorkloadResult   3: 1 op, 9017280271.00 ns, 9.0173 s/op

Summary table:

BenchmarkDotNet v0.15.4, macOS Sequoia 15.7.1 (24G231) [Darwin 24.6.0]
Apple M1 Max, 1 CPU, 10 logical and 10 physical cores
.NET SDK 8.0.407
  [Host] : .NET 8.0.19 (8.0.19, 8.0.1925.36514), Arm64 RyuJIT armv8.0-a

| Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)          | Gen0       | Gen1       | Gen2       | Allocated |
|------------------ |------------ |--------:|--------:|--------:|--------:|--------------------------:|-----------:|-----------:|-----------:|----------:|
| ExecuteLargeQuery | 5           | 9.19 s  | 8.75 s  | 9.79 s  | 9.02 s  | See previous console output | 28000.0000 | 28000.0000 | 28000.0000 |   1.78 GB |

Key Metrics:

  • E2E Time: 8.75-9.79 seconds (includes query execution, CloudFetch downloads, LZ4 decompression, batch consumption)
  • Peak Memory: 249-262 MB (tracked via Process.WorkingSet64, printed in console)
  • Total Allocated: 1.78 GB managed memory
  • GC Collections: ~28K collections each for Gen0, Gen1, and Gen2

Test Plan

  • Built successfully
  • Verified benchmark runs with real Databricks cluster
  • Confirmed peak memory tracking works
  • Validated Power BI simulation delays are proportional to batch size
  • Checked results table formatting

🤖 Generated with Claude Code

sreekanth-db and others added 4 commits October 27, 2025 12:56
Signed-off-by: Sreekanth Vadigi <[email protected]>
Add comprehensive E2E benchmark for Databricks CloudFetch with real cluster testing.

Features:
- Real E2E benchmark against actual Databricks cluster with configurable query via JSON config file
- Power BI consumption simulation with batch-size proportional delays (5ms per 10K rows)
- Peak memory tracking using Process.WorkingSet64
- Optimized iteration counts (1 warmup + 3 actual) for faster execution
- Custom peak memory column in results table with console output reference
- Hides confusing Error/StdDev columns from summary

Configuration via DATABRICKS_TEST_CONFIG_FILE environment variable pointing to JSON with:
- uri: Databricks warehouse URI
- token: Access token
- query: SQL query to execute

Run with:
  export DATABRICKS_TEST_CONFIG_FILE=/path/to/config.json
  dotnet run -c Release --project Benchmarks/Benchmarks.csproj --framework net8.0 CloudFetchBenchmarkRunner -- --filter "*"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eric-wang-1990 and others added 2 commits November 1, 2025 01:44
…ated Telemetry files

- Add example console output and summary table to CloudFetch benchmark README
- Remove Telemetry design documents that are not relevant to this PR

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…EBenchmark

- Remove references to deleted synthetic benchmarks
- Update running instructions to reflect only E2E benchmark
- Clean up key metrics section to remove synthetic benchmark comparisons
- Fix Gen2 and Allocated values in example output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@eric-wang-1990 eric-wang-1990 changed the title feat(csharp/benchmarks): Add CloudFetch E2E performance benchmark feat(csharp/Benchmarks): Add CloudFetch E2E performance benchmark Nov 1, 2025

@CurtHagenlocher CurtHagenlocher left a comment

Thanks! Can you please add a comment to README.md with the Apache license so it passes the checkin test? You can see examples in the existing .md files of how to format the comment.
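
For reference, the existing .md files in the repository carry the standard ASF license notice wrapped in an HTML comment at the top of the file, along these lines:

<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
-->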

Add Apache license comment to CloudFetch benchmark README to pass
checkin test as requested by reviewer.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@CurtHagenlocher CurtHagenlocher merged commit 68a2d61 into apache:main Nov 3, 2025
6 checks passed