Skip to content

feat: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics#1172

Merged
kazuyukitanimura merged 3 commits intoapache:mainfrom
mbutrovich:cometnativescan_metrics
Feb 14, 2025
Merged

feat: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics#1172
kazuyukitanimura merged 3 commits intoapache:mainfrom
mbutrovich:cometnativescan_metrics

Conversation

@mbutrovich
Copy link
Contributor

Still confirming if there's a unit mismatch wrt time between Spark elapsed time and native elapsed time. Once I confirm that, I'll mark this okay for review.

@mbutrovich
Copy link
Contributor Author

Screenshot 2024-12-16 at 4 51 24 PM

As best as I can tell it is recording more metrics than that, but the UI cuts it off.

@mbutrovich mbutrovich marked this pull request as ready for review December 17, 2024 00:19

override lazy val metrics: Map[String, SQLMetric] = {
CometMetricNode.baselineMetrics(sparkContext) ++
Map(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some way of distinguishing between these metrics and those from the current native scan? Perhaps the display string can have a short prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Native prefix. We may want to do this for all operators.
Screenshot 2024-12-17 at 10 21 27 AM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. We need not do this for all operators. This is just so we could distinguish between metrics reported by different scan implementations.

@parthchandra
Copy link
Contributor

Is the time spent reading the footer actually zero?

@mbutrovich
Copy link
Contributor Author

If I do an explain on the native side I see values that are sub-millisecond, which I'm not sure the Spark UI shows by default:

metrics=[output_rows=1, elapsed_compute=1ns, bytes_scanned=4744, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, page_index_rows_matched=504, page_index_rows_pruned=1496, predicate_evaluation_errors=0, pushdown_rows_matched=505, pushdown_rows_pruned=503, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=49.043µs, metadata_load_time=506.667µs, page_index_eval_time=74.751µs, row_pushdown_eval_time=293.255µs, statistics_eval_time=116.584µs, time_elapsed_opening=835.042µs, time_elapsed_processing=2.705084ms, time_elapsed_scanning_total=2.349792ms, time_elapsed_scanning_until_data=2.247333ms]

The elapsed_compute number looks bogus though.

@comphead
Copy link
Contributor

comphead commented Feb 1, 2025

is this PR still needed?

@parthchandra
Copy link
Contributor

@comphead we should complete this and (eventually) merge it.
@mbutrovich perhaps you can update this PR for merging into main?
Also, I know I suggested adding a 'Nativeprefix, but I realise now that it is redundant sinceCometNativeScan` is already fully native. Up to you if you want to revert it.

@mbutrovich mbutrovich force-pushed the cometnativescan_metrics branch from d4f359f to 2b17f70 Compare February 13, 2025 16:28
@mbutrovich mbutrovich changed the title [comet-parquet-exec] CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics fix: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics Feb 13, 2025
@mbutrovich mbutrovich changed the base branch from comet-parquet-exec to main February 13, 2025 16:28
@mbutrovich
Copy link
Contributor Author

Updated the base to main branch and reverted the "Native: " prefix on the metrics. Running TPC-H locally shortly, but this should be ready to go.

@mbutrovich mbutrovich changed the title fix: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics feat: CometNativeScan metrics from ParquetFileMetrics and FileStreamMetrics Feb 13, 2025
…emoved the redundant word at the beginning of the new metrics.
@mbutrovich
Copy link
Contributor Author

Screenshot 2025-02-13 at 12 17 11 PM

@codecov-commenter
Copy link

codecov-commenter commented Feb 13, 2025

Codecov Report

Attention: Patch coverage is 98.59155% with 1 line in your changes missing coverage. Please review.

Project coverage is 39.20%. Comparing base (f09f8af) to head (9356adb).
Report is 255 commits behind head on main.

Files with missing lines Patch % Lines
...g/apache/spark/sql/comet/CometNativeScanExec.scala 98.59% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1172       +/-   ##
=============================================
- Coverage     56.12%   39.20%   -16.92%     
- Complexity      976     2075     +1099     
=============================================
  Files           119      263      +144     
  Lines         11743    60888    +49145     
  Branches       2251    12909    +10658     
=============================================
+ Hits           6591    23873    +17282     
- Misses         4012    32530    +28518     
- Partials       1140     4485     +3345     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kazuyukitanimura kazuyukitanimura merged commit 2d8e142 into apache:main Feb 14, 2025
74 checks passed
@kazuyukitanimura
Copy link
Contributor

Merged thank you @mbutrovich @parthchandra @comphead

@mbutrovich mbutrovich deleted the cometnativescan_metrics branch February 14, 2025 21:42
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
…etrics (apache#1172)

* Sync with main.

* Spark History server sems to appead the word total at the end, so I removed the redundant word at the beginning of the new metrics.

* Fix typo.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants