@sraikund16
Contributor

@sraikund16 sraikund16 commented Dec 6, 2024

Summary: CUDA OVERHEAD events are already enabled in on-demand tracing, so this change enables them in auto-trace as well.

Test Plan: Tested with internal performance suites; no noticeable performance change was observed.

Differential Revision: D66904879

@pytorch-bot

pytorch-bot bot commented Dec 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142271

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 40845a2 with merge base d3d1a78:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66904879

@netlify

netlify bot commented Dec 6, 2024

Deploy Preview for chimerical-cranachan-793287 ready!

Name Link
🔨 Latest commit e52cb9b
🔍 Latest deploy log https://app.netlify.com/sites/chimerical-cranachan-793287/deploys/6753850ffcb6ef00083cd12c
😎 Deploy Preview https://deploy-preview-142271--chimerical-cranachan-793287.netlify.app
pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 6, 2024
ngimel added the release notes: profiler (release notes category) label Dec 9, 2024
pytorch-bot bot pushed a commit that referenced this pull request Dec 10, 2024
Summary:

We already have CUDA OVERHEAD events enabled in on-demand so we should also add them to auto-trace

Test Plan:
Tested using servicelab and found no performance difference:
kineto_benchmark
    duration_ms: 21668
    number_of_events: 26542
    profiler_prepare_call_duration_us: 970
    profiler_enable_call_duration_us: 616474
    profiling_window_duration_us: 2188525
    profiler_disable_call_duration_us: 148628
    parse_kineto_call_duration_us: 1672536
    function_events_build_tree_call_duration_us: 285939


kineto_benchmark
    duration_ms: 21718
    number_of_events: 26556
    profiler_prepare_call_duration_us: 885
    profiler_enable_call_duration_us: 7037
    profiling_window_duration_us: 1772481
    profiler_disable_call_duration_us: 174122
    parse_kineto_call_duration_us: 1983683
    function_events_build_tree_call_duration_us: 333582

Differential Revision: D66904879
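
As a rough aid for reading the two kineto_benchmark runs above, the sketch below computes the percent change of each metric from the first listed run to the second. The source does not say which run is the baseline, so the names `run_a`, `run_b`, and the helper `relative_change` are illustrative assumptions and not part of this PR.

```python
# Metric values copied verbatim from the two kineto_benchmark runs above.
# ASSUMPTION: the source does not label the runs, so run_a is simply the
# first block listed and run_b the second.
run_a = {
    "duration_ms": 21668,
    "number_of_events": 26542,
    "profiler_prepare_call_duration_us": 970,
    "profiler_enable_call_duration_us": 616474,
    "profiling_window_duration_us": 2188525,
    "profiler_disable_call_duration_us": 148628,
    "parse_kineto_call_duration_us": 1672536,
    "function_events_build_tree_call_duration_us": 285939,
}
run_b = {
    "duration_ms": 21718,
    "number_of_events": 26556,
    "profiler_prepare_call_duration_us": 885,
    "profiler_enable_call_duration_us": 7037,
    "profiling_window_duration_us": 1772481,
    "profiler_disable_call_duration_us": 174122,
    "parse_kineto_call_duration_us": 1983683,
    "function_events_build_tree_call_duration_us": 333582,
}


def relative_change(a, b):
    """Return the percent change from a to b for every metric present in both."""
    return {name: (b[name] - a[name]) / a[name] * 100.0 for name in a if name in b}


changes = relative_change(run_a, run_b)
for name, pct in changes.items():
    # A leading sign makes it easy to see which direction each metric moved.
    print(f"{name}: {pct:+.2f}%")
```

Single runs like these are noisy, so a per-metric percent change is only a first look; repeated runs would be needed before attributing any individual difference to the change itself.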

@sraikund16
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #142271, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@sraikund16
Contributor Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased export-D66904879 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout export-D66904879 && git pull --rebase)

@sraikund16
Contributor Author

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


mori360 pushed a commit to mori360/pytorch that referenced this pull request Dec 11, 2024
Summary: We already have CUDA OVERHEAD events enabled in on-demand so we should also add them to auto-trace

Test Plan: Tested using internal performance suites and found no noticeable performance change

Differential Revision: D66904879

Pull Request resolved: pytorch#142271
Approved by: https://github.com/ngimel

Labels

ciflow/trunk (Trigger trunk jobs on your pull request); fb-exported; Merged; release notes: profiler (release notes category); topic: improvements (topic category)
