

@Abatom
Contributor

@Abatom Abatom commented Oct 28, 2024

Fixes #113564

While using PyTorch's profiler to analyze the performance of vLLM, I hit the error below, which is similar to #113564. After investigating, I changed the temporary file used by export_chrome_trace from text mode to binary mode. With that change the export no longer raises and completes normally.

ERROR 10-28 10:25:50 engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 722, in stop 
ERROR 10-28 10:25:50 engine.py:160]     self._transit_action(self.current_action, None) 
ERROR 10-28 10:25:50 engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 751, in _transit_action 
ERROR 10-28 10:25:50 engine.py:160]     action() 
ERROR 10-28 10:25:50 engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 745, in _trace_ready 
ERROR 10-28 10:25:50 engine.py:160]     self.on_trace_ready(self) 
ERROR 10-28 10:25:50 engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 444, in handler_fn 
ERROR 10-28 10:25:50 engine.py:160]     prof.export_chrome_trace(os.path.join(dir_name, file_name)) 
ERROR 10-28 10:25:50 engine.py:160]   File "/usr/local/lib/python3.12/dist-packages/torch/profiler/profiler.py", line 220, in export_chrome_trace 
ERROR 10-28 10:25:50 engine.py:160]     fout.writelines(fin) 
ERROR 10-28 10:25:50 engine.py:160]   File "<frozen codecs>", line 322, in decode 
ERROR 10-28 10:25:50 engine.py:160] UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 5896: invalid start byte
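
A minimal sketch of the failure mode, assuming the temporary trace file contains a byte that is not valid UTF-8. The payload and variable names are hypothetical and are not taken from profiler.py. The sketch only shows that iterating over such a file in text mode triggers a decode error, while binary mode passes the bytes through unchanged:

```python
import os
import tempfile

# Hypothetical trace content: JSON-like text containing one stray 0x8e byte,
# the same byte value reported in the traceback above.
payload = b'{"name": "frame_\x8e"}\n'

# Write the bytes to a temporary file and keep it on disk for re-reading.
with tempfile.NamedTemporaryFile("w+b", suffix=".json", delete=False) as fp:
    fp.write(payload)
    path = fp.name

try:
    # Text mode decodes as UTF-8 while iterating, so the stray byte raises.
    text_error = None
    try:
        with open(path, encoding="utf-8") as fin:
            for _ in fin:
                pass
    except UnicodeDecodeError as exc:
        text_error = exc

    # Binary mode never decodes, so the same bytes are read back untouched.
    with open(path, "rb") as fin:
        raw = fin.read()
finally:
    os.remove(path)

print(type(text_error).__name__)
print(raw == payload)
```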

@Abatom Abatom requested a review from sraikund16 as a code owner October 28, 2024 12:27
@pytorch-bot

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139062

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d8432de with merge base b9618c9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 28, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@ezyang ezyang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 28, 2024
@ezyang
Contributor

ezyang commented Oct 28, 2024

I'd like to accept this but I can't tell what the old code was trying to do

@Abatom
Contributor Author

Abatom commented Oct 28, 2024

> I'd like to accept this but I can't tell what the old code was trying to do

Thank you for your reply. What do I need to do?

@sraikund16
Contributor

> I'd like to accept this but I can't tell what the old code was trying to do

This function creates the JSON trace for the Perfetto/Chrome trace viewer. If the user is exporting to a .gz file, it creates a temporary file, exports the data to it with the autograd export_chrome_trace function, and then writes the contents, compressed, to the requested .gz file.

I think writing in binary mode instead of text mode makes the most sense here.
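
The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in profiler.py: `export_gz`, `write_trace`, and `fake_exporter` are hypothetical names, and the exporter is a stand-in for the real autograd export call. The point is only that copying the temporary file in binary mode skips decoding entirely:

```python
import gzip
import os
import tempfile
from typing import Callable


def export_gz(path: str, write_trace: Callable[[str], None]) -> None:
    """Write a trace via `write_trace`, gzip-compressing it if `path` ends in .gz."""
    if not path.endswith(".gz"):
        write_trace(path)
        return
    # Create the temp file in binary mode, then close it so the exporter
    # can write to it by name.
    fp = tempfile.NamedTemporaryFile("w+b", suffix=".json", delete=False)
    fp.close()
    try:
        write_trace(fp.name)
        # Copy raw bytes: there is no decoding step, so non-UTF-8 bytes
        # in the trace cannot raise UnicodeDecodeError.
        with open(fp.name, "rb") as fin, gzip.open(path, "wb") as fout:
            fout.writelines(fin)
    finally:
        os.remove(fp.name)


def fake_exporter(name: str) -> None:
    # Hypothetical stand-in for the real exporter; emits one non-UTF-8 byte.
    with open(name, "wb") as f:
        f.write(b'{"traceEvents": ["\x8e"]}\n')


out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "trace.json.gz")
export_gz(out_path, fake_exporter)

# Decompress the result to confirm the bytes survived unchanged.
with gzip.open(out_path, "rb") as f:
    roundtrip = f.read()
print(roundtrip)
```

Keeping both ends of the copy in binary mode means the trace bytes are never interpreted as text at all.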

@Abatom Abatom requested a review from sraikund16 October 29, 2024 01:38
@ezyang ezyang added release notes: profiler release notes category topic: bug fixes topic category labels Oct 29, 2024
@ezyang
Contributor

ezyang commented Oct 29, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 29, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@Abatom Abatom requested a review from ezyang October 29, 2024 05:46
@ezyang
Contributor

ezyang commented Oct 29, 2024

@pytorchbot merge -r

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased profiler onto refs/remotes/origin/viable/strict. Please pull locally before adding more changes (for example, via `git checkout profiler && git pull --rebase`).

@pytorchmergebot
Collaborator

Merge started


rahulsingh-intel pushed a commit to rahulsingh-intel/pytorch that referenced this pull request Nov 5, 2024
…#139062)

Pull Request resolved: pytorch#139062
Approved by: https://github.com/ezyang
@tomasruizt

Thanks for fixing this bug, @Abatom! Do you know when it will be released?

@Abatom
Contributor Author

Abatom commented Jan 23, 2025

> Thanks for fixing this bug, @Abatom! Do you know when it will be released?

Sorry, I don't know.

@hchau630

I'm also encountering this bug. Will this fix be included in a release soon?


Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: profiler release notes category topic: bug fixes topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Torch Profiler with_stack=True, 'utf-8' codec can't decode

7 participants