Conversation

@mruberry
Collaborator

This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it:

  • Creates a new (THC-free) at::CUDAStream class and API
  • Extends the at::Context API to expose it
  • Stubs the current THCStream and THCState APIs to use it
  • Updates THC to no longer violate stream encapsulation (stream.hpp is dead)
  • Adds an ATen cpp test of the API
  • Bonus: Removes some debug spew in test_nn.py

The new API has several advantages over the old one:

(1) It comes with an easy-to-use RAII class, CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t.
(2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to @goldsborough for suggesting the dynamic registration technique).
(3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState.
(4) The internals are completely encapsulated, unlike the historic THCStream.
(5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention).

There are a couple things this PR does not do, however, which are left for future work:

  • It leaves the c10d::CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream).
  • It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple forward declarations were also changed.

I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API.

@ezyang
Contributor

ezyang commented Jun 28, 2018

The test failure is real:

20:44:57 ======================================================================
20:44:57 ERROR: test_gpu (__main__.TestFFI)
20:44:57 ----------------------------------------------------------------------
20:44:57 Traceback (most recent call last):
20:44:57   File "test_utils.py", line 423, in test_gpu
20:44:57     verbose=False,
20:44:57   File "/opt/conda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 189, in build
20:44:57     _build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 111, in _build_extension
20:44:57     outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/api.py", line 697, in compile
20:44:57     compiler_verbose=verbose, debug=debug, **kwds)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/recompiler.py", line 1520, in recompile
20:44:57     compiler_verbose, debug)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 22, in compile
20:44:57     outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 58, in _build
20:44:57     raise VerificationError('%s: %s' % (e.__class__.__name__, e))
20:44:57 VerificationError: CompileError: command 'gcc' failed with exit status 1

Unfortunately, you need to look at the logs to see what actually happened:

20:44:56 test_gpu (__main__.TestFFI) ... In file included from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THCStream.h:5:0,
20:44:56                  from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THCCachingAllocator.h:9,
20:44:56                  from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THC.h:7,
20:44:56                  from _gpulib.c:493:
20:44:56 /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/ATen/CUDAStream.h:3:19: fatal error: cstdint: No such file or directory
20:44:56 compilation terminated.
20:44:56 ERROR

You're not allowed to include C++ headers from .h headers in THC.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator Author

mruberry commented Jun 28, 2018

Yep; working on C linkage now. ROCm issue is separate.

Update: the linkage issue is now resolved. I'm not familiar with ROCm, but it looks like it isn't handling the creation of a CUDA stream properly.

(EDIT!) Looks like ROCm DOES support streams (I thought it didn't). So now I'm really curious about suggestions for magic words to make it recognize (or work around) the CUDA calls I added.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jun 29, 2018

23:22:24 /var/lib/jenkins/workspace/aten/src/ATen/cuda/detail/CUDAHooks.cpp:55:16: error: use of undeclared identifier 'hipStreamCreateWithPriority'
23:22:24   check_status(hipStreamCreateWithPriority(pStream, flags, priority));
23:22:24                ^

@ezyang
Contributor

ezyang commented Jun 29, 2018

flagging @Jorghi12 @bddppq

Contributor

@facebook-github-bot facebook-github-bot left a comment


@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury
Member

@mruberry I modified your PR to add an ifdef guard around cudaStreamCreateWithPriority.

@mruberry
Collaborator Author

mruberry commented Jun 29, 2018

@colesbury Thank you for the speedy assist.

(Following edited for clarity.)

Three corrections to be made (after review, to avoid hammering the CI):

  • CUDAUtils.hpp imports the THCStream header when it should just use a typedef.
  • The pointer move and copy functions should be removed (folded into the CUDAStream class), copy needs to be made self-copy safe, and, for better error checking, move can free the src pointer (when not self-moving).
  • Ensure int64_t/int consistency with the rest of PyTorch (per @ezyang's comment below).

@ezyang
Contributor

ezyang commented Jul 4, 2018

cudaGetDevice() takes an int*, not an int64_t*. I have kept int64_t in the API for consistency with the current design but would like to recommend changing it. We could even add a "gpu_index_type" or "gpu_size_type" to ATen/Context to ensure consistency and make this type easy to change.

The local convention I've seen is to use int when interacting with the CUDA APIs, but once a value is stored somewhere durable it becomes an int64_t. There are other good reasons to force people to use 64-bit ints (int is 32-bit on most systems even when the system itself is 64-bit). The static casting is irritating, I agree, but I've mostly come to terms with it.

@mruberry
Collaborator Author

mruberry commented Jul 4, 2018

Sounds good. I've edited the prior comment to incorporate this feedback.

void CUDAStream_free(CUDAStreamInternals*&);

CUDAStreamInternals* CUDAStream_copy(CUDAStreamInternals*);
void CUDAStream_move(CUDAStreamInternals*& src, CUDAStreamInternals*& dst);


struct THCRNGState; /* Random number generator state. */
typedef struct THCStream THCStream;
typedef struct CUDAStreamInternals THCStream;


@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

Thanks to assistance from @ezyang and @colesbury and additional review from @apaszke, I think this PR is now in a happy place. The last round of changes:

  • Moves the pointer interface to at::detail
  • Makes the use of int and int64_t consistent with ATen
  • Incorporates copy and move in the CUDAStream class, correcting and simplifying the code
  • Replaces an excessive header inclusion with a forward declaration in c10d
  • Merges with master (twice)

@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

@pytorchbot retest this please

The ROCm failure appears unrelated. However, I'd like to see that build pass before merging.

@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

Retesting triggered an unrelated Windows bug:

19:57:12 Traceback (most recent call last):
19:57:12   File "setup.py", line 290, in check_pydep
19:57:12     importlib.import_module(importname)
19:57:12   File "C:\Jenkins\Miniconda3\lib\importlib\__init__.py", line 126, in import_module
19:57:12     return _bootstrap._gcd_import(name[level:], package, level)
19:57:12   File "<frozen importlib._bootstrap>", line 994, in _gcd_import
19:57:12   File "<frozen importlib._bootstrap>", line 971, in _find_and_load
19:57:12   File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
19:57:12 ModuleNotFoundError: No module named 'yaml'

@ezyang
Contributor

ezyang commented Jul 8, 2018

@pytorchbot retest this please

@yf225
Contributor

yf225 commented Jul 8, 2018

@pytorchbot retest this please

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 9, 2018
Pull Request resolved: pytorch/pytorch#8997

Differential Revision: D8683375

Pulled By: soumith

fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
facebook-github-bot pushed a commit that referenced this pull request Jul 18, 2018
Summary:
THCStream was recently moved to ATen by mruberry: #8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.

colesbury apaszke ezyang

Fixes #7800
Pull Request resolved: #9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 18, 2018
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@mruberry deleted the aten_streams branch on March 16, 2019.