Conversation

@mruberry
Collaborator

This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it:

  • Creates a new (THC-free) at::CUDAStream class and API
  • Extends the at::Context API to expose it
  • Stubs the current THCStream and THCState APIs to use it
  • Updates THC to no longer violate stream encapsulation (stream.hpp is dead)
  • Adds an ATen cpp test of the API
  • Bonus: Removes some debug spew in test_nn.py

The new API has several advantages over the old one:

(1) It comes with an easy-to-use RAII class, CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t.
(2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to @goldsborough for suggesting the dynamic registration technique).
(3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState.
(4) The internals are completely encapsulated, unlike the historic THCStream.
(5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention).

There are a couple things this PR does not do, however, which are left for future work:

  • It leaves the c10d::CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream).
  • It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple forward declarations were also changed.

I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API.

@ezyang
Contributor

ezyang commented Jun 28, 2018

The test failure is real:

20:44:57 ======================================================================
20:44:57 ERROR: test_gpu (__main__.TestFFI)
20:44:57 ----------------------------------------------------------------------
20:44:57 Traceback (most recent call last):
20:44:57   File "test_utils.py", line 423, in test_gpu
20:44:57     verbose=False,
20:44:57   File "/opt/conda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 189, in build
20:44:57     _build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 111, in _build_extension
20:44:57     outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/api.py", line 697, in compile
20:44:57     compiler_verbose=verbose, debug=debug, **kwds)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/recompiler.py", line 1520, in recompile
20:44:57     compiler_verbose, debug)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 22, in compile
20:44:57     outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
20:44:57   File "/opt/conda/lib/python2.7/site-packages/cffi/ffiplatform.py", line 58, in _build
20:44:57     raise VerificationError('%s: %s' % (e.__class__.__name__, e))
20:44:57 VerificationError: CompileError: command 'gcc' failed with exit status 1

Unfortunately, you need to look at the logs to see what actually happened:

20:44:56 test_gpu (__main__.TestFFI) ... In file included from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THCStream.h:5:0,
20:44:56                  from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THCCachingAllocator.h:9,
20:44:56                  from /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC/THC.h:7,
20:44:56                  from _gpulib.c:493:
20:44:56 /opt/conda/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/ATen/CUDAStream.h:3:19: fatal error: cstdint: No such file or directory
20:44:56 compilation terminated.
20:44:56 ERROR

You're not allowed to include C++ headers from .h headers in THC.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator Author

mruberry commented Jun 28, 2018

Yep; working on C linkage now. ROCm issue is separate.

Update: the linkage issue is now resolved. I'm not familiar with ROCm, but it looks like it isn't handling the creation of a CUDA stream properly.

(EDIT!) Looks like ROCm DOES support streams (I thought it didn't). So now I'm really curious about suggestions for magic words to make it recognize (or work around) the CUDA calls I added.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jun 29, 2018

23:22:24 /var/lib/jenkins/workspace/aten/src/ATen/cuda/detail/CUDAHooks.cpp:55:16: error: use of undeclared identifier 'hipStreamCreateWithPriority'
23:22:24   check_status(hipStreamCreateWithPriority(pStream, flags, priority));
23:22:24                ^

@ezyang
Contributor

ezyang commented Jun 29, 2018

flagging @Jorghi12 @bddppq

Contributor

@facebook-github-bot facebook-github-bot left a comment


@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury
Member

@mruberry I modified your PR to add an ifdef guard around cudaStreamCreateWithPriority.

@mruberry
Collaborator Author

mruberry commented Jun 29, 2018

@colesbury Thank you for the speedy assist.

(Following edited for clarity.)

Three corrections to be made (after review, to avoid hammering the CI):

  • CUDAUtils.hpp imports the THCStream header when it should just use a typedef.
  • The pointer move and copy functions should be removed (folded into the CUDAStream class), copy needs to be made self-copy safe, and, for better error checking, move can free the src pointer (when not self-moving).
  • Ensure int64_t/int consistency with the rest of PyTorch (per @ezyang's comment below).

@ezyang
Contributor

ezyang commented Jul 4, 2018

cudaGetDevice() takes an int*, not an int64_t*. I have kept int64_t in the API for consistency with the current design but would like to recommend changing it. We could even add a "gpu_index_type" or "gpu_size_type" to ATen/Context to ensure consistency and make this type easy to change.

The local convention I've seen is to use int when interacting with the CUDA APIs, but once a value is stored somewhere durable it becomes an int64_t. There are other good reasons to force people to use 64-bit ints (int is 32-bit on most systems even when the system itself is 64-bit). The static casting is irritating, I agree, but I've mostly come to terms with it.

@mruberry
Collaborator Author

mruberry commented Jul 4, 2018

Sounds good. I've edited the prior comment to incorporate this feedback.

void CUDAStream_free(CUDAStreamInternals*&);

CUDAStreamInternals* CUDAStream_copy(CUDAStreamInternals*);
void CUDAStream_move(CUDAStreamInternals*& src, CUDAStreamInternals*& dst);


struct THCRNGState; /* Random number generator state. */
typedef struct THCStream THCStream;
typedef struct CUDAStreamInternals THCStream;


@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

Thanks to assistance from @ezyang and @colesbury and additional review from @apaszke, I think this PR is now in a happy place. The last round of changes:

  • Moves the pointer interface to at::detail
  • Makes the use of int and int64_t consistent with ATen
  • Incorporates copy and move in the CUDAStream class, correcting and simplifying the code
  • Replaces an excessive header inclusion with a forward declaration in c10d
  • Merges with master (twice)

@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

@pytorchbot retest this please

The ROCm failure appears unrelated. However, I'd like to see that build pass before merging.

@mruberry
Collaborator Author

mruberry commented Jul 6, 2018

Retesting triggered an unrelated Windows bug:

19:57:12 Traceback (most recent call last):
19:57:12   File "setup.py", line 290, in check_pydep
19:57:12     importlib.import_module(importname)
19:57:12   File "C:\Jenkins\Miniconda3\lib\importlib\__init__.py", line 126, in import_module
19:57:12     return _bootstrap._gcd_import(name[level:], package, level)
19:57:12   File "<frozen importlib._bootstrap>", line 994, in _gcd_import
19:57:12   File "<frozen importlib._bootstrap>", line 971, in _find_and_load
19:57:12   File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
19:57:12 ModuleNotFoundError: No module named 'yaml'

@ezyang
Contributor

ezyang commented Jul 8, 2018

@pytorchbot retest this please

@yf225
Contributor

yf225 commented Jul 8, 2018

@pytorchbot retest this please

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 9, 2018
Pull Request resolved: pytorch/pytorch#8997

Differential Revision: D8683375

Pulled By: soumith

fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
facebook-github-bot pushed a commit that referenced this pull request Jul 18, 2018
Summary:
THCStream was recently moved to ATen by mruberry: #8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.

colesbury apaszke ezyang

Fixes #7800
Pull Request resolved: #9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 18, 2018
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@mruberry deleted the aten_streams branch on March 16, 2019.