Conversation

@ezyang ezyang commented Jul 2, 2018

Some details about how this was done:

- For now, the allocators for CPU and CUDA are different (unifying
  the allocators is a bigger change to make, I'll contribute this in
  a later patch).  To smooth this over, the allocator field now
  stores a void* instead of THAllocator* or THCDeviceAllocator*; to
  make this clear the field is renamed to allocatorVoidPtr.

- Some THStorage functions which were generated per-scalar are now
  generalized, and thus moved out of the generic/ library.  This way
  they can be called directly from a non-code-generated at::Storage.

- THCState is moved into a C++ header.  This is actually not really
  related to this particular diff, but I'll need it soon to replace
  THAllocator/THCDeviceAllocator with at::Allocator (which is C++, so I
  can't mention it in a C header file).

- THPPointer needs to be adjusted, since there is no more type refinement
  between THStorage/THCStorage for it to template match over.  This
  is a little tricky, because I can't refer to THCStorage_free unless
  we actually compile with CUDA.  So there are two copies of the function
  now: one for the CPU build, one for the CUDA build (see the sketch
  after this list).  If we ever split CUDA/non-CUDA Python builds, you
  will have to indirect this through some dynamic dispatch.

I want to replace the THCDeviceAllocator pointers in THCState with
at::Allocator soon, but I can't reference a C++ namespaced type from
C code, so THCState needs to move.
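
To make the target concrete, the shape I'm aiming for is roughly the
following (illustrative only: the real THCState has many more fields, the
header path is approximate, and the at::Allocator members are the planned
follow-up, not something in this patch):

```cpp
// Only a C++ header can spell this out: at::Allocator lives in the at::
// namespace, which a C header like THCGeneral.h cannot name.
#include <ATen/Allocator.h>

struct THCState {
  /* ... streams, cuBLAS handles, per-device state, etc. elided ... */
  at::Allocator* cudaDeviceAllocator;  // currently a THCDeviceAllocator*
  at::Allocator* cudaHostAllocator;    // currently a THAllocator*
};
```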

Signed-off-by: Edward Z. Yang <[email protected]>
@ezyang ezyang requested a review from gchanan July 2, 2018 01:41

@facebook-github-bot facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

protected:
friend struct ${Type};
${THStorage} *storage;
THStorage *storage;

{
if(storage->flag & TH_STORAGE_FREEMEM) {
storage->allocator->free(storage->allocatorContext, storage->data_ptr);
static_cast<THAllocator*>(storage->allocatorVoidPtr)->free(storage->allocatorContext, storage->data_ptr);

THStorage* THStorage_newWithSize(at::ScalarType scalar_type, ptrdiff_t size)
{
return THStorage_newWithAllocator(scalar_type, size, &THDefaultAllocator, NULL);

storage->flag = TH_STORAGE_REFCOUNTED | TH_STORAGE_RESIZABLE | TH_STORAGE_FREEMEM;
storage->allocatorVoidPtr = allocator;
storage->allocatorContext = allocatorContext;
storage->device = 0; // device is not meaningful on CPU

THStorage* THStorage_(newWithDataAndAllocator)(real* data, ptrdiff_t size,
THAllocator* allocator,
void* allocatorContext) {
THStorage *storage = static_cast<THStorage*>(THAlloc(sizeof(THStorage)));

THCDeviceAllocator *allocator, void *allocatorContext) {
THCStorage *storage = (THCStorage*)THAlloc(sizeof(THCStorage));
memset(storage, 0, sizeof(THCStorage));
storage->backend = at::kCUDA;

#define TH_STORAGE_FREEMEM 4

typedef struct THCStorage THCStorage;
// Blah. I tried to do this with a typedef but it didn't work

THCStorage_free(LIBRARY_STATE ptr);
void THPPointer<THStorage>::free() {
if (ptr) {
if (ptr->backend == at::kCPU) {

void THStorage_(resize)(THStorage *storage, ptrdiff_t size)
{
AT_ASSERT(storage->backend == at::kCPU);

#include <TH/THGenerateHalfType.h>

#ifndef USE_CUDA
// NB: When USE_CUDA, the *full* implementation of

#include "THCGeneral.h"

/* Global state to be held in the cutorch table. */

@gchanan gchanan left a comment

looks great! I'd prefer to make the device some ridiculously low number on CPU and to have a single definition of the storage free if those are easy to do.
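
A single definition along those lines might look something like this sketch
(not code from this PR; it assumes the same USE_CUDA guard the tree already
uses, and is only an illustration of the suggestion):

```cpp
// Sketch: one THPPointer<THStorage>::free() shared by both builds, with the
// CUDA-only call compiled in only when USE_CUDA is defined.
template <>
void THPPointer<THStorage>::free() {
  if (!ptr) {
    return;
  }
#ifdef USE_CUDA
  if (ptr->backend == at::kCUDA) {
    THCStorage_free(LIBRARY_STATE ptr);
    return;
  }
#endif
  THStorage_free(ptr);
}
```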

ezyang added a commit to ezyang/pytorch that referenced this pull request Jul 2, 2018
Summary:
Closes pytorch#9107

Closes pytorch#9087

Differential Revision: D8712072

fbshipit-source-id: 6f2619d0c76f71e3de4b5b73b81910bf8ba3265b

ezyang commented Jul 2, 2018

Due to infra shenanigans, the updated PR is at #9107

@ezyang ezyang closed this Jul 2, 2018
facebook-github-bot pushed a commit that referenced this pull request Jul 3, 2018
Summary:
Closes #9107

Closes #9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 3, 2018
Summary:
Closes pytorch/pytorch#9107

Closes pytorch/pytorch#9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
Summary:
Closes pytorch/pytorch#9107

Closes pytorch/pytorch#9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
Closes pytorch#9107

Closes pytorch#9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284