
Implement optimized value semantics for CudaTensor #19

@mratsim

Description

Currently, CudaTensor data is shallow-copied by default. For consistency, it would be best if Tensor and CudaTensor had the same behaviour.

Unfortunately, until nim-lang/Nim#6348 is resolved, even constructing a CudaTensor will create an unnecessary GPU copy.

Implement value semantics

proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]) =
  ## Overloading the assignment operator
  ## It will have value semantics by default
  new(dest.data_ref, deallocCuda)
  dest.shape = src.shape
  dest.strides = src.strides
  dest.offset = src.offset
  dest.len = src.len
  dest.data_ref[] = cudaMalloc[T](dest.len)
  let size = dest.len * sizeof(T)
  check cudaMemCpy(dest.get_data_ptr,
                   src.get_data_ptr,
                   size,
                   cudaMemcpyDeviceToDevice)
  echo "Value copied"
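
The overload above allocates fresh device memory and `cudaMemcpy`s the payload, so the destination no longer aliases the source. The same idea can be sketched on the CPU with a hypothetical ref-backed `Buf` type (not Arraymancer code), contrasting the default shallow copy with an explicit deep copy:

```nim
type Buf = object
  data: ref seq[int]

proc initBuf(xs: seq[int]): Buf =
  new(result.data)
  result.data[] = xs

proc copy(src: Buf): Buf =
  ## Value semantics: allocate fresh storage, then copy the payload,
  ## mirroring the cudaMalloc + cudaMemcpy pair in the `=` overload above.
  new(result.data)
  result.data[] = src.data[]

var a = initBuf(@[1, 2, 3])
var shallow = a        # default object copy: both share the same ref
var deep = copy(a)     # deep copy: independent storage
a.data[][0] = 99
echo shallow.data[][0] # 99 — the shallow copy sees the mutation
echo deep.data[][0]    # 1  — the deep copy is unaffected
```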

Move optimization

proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]{call}) {.inline.}=
  ## Overloading the assignment operator
  ## Optimized version that knows that
  ## the source CudaTensor is unique and thus doesn't need to be copied
  system.`=`(dest, src) # shallow copy is safe: src is a temporary
  echo "Value moved"
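
The `{call}` parameter constraint (from the Nim manual's parameter-constraints feature) restricts this overload to arguments that are call expressions, i.e. temporaries that cannot alias a live variable, so a shallow copy is safe. A minimal sketch of the dispatch, with hypothetical `take`/`makeSeq` names; the exact overload-resolution behaviour may depend on the compiler version:

```nim
proc take(x: seq[int]) =
  echo "copy path"        # selected for plain variables

proc take(x: seq[int]{call}) =
  echo "move path"        # selected only for call-expression arguments

proc makeSeq(): seq[int] = @[1, 2, 3]

var s = @[1, 2, 3]
take(s)          # a variable does not match {call}
take(makeSeq())  # a temporary from a call matches the constrained overload
```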
