Implement optimized value semantics for CudaTensor #19
Currently, CudaTensor data is shallow-copied by default. For consistency, it would be best if both Tensor and CudaTensor had the same behaviour.
Unfortunately, while waiting for nim-lang/Nim#6348, even constructing a CudaTensor will create an unnecessary GPU copy.
Implement value semantics
```nim
proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]) =
  ## Overloading the assignment operator.
  ## It will have value semantics by default.
  new(dest.data_ref, deallocCuda)
  dest.shape = src.shape
  dest.strides = src.strides
  dest.offset = src.offset
  dest.len = src.len
  dest.data_ref[] = cudaMalloc[T](dest.len)
  let size = dest.len * sizeof(T)
  check cudaMemCpy(dest.get_data_ptr,
                   src.get_data_ptr,
                   size,
                   cudaMemcpyDeviceToDevice)
  echo "Value copied"
```

Move optimization
```nim
proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]{call}) {.inline.} =
  ## Overloading the assignment operator.
  ## Optimized version that knows that the source CudaTensor is unique
  ## and thus doesn't need to be copied.
  system.`=`(dest, src)
  echo "Value moved"
```