
Implement optimized value semantics for CudaTensor #19

@mratsim

Description

Currently, CudaTensor data is shallow-copied by default. For consistency, it would be best if Tensor and CudaTensor had the same behaviour.

Unfortunately, until nim-lang/Nim#6348 is resolved, even constructing a CudaTensor will create an unnecessary GPU copy.

Implement value semantics

proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]) =
  ## Overloading the assignment operator
  ## It will have value semantics by default
  new(dest.data_ref, deallocCuda)
  dest.shape = src.shape
  dest.strides = src.strides
  dest.offset = src.offset
  dest.len = src.len
  dest.data_ref[] = cudaMalloc[T](dest.len)
  let size = dest.len * sizeof(T)
  check cudaMemCpy(dest.get_data_ptr,
                   src.get_data_ptr,
                   size,
                   cudaMemcpyDeviceToDevice)
  echo "Value copied"
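
The overload above allocates fresh device memory and `cudaMemcpy`s the payload, so the destination no longer aliases the source. The same idea can be sketched on the CPU with a hypothetical ref-backed `Buf` type (not Arraymancer code), contrasting the default shallow copy with an explicit deep copy:

```nim
type Buf = object
  data: ref seq[int]

proc initBuf(xs: seq[int]): Buf =
  new(result.data)
  result.data[] = xs

proc copy(src: Buf): Buf =
  ## Value semantics: allocate fresh storage, then copy the payload,
  ## mirroring the cudaMalloc + cudaMemcpy pair in the `=` overload above.
  new(result.data)
  result.data[] = src.data[]

var a = initBuf(@[1, 2, 3])
var shallow = a        # default object copy: both share the same ref
var deep = copy(a)     # deep copy: independent storage
a.data[][0] = 99
echo shallow.data[][0] # 99 — the shallow copy sees the mutation
echo deep.data[][0]    # 1  — the deep copy is unaffected
```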

Move optimization

proc `=`*[T](dest: var CudaTensor[T]; src: CudaTensor[T]{call}) {.inline.}=
  ## Overloading the assignment operator
  ## Optimized version that knows that
  ## the source CudaTensor is unique and thus doesn't need to be copied
  system.`=`(dest, src) # shallow copy is safe: src is a temporary
  echo "Value moved"
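
The `{call}` parameter constraint (from the Nim manual's parameter-constraints feature) restricts this overload to arguments that are call expressions, i.e. temporaries that cannot alias a live variable, so a shallow copy is safe. A minimal sketch of the dispatch, with hypothetical `take`/`makeSeq` names; the exact overload-resolution behaviour may depend on the compiler version:

```nim
proc take(x: seq[int]) =
  echo "copy path"        # selected for plain variables

proc take(x: seq[int]{call}) =
  echo "move path"        # selected only for call-expression arguments

proc makeSeq(): seq[int] = @[1, 2, 3]

var s = @[1, 2, 3]
take(s)          # a variable does not match {call}
take(makeSeq())  # a temporary from a call matches the constrained overload
```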
