Description
We define a 4D tensor as stored in channels last memory format when the dimension order is NCHW and C-strides < W-strides < H-strides < N-strides (if the size of any dimension equals 1, that dimension's strides are not taken into account).
A channels last contiguous tensor is a channels last tensor that occupies a contiguous memory block, so x.is_contiguous(memory_format=torch.channels_last) checks whether a tensor is channels last contiguous.
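For illustration, a minimal stride check of the definition above (a sketch assuming a PyTorch build where `torch.channels_last` is available; the tensor sizes are arbitrary):

```python
import torch

# Default (torch.contiguous_format) NCHW tensor: strides decrease from N to W.
x = torch.randn(2, 3, 4, 4)
print(x.stride())                                           # (48, 16, 4, 1)
print(x.is_contiguous(memory_format=torch.channels_last))   # False

# Channels last: same NCHW shape, but C-strides < W-strides < H-strides < N-strides.
y = x.contiguous(memory_format=torch.channels_last)
print(y.stride())                                           # (48, 1, 12, 3)
print(y.is_contiguous(memory_format=torch.channels_last))   # True
```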
The goal of the experiment is to use channels last memory format in all operators of the ResNet model (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) and to measure the performance gains on Volta devices with the cuDNN library available.
This experiment requires:
- Updating operator kernels to follow this rule: if any of an operator's inputs is a channels last tensor, all outputs should also be in channels last memory format.
- For a better performance gain, updating the DataLoader to output channels last tensors (see the sketch after this list).
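Until the DataLoader change lands, a similar effect can be approximated by converting each batch inside the training loop. A minimal sketch (the dataset, sizes and model here are placeholders, not part of the proposal):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the real image pipeline.
dataset = TensorDataset(torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
loader = DataLoader(dataset, batch_size=4)
model = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

for images, labels in loader:
    # Convert the NCHW batch to channels last before it reaches the model.
    images = images.contiguous(memory_format=torch.channels_last)
    out = model(images)
    # With memory format aware kernels, the output is expected to stay channels last.
    print(out.is_contiguous(memory_format=torch.channels_last))
```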
To avoid changing the model itself and, more importantly, to bring this optimization to existing saved models, we need to introduce the following changes (a quick Python spot-check of the intended behavior follows the list):
- `to` operator should preserve memory format, and `copy_device_to_device` should be memory format aware (`empty_like`, `to`, `resize_as_` and `clone` now preserve memory format #23899).
- `empty_like` operator should preserve memory format by default (`empty_like`, `to`, `resize_as_` and `clone` now preserve memory format #23899).
- `resize_as_` operator should be memory format aware (`empty_like`, `to`, `resize_as_` and `clone` now preserve memory format #23899).
- `clone` operator should preserve memory format (`empty_like`, `to`, `resize_as_` and `clone` now preserve memory format #23899).
- `scatter` and `gather` functions should be memory format aware ([WIP] Scatter gather memory format #24121).
- `TensorIterator` based point-wise operators should preserve memory format ([WIP] Add Tensor Iterator and some cuda functions memory propagation #24038).
- `adaptive_avg_pool2d_cuda` and `adaptive_avg_pool2d_backward_cuda` should have channels last optimized kernels (nhwc support for adaptive_avg_pool2d & adaptive_avg_pool2d_backward #24396).
- `max_pool2d_with_indices_cuda` and `max_pool2d_with_indices_backward_cuda` should have channels last optimized kernels (max_pool2d cuda should have channel last optimized kernels [Performance improvement] #24872).
- `cudnn_batch_norm` and `cudnn_batch_norm_backward` should support channels last memory format (cudnn nhwc support #23861).
- `cudnn_convolution_forward` and `cudnn_convolution_backward` should support channels last memory format (cudnn nhwc support #23861).
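As a spot-check of the intended behavior from Python (which operators actually preserve the format depends on the PRs above having landed):

```python
import torch

x = torch.randn(8, 3, 32, 32).contiguous(memory_format=torch.channels_last)

candidates = {
    "empty_like": torch.empty_like(x),  # should preserve memory format by default
    "clone":      x.clone(),            # should preserve memory format
    "to":         x.to(torch.float64),  # should preserve memory format
    "pointwise":  x * 2 + 1,            # TensorIterator based op should preserve it
}
for name, y in candidates.items():
    print(name, y.is_contiguous(memory_format=torch.channels_last))
```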
Writing memory format aware operators requires the special functions introduced in #23391:
```cpp
auto memory_format = input_tensor.suggest_memory_format();
auto output_tensor = at::empty(output_shape, memory_format);

switch (memory_format) {
  case MemoryFormat::ChannelsLast: {
    // If the kernel requires a memory contiguous tensor.
    auto input_cl_contiguous =
        input_tensor.contiguous(MemoryFormat::ChannelsLast);
    // .... kernel code
    break;
  }
  case MemoryFormat::Contiguous: {
    // .... standard kernel
    break;
  }
  default:
    TORCH_CHECK(
        false,
        "Unsupported memory format. Supports only ChannelsLast, Contiguous");
}
```

Notes
- ResNet calls `x = x.reshape(x.size(0), -1)` before the linear layer; we are going to update the `reshape` and `view` code to convert the tensor's memory format to `torch.contiguous_format` at this step (illustrated in the sketch after this list).
- Making old models/libraries run faster also requires BC sacrifices: `empty_like` (and all `_like` operators) will return a channels last tensor if the input is channels last, and the same will apply to `to`, `clone` and `resize_as_`. We are thinking about adding the ability to control the `suggest_memory_format` behavior via a global variable.
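To illustrate the reshape note (a sketch: the 512-channel shape matches ResNet-18/34's output before the fully connected layer, and the exact post-change behavior of `reshape` on channels last inputs is an assumption here):

```python
import torch

# Output of the last pooling stage in ResNet-18/34: N x 512 x 1 x 1, channels last.
x = torch.randn(8, 512, 1, 1).contiguous(memory_format=torch.channels_last)
fc = torch.nn.Linear(512, 1000)

# The flattening step before `self.fc`; the flattened 2D tensor is handled
# as torch.contiguous_format from here on.
flat = x.reshape(x.size(0), -1)
print(flat.is_contiguous())   # True
print(fc(flat).shape)         # torch.Size([8, 1000])
```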