
Commit e263dd3

jjsjann123 authored and facebook-github-bot committed
Summary: Initial kernel support added for the optimized NHWC tensor layout.

TODO: currently the backwards kernel emits a tensor with NHWC strides. Unfortunately autograd restores the grad to contiguous (in either the copy or the add), which makes real perf tuning annoying to do, since I cannot easily measure end-to-end time in my Python script. My current kernel is blazingly fast compared to the original NCHW kernel in fp16, since I avoided atomicAdd. I'll finish perf tuning after we merge future PRs expanding NHWC support in the core.

Pull Request resolved: #24396
Differential Revision: D18115941
Pulled By: VitalyFedyunin
fbshipit-source-id: 57b4922b7bf308430ffe1406681f68629baf8834
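As a rough illustration of the NHWC issue described in the summary (an assumption-laden sketch, not part of this commit), the following PyTorch snippet runs adaptive average pooling on a channels_last input and then inspects the gradient layout; per the commit message, autograd at the time restored the grad to contiguous, which is what made end-to-end timing of the NHWC kernel awkward:

import torch
import torch.nn.functional as F

# Sketch only (not part of this commit): an NHWC (channels_last) input
# to adaptive average pooling.
x = torch.randn(8, 32, 64, 64).to(memory_format=torch.channels_last)
x.requires_grad_()

out = F.adaptive_avg_pool2d(x, (1, 1))
out.sum().backward()

# The backward kernel can emit an NHWC-strided gradient, but (per the
# commit message) autograd restored x.grad to a contiguous layout when
# copying/accumulating it.
print(x.is_contiguous(memory_format=torch.channels_last))  # True
print(x.grad.is_contiguous())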
1 parent 2020cc0 commit e263dd3

File tree

4 files changed, +505 −79 lines changed


aten/src/ATen/native/AdaptiveAveragePooling.cpp

Lines changed: 2 additions & 1 deletion
@@ -326,7 +326,8 @@ namespace {
     return at::mkldnn_adaptive_avg_pool2d(input, output_size);
   }

-  if (!input.is_quantized() && output_size[0] == 1 && output_size[1] == 1) {
+  // TODO: fastpath for Channels_last should be explored later;
+  if (input.suggest_memory_format() == at::MemoryFormat::Contiguous && !input.is_quantized() && output_size[0] == 1 && output_size[1] == 1) {
     // in this case, adaptive pooling is just computing mean over hw
     // dimensions, which can be done more efficiently
     int64_t mean_size = input.size(-1) * input.size(-2);
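For context, the 1x1 fastpath that the new memory-format check guards relies on adaptive average pooling with output_size (1, 1) being a plain mean over the H and W dimensions. A minimal sketch of that equivalence (illustration only, assuming a contiguous NCHW input; not part of this diff):

import torch
import torch.nn.functional as F

# For output_size == (1, 1), adaptive average pooling reduces to a mean
# over the H and W dimensions.
inp = torch.randn(4, 16, 32, 32)  # contiguous NCHW input

fast = inp.mean(dim=(-2, -1), keepdim=True)
ref = F.adaptive_avg_pool2d(inp, (1, 1))

print(torch.allclose(fast, ref))  # expected: True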
