[aotd] Do not force contiguous() for channels_last #135225
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135225
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 2220725 with merge base ff2360c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/functorch/test_aotdispatch.py (Outdated)
                memory_format=torch.channels_last
            ),
        )
        [inp.retain_grad() for inp in ret]
why are all of the retain_grad calls necessary throughout these tests?
Yeah, I am not doing grad assert checks here, so we can remove them.
Original Issue: #134644

We assume trace_tangents have the same memory_format as the inputs, outputs, and intermediates seen during the first tracing.

Tracing time:
- Store trace_tangents_memory_formats in metadata
- Coerce tangents to the deduced memory_format

Runtime:
- Coerce tangents to the tracing memory format from metadata

Testing:
```
python test/functorch/test_aotdispatch.py -k test_channels_last_grads_no_force_contiguous
```
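To make the idea concrete, here is a minimal sketch of the record-at-trace-time / coerce-at-runtime scheme. This is not the PR's actual code; the helper names are made up for illustration.

```
import torch
from torch._prims_common import suggest_memory_format

def record_tangent_memory_format(t: torch.Tensor) -> torch.memory_format:
    # Trace time: remember the memory format the backward graph was traced with.
    return suggest_memory_format(t)

def coerce_runtime_tangent(t: torch.Tensor, traced_format: torch.memory_format) -> torch.Tensor:
    # Runtime: only pay for a copy when the incoming tangent doesn't already match
    # the format we traced with (instead of unconditionally calling .contiguous()).
    if suggest_memory_format(t) != traced_format:
        t = t.contiguous(memory_format=traced_format)
    return t
```

At runtime the stored formats from metadata would be zipped with the incoming tangents and each tangent coerced as above.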
test/functorch/test_aotdispatch.py (Outdated)
        out = torch.compile(fn, backend="aot_eager", fullgraph=True)(inp)
        self.assertEqual(ref_out, out)

    def test_channels_last_grads_no_force_contiguous_dense(self):
I would add some more tests with some outputs that are channels last, and some outputs that aren't.
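A hedged sketch of what such a mixed-output test could look like, meant as a method on the same TestCase; the test name, shapes, and assertions are illustrative, not the PR's actual test.

```
def test_mixed_memory_format_outputs(self):
    def fn(x):
        # one channels_last output and one contiguous output
        return x * 2, x.reshape(x.shape[0], -1) + 1

    def run(f):
        inp = (
            torch.randn(2, 3, 5, 5)
            .to(memory_format=torch.channels_last)
            .requires_grad_(True)
        )
        outs = f(inp)
        (outs[0].sum() + outs[1].sum()).backward()
        return outs, inp.grad

    ref_outs, ref_grad = run(fn)
    outs, grad = run(torch.compile(fn, backend="aot_eager", fullgraph=True))
    self.assertEqual(ref_outs, outs)
    self.assertEqual(ref_grad, grad)
```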
        # Today, we force this guess to be correct by additionally calling contiguous()
        # on all tangents at runtime.
        # In the future, you could imagine lifting this restriction, since these contiguous()
        # calls can have noticeable perf overhead depending on the model.
update this comment
        suggest_memory_format = torch._prims_common.suggest_memory_format
        if is_traceable_wrapper_subclass(t):
            return [
                recursive_suggest_memory_format(getattr(t, attr))
I'm a bit torn because:
(1) right now you are trying to track the memory formats of inner tensors separately from their outer subclasses. This technically isn't enough, because you need to handle nested subclasses (what if getattr(t, attr) is another tensor subclass that you need to recursively flatten and get the memory formats of its inner tensors?)
(2) you could fix that by having some recursive nested lists that contain the full hierarchy of suggested memory formats for every subclass tangent. I'm a bit worried about traversing those nested lists being bad for hot-path in the backward.
Alternatively, we could make the simplifying assumption that if a subclass advertises as "channels-last contiguous", it will likely also have channels-last-contiguous inner tensors (and vice versa), and therefore only bother to do this bookkeeping on the top-level subclass and not on the inner tensors.
I guess it is totally possible to construct a subclass that violates that, though, and the recursive tracking shouldn't be that slow. So what do you think of:
(1) do the recursive tracking
(2) run a quick microbenchmark that times the backward of a function like this, and make sure its backward isn't a lot slower
import torch
from torch.testing._internal.two_tensor import TwoTensor

def create_inp(x):
    return TwoTensor(TwoTensor(x.clone(), x.clone()), TwoTensor(x.clone(), x.clone()))

@torch.compile
def f(*args):
    return args

x = torch.randn(1, requires_grad=True)
inps = [create_inp(x) for _ in range(100)]
outs = f(*inps)
# you probably just want to measure the overhead of the `CompiledFunction.backward()` directly,
# so you don't measure overhead from the rest of autograd
sum(outs).sum().backward()
Yes, added recursive handling for subclass attributes.
Will add a test for nested subclasses to verify it.
Optimizing the recursive calls here is interesting. The most expensive part will be the Python calls; potentially we can move this DFS over subclasses into a C++ function, but we need to measure the difference.
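For illustration, a minimal sketch of the kind of recursive traversal being discussed; this is not the PR's implementation, just the shape of the idea.

```
import torch
from torch._prims_common import suggest_memory_format
from torch.utils._python_dispatch import is_traceable_wrapper_subclass

def recursive_suggest_memory_format(t):
    # Plain tensor: a single memory format.
    fmt = suggest_memory_format(t)
    if not is_traceable_wrapper_subclass(t):
        return fmt
    # Subclass: [outer_format, inner_0_format_or_nested_list, ...], recursing so
    # that nested subclasses (e.g. TwoTensor inside TwoTensor) are handled too.
    attrs, _ctx = t.__tensor_flatten__()
    return [fmt] + [recursive_suggest_memory_format(getattr(t, attr)) for attr in attrs]
```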
Benchmarked:

import time  # needed for the timing below

# `inps_fn` and `mc` are the subclass input helper and module used for this benchmark (not shown here).
benchmark_inps = [
    inps_fn(torch.randn(2, 3, 5, 5, requires_grad=True).to(memory_format=torch.channels_last))
    for _ in range(100)
]
bwd_total_duration = 0
for inps in benchmark_inps:
    outs = torch.compile(mc, backend="aot_eager", fullgraph=True)(*inps)
    s = outs[0].sum()
    time_start = time.time()
    s.backward()
    bwd_duration = time.time() - time_start
    bwd_total_duration += bwd_duration
avg_bwd_duration = bwd_total_duration / len(benchmark_inps)
print(f"XXX SUBCLASS_GROUP avg_bwd_duration:{avg_bwd_duration*1000} ms")
class M2(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3)

    def forward(self, x0, x1, x2, x3):
        return self.conv(x0), self.conv(x1), self.conv(x2), self.conv(x3)

m2 = M2()
m2.to(memory_format=torch.channels_last)
m2.train()

def inps_fn2(x):
    return (
        x.clone(),
        x.clone(),
        x.clone(),
        x.clone(),
    )

benchmark_inps2 = [
    inps_fn2(torch.randn(2, 3, 5, 5, requires_grad=True).to(memory_format=torch.channels_last))
    for _ in range(100)
]
bwd_total_duration = 0
for inps in benchmark_inps2:
    outs = torch.compile(m2, backend="aot_eager", fullgraph=True)(*inps)
    s = outs[0].sum()
    time_start = time.time()
    s.backward()
    bwd_duration = time.time() - time_start
    bwd_total_duration += bwd_duration
avg_bwd_duration = bwd_total_duration / len(benchmark_inps2)  # was len(benchmark_inps); both hold 100 entries
print(f"XXX NO_SUBCLASS_GROUP avg_bwd_duration:{avg_bwd_duration * 1000} ms")
XXX SUBCLASS_GROUP avg_bwd_duration:7.817392349243163 ms
XXX NO_SUBCLASS_GROUP avg_bwd_duration:3.940465450286865 ms
Subclass bwd is roughly 2x slower on average.
oh, that delta between the subclass and no-subclass runs is a good datapoint (it's also probably telling us that other things, like the subclass flatten/unflatten we do in the subclass path, add up to quite a bit of overhead in general?)
My specific question though was more about whether this PR causes any regressions for the existing subclass path. So "SUBCLASS_GROUP" (with your changes) vs. "SUBCLASS_GROUP" (without your changes)
Measured:
Without changes:
channels_last: 11.92ms
With changes:
contiguous: 10.68ms
channels_last: 8ms
No regression with the changes. I used all-contiguous() inputs to remove the difference of having contiguous() calls in the without-changes case, but I still don't fully understand why the backward speeds up in the all-contiguous case.
            torch.randn(2, 3, 5, 5, requires_grad=True).to(
                memory_format=torch.channels_last
            ),
            torch.randn(2, 3, 5, 5, requires_grad=True).to(
even better, let's make the inner two tensors have a mix of channels-first-contiguous and channels-last-contiguous, and ensure that we still don't have any runtime contiguous calls (so we "guessed all of our memory formats" properly)
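A hedged sketch of the suggested input, assuming (as the suggestion implies) that TwoTensor accepts inner tensors whose strides differ:

```
from torch.testing._internal.two_tensor import TwoTensor

# assumption: TwoTensor permits mismatched inner strides, which this suggestion relies on
base = torch.randn(2, 3, 5, 5, requires_grad=True)
mixed = TwoTensor(
    base.clone(),                                        # channels-first-contiguous inner tensor
    base.clone().to(memory_format=torch.channels_last),  # channels-last-contiguous inner tensor
)
# The test would then compile a function over `mixed` and check that no runtime
# contiguous() calls are needed for the tangents (i.e. all memory formats were guessed).
```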
        # Workaround of https://github.com/pytorch/pytorch/issues/62027
        # tensor.contiguous() guarantees to return non-zero sorted-stride,
        # tensor.to(memory_format=torch.contiguous_format) can keep zero strides.
hmm... can you talk more about why you think it's worth trying to handle the zero strides case?
The common case where I imagine zero strides showing up is when you compile a model (not including the loss), and then run .sum().backward(). The autograd engine will do something like torch.ones(1).expand(tangent_shape), and the incoming tangent input to the backward will have a zero stride.
We are still kind of out-of-luck in this case though, because the strides of the tangent will be different than the strides of the forward graph output (the tangent technically has overlapping memory, while the forward output does not).
We will have been forced to trace out a backward graph ahead-of-time that assumed that the tangent was a plain contiguous tensor, so it does kind of feel like we are forced to emit the contiguous call in the backward.
Is the zero-stride-handling here attempting to handle that case? Or something else
Copying from chat:
Inductor generates asserts for sizes and strides, and since the output strides were non-zero at trace time, it generates a non-zero-stride assert.
When tangents come in at runtime with zero strides and .to(contiguous) does not change them, that Inductor assert fails.
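For context, a small illustration of how a zero-stride tangent can show up; this illustrates the scenario described above and is not code from the PR.

```
import torch

# When the loss isn't compiled, autograd feeds the backward something like
# torch.ones(1).expand(shape) as the incoming tangent:
t = torch.ones(1).expand(4)
print(t.stride())               # (0,)  -- overlapping memory, zero stride
print(t.contiguous().stride())  # (1,)  -- contiguous() materializes real strides
# Per the workaround comment above, .to(memory_format=torch.contiguous_format)
# may keep zero strides in some cases, which is why contiguous() is used instead.
```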
    ):
        updated = False
        if not isinstance(x, Tensor):
            return x, None, updated
this fn feels a bit complicated (it returns [Tensor, MemoryFormat, bool]?) - can you add return types to the fn and document why they are needed / how they are used? (in particular it's not obvious to me why we need to return a was_updated bool)
reading this more - I think returning the was_updated bool is kind of confusing, since it is ignored in most places. What if instead, we:
- don't return a was_updated bool
- in the places where the caller needed it, they can just do the check themselves to know if contiguous had an effect:

updated_out, mem_format = coerce_tangent_and_suggest_memory_format(out)
if updated_out is not out:
    setattr(...)
Originally I had updated_out is not out, but this does not work: we call out.detach() first, so the identity check will always fail and we would always do the update.
I introduced was_updated to avoid an additional check on memory_format, which can be painful with symbolic shapes and can add guards.
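A small illustration of the point about the identity check (illustrative only):

```
import torch

out = torch.randn(2, 3, requires_grad=True) + 1
coerced = out.detach()       # detach() always returns a new Tensor object,
assert coerced is not out    # so `coerced is not out` holds even when no
                             # memory-format coercion happened afterwards.
```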
            if keep_arg_mask[m.mutated_inp_runtime_indices[i]]
        ]
        traced_tangents = filtered_inp_traced_tangents + other_traced_tangents
        assert m.traced_tangent_memory_formats is not None
side note (since I see the logic you're changing here is in the remove_dupe_metadata codepath): we are pretty sure that this is dead code in torch.compile, since dynamo has its own logic to remove dupe'd inputs.
@jamesjwu tried to kill this code a while back (but I think ran into some issues?) - we probably want to give it another shot at some point #127306
        assert m.traced_tangent_memory_formats is not None
        traced_tangent_memory_formats = [torch.contiguous_format] * len(
            filtered_inp_traced_tangents
        ) + m.traced_tangent_memory_formats[num_data_mutations:]
meh yeah i guess this is good enough (not 100% accurate, but again this should be dead code)
    def coerce_tangent(x):
    # If runtime-specified tangents do not have the same memory format as the predicted traced tangents,
    # we coerce them at runtime to the traced tangents' memory format.
    def coerce_tangent(x, memory_format=torch.contiguous_format):
hmm i don't think I understand - why do we have both coerce_tangent() and coerce_tangent_and_suggest_memory_format() floating around? It seems like your new util should subsume the old one (and we can replace call sites of the first with the second?)
I introduced a new one to avoid regressing coerce_tangent, which is also called from input_and_mutation_aliases.
coerce_tangent_and_suggest_memory_format does more: it calls suggest_memory_format and has the optional force_memory_format logic.
So I decided to keep the fast version in parallel. (It's called from input_output_analysis.)
Hmm, the only place I see it used in input_output_analysis.py is in create_synthetic_base_metadata, which is called at compile time anyway. So I don't think this fn is actually used anywhere at runtime?
    # Coercing and collecting traced tangents memory format in one recursive traversal
    # mypy: ignore-errors
    def coerce_tangent_and_suggest_memory_format(
        x: Tensor, force_memory_format: Optional[torch.memory_format] = None
hmm... do we actually pass in a non-None value for force_memory_format anywhere? I don't think I see one.
Given that we have a separate coercion function for trace-time vs runtime, the "force_memory_format" seems unnecessary? (the trace time fn never forces, the runtime fn always forces)
Yes, a None value means we need to run suggest_memory_format to deduce it; this happens for all output tangents. For the others (aliases, mutated-input tangents) it will have a value.
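Restating the two modes as a rough sketch; this is not the actual function body, only the parameter semantics being described (the sketch's name is illustrative):

```
from typing import Optional, Tuple
import torch
from torch._prims_common import suggest_memory_format

def _coerce_and_suggest_sketch(
    x: torch.Tensor,
    force_memory_format: Optional[torch.memory_format] = None,
) -> Tuple[torch.Tensor, torch.memory_format]:
    # force_memory_format is None  -> deduce it (output tangents)
    # force_memory_format is given -> coerce to that format (alias / mutated-input tangents)
    memory_format = (
        force_memory_format
        if force_memory_format is not None
        else suggest_memory_format(x)
    )
    return x.contiguous(memory_format=memory_format), memory_format
```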
        ]
        all_args = [
            (
                AOTDispatchAutograd.coerce_runtime_tangent_tracing_memory_format(
hmm why do we need to call coerce_runtime_tangent_tracing_memory_format separately on both sides of the branch above?
For the subclass case we want to call it before flattening. Potentially we can move the coercion before the subclass branch; I will try.
    # if the tangent is a subclass, traced_tangent_memory_formats[i] holds a list of memory formats,
    # containing the expected memory format of the subclass **and** all of its inner tensors
    traced_tangent_memory_formats: Optional[
        List[Union[torch.memory_format, List[torch.memory_format]]]
reading the contents of coerce_tangent_and_suggest_memory_format, I think this type is actually not accurate? It looks like when you have nested layers of tensor subclasses, the inner list is not flattened. So if I have 3 layers of wrapped TwoTensor, I'll get 3 layers of nested lists.
Is that intentional? (it might make implementing the logic cleaner). If it is, then I would just kill the typing here and explain it in a comment
Yes, this typing is wrong; what we actually have here is a recursive type:
TMF = Union[torch.memory_format, List[TMF]]
traced_tangent_memory_formats: Optional[List[TMF]]
I will try to express it with typing, or replace it with Any and put the recursive type in a comment.
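For reference, one way to spell that recursive alias; the name MemoryFormatTree is illustrative, and whether mypy accepts recursive aliases depends on the mypy version/config, so this is a sketch rather than a drop-in.

```
from typing import List, Optional, Union
import torch

# Recursive alias: a leaf memory format, or a nested list mirroring the subclass tree.
MemoryFormatTree = Union[torch.memory_format, List["MemoryFormatTree"]]

traced_tangent_memory_formats: Optional[List["MemoryFormatTree"]] = None
```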
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 2 jobs have failed; the first few of them are: inductor / cuda12.1-py3.10-gcc9-sm86 / test (aot_inductor_torchbench, 1, 2, lf.linux.g5.4xlarge.nvidia.gpu), inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100)
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Original Issue: #134644
We assume trace_tangents have the same memory_format as the inputs, outputs, and intermediates seen during the first tracing.

Tracing time:
- Store trace_tangents_memory_formats in metadata
- Coerce tangents to the deduced memory_format

Runtime:
- Coerce tangents to the tracing memory format from metadata

Subclasses logic:
Previously the tangent-coercion logic did not handle the nested subclasses case; this fixes it.
For subclasses we deduce the memory format of the subclass tensor first, then of each element of the subclass:
[subclass_tensor_memory_format, subclass_tensor_elem0_memory_format, ... ]
If a subclass element (one of the __tensor_flatten__()[0] tensors) is itself a subclass, its slot holds a nested list of the same structure.
The recursive traversal of the subclass tree is expensive, so we do the memory format deduction and the coercion at the same time, keeping a single traversal (the coerce_tangent_and_suggest_memory_format method). With this approach there is no regression compared to the previous logic, which also did one traversal.

Other small change: remove a duplicated, unrelated comment.

Testing:
python test/functorch/test_aotdispatch.py -k test_channels_last_grads_no_force_contiguous

Benchmarking:
After change:
PYTORCH_AOTD_DEBUG_PROFILE=1 python test/functorch/test_aotdispatch.py -k test_benchmark_grads_no_force_contiguous
Benchmark SUBCLASS avg_bwd_duration:4.059906005859375 ms
Benchmark NO_SUBCLASS avg_bwd_duration:3.1563830375671387 ms

Before change:
BEFORE_CHANGE SUBCLASS 4.1194

No significant change in processing time.
(We do a single traversal of the subclass tree for collecting memory_formats and coercing during tracing.)