🐛 Bug
PyTorch 1.1.0a0+c3e3c5c. The GPU times reported are on a P100.
In the case below, matmul uses about 12 GB of memory when it shouldn't need more than ~3 MB, i.e. it uses 4096x more memory than necessary.
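For reference, here is a rough accounting of those figures, assuming float32 (PyTorch's default for randn, 4 bytes per element) and the shapes from snippet A below; the ~3 MB presumably refers to the size of the output z (y is the same size):
copied_x = 192 * 4096 * 4096 * 4  # fully materialized broadcast of x: 12,884,901,888 bytes = 12 GiB
output_z = 192 * 4096 * 1 * 4     # result z: 3,145,728 bytes = 3 MiB
print(copied_x // output_z)       # 4096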
A
import torch
x = torch.randn(4096, 4096)
y = torch.randn(192, 4096, 1)
z = torch.matmul(x, y)
Note that this is equivalent to the following memory-efficient operation:
B
x = torch.randn(4096, 4096)
y = torch.randn(192, 4096, 1)
z = torch.bmm(x.unsqueeze(0).expand(192, *x.shape), y)
It's also equivalent to the following, which is memory-efficient and faster, but it may require a copy of y, and without some extra work the output may be batched column-major:
C
x = torch.randn(4096, 4096)
y = torch.randn(192, 4096, 1)
z = torch.matmul(y.permute(0, 2, 1), x.t()).permute(0, 2, 1)
On GPU, A takes ~125 ms and uses 12 GB of memory, B takes ~22 ms, and C takes ~1 ms.
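As a quick sanity check of the equivalence claims (not part of the original report), the snippet below compares the three variants using smaller stand-in shapes, so that A's hidden broadcast copy stays tiny:
import torch

x = torch.randn(64, 64)    # stand-in for the 4096x4096 matrix
y = torch.randn(8, 64, 1)  # stand-in for the 192x4096x1 batch

z_a = torch.matmul(x, y)                                        # variant A
z_b = torch.bmm(x.unsqueeze(0).expand(8, *x.shape), y)          # variant B
z_c = torch.matmul(y.permute(0, 2, 1), x.t()).permute(0, 2, 1)  # variant C

print(torch.allclose(z_a, z_b, atol=1e-5))  # True
print(torch.allclose(z_a, z_c, atol=1e-5))  # True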
See also #13222, which may be related.
I believe the problem is the unnecessary contiguous() call here:
pytorch/aten/src/ATen/native/LinearAlgebra.cpp, lines 460 to 461 at 15b318d:
Tensor tensor1_expanded = tensor1.expand(tensor1_expand_size).contiguous().view(tensor1_bmm_view);
Tensor tensor2_expanded = tensor2.expand(tensor2_expand_size).contiguous().view(tensor2_bmm_view);
Instead of using contiguous() and view(), it may be possible to use reshape(). That might achieve the performance of B.
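A minimal sketch of the difference, written in Python rather than the ATen C++ above and with the bmm view shape hard-coded for the case in A (matmul computes these shapes generically, so whether reshape() avoids the copy for every shape it handles is a separate question):
import torch

x = torch.randn(4096, 4096)
expanded = x.expand(192, 4096, 4096)        # no copy: the new batch dimension has stride 0
print(expanded.stride())                    # (0, 4096, 1)

# .contiguous() here materializes 192 real copies of x (the ~12 GB reported above).
# .reshape() can instead alias the expanded tensor, since no stride change is needed
# for this target shape:
reshaped = expanded.reshape(192, 4096, 4096)
print(reshaped.data_ptr() == x.data_ptr())  # True: still backed by x's storage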