
padding + cast cannot be compiled for all dtypes on CPU #122606

@malfet

Description

🐛 Describe the bug

The following example:

import torch

@torch.compile
def cast_and_pad(x):
    return torch.nn.functional.pad(x.to(torch.float32), (0, 3, 0, 0))

x = torch.ones(1, 1, 13, dtype=torch.int64)
print(cast_and_pad(x))

will fail (see Colab) with:

/tmp/torchinductor_root/qf/cqffzgc7mvvjhlx2uqho42flqfmxpnu4g7tu2mltyq57j7thf4jq.cpp: In lambda function:
/tmp/torchinductor_root/qf/cqffzgc7mvvjhlx2uqho42flqfmxpnu4g7tu2mltyq57j7thf4jq.cpp:16:40: error: no matching function for call to ‘masked_load(const long int*, at::vec::CPU_CAPABILITY::Vectorized<float>)’
   16 |                 auto tmp6 = masked_load(in_ptr0 + static_cast<long>(x0), to_float_mask(tmp4));

This fails because masked_load is only implemented for floating-point types, so the generated kernel cannot load from the int64 input pointer.
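For context, a masked load reads a vector of elements but substitutes a fill value in lanes where the mask is false. A scalar Python sketch of the intended semantics (the name, signature, and fill behavior here are illustrative only, not the actual at::vec::Vectorized API):

```python
# Hypothetical scalar model of a masked load: take src[i] where the
# mask lane is true, otherwise use the fill value. The compiled CPU
# kernel needs an overload of this operation for int64 inputs, which
# was the missing piece triggering the compile error above.
def masked_load(src, mask, fill=0):
    return [s if m else fill for s, m in zip(src, mask)]

print(masked_load([1, 2, 3, 4], [True, True, False, True]))
```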

Fixed by af9e30f

Discovered while debugging test_comprehensive_fft_ihfft2_cpu_int64 failures on ARM and Intel CPUs without AVX512 support.

Versions

CI

cc @ezyang @msaroufim @bdhirsh @anijain2305 @zou3519 @chauhang

Metadata


Labels

- oncall: cpu inductor (CPU Inductor issues for Intel team to triage)
- oncall: pt2
- triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
