`torch._inductor.cpu_vec_isa.pick_vec_isa` takes ~9 seconds to run #140970

I noticed that when compiling a small microbenchmark with Inductor's caches warm, end-to-end (E2E) compile times were ~4s with CUDA tensors but ~15s with CPU tensors. Most of the extra ~11s on CPU appears to be spent when Inductor runs `pick_vec_isa`. Timing that call in isolation:
```python
import time

import torch
from torch._inductor.cpu_vec_isa import pick_vec_isa

start = time.perf_counter()
out = pick_vec_isa()
end = time.perf_counter()
# 9.106s on my machine
print(end - start)
```
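A single wall-clock number does not show where the time inside `pick_vec_isa` goes, so a profile may help narrow it down. The sketch below is a minimal, hypothetical helper (`profile_call` and `time_repeated` are illustrative names, not PyTorch APIs) built only on the standard library's `cProfile` and `pstats`. It runs a callable once under the profiler and reports the most expensive callees by cumulative time. It also times a few repeat calls, which indicates whether the cost is paid once per process or on every call.

```python
import cProfile
import io
import pstats
import time
from typing import Any, Callable, List, Tuple


def profile_call(fn: Callable[[], Any], top: int = 15) -> Tuple[Any, float, str]:
    """Run fn() once under cProfile and return (result, wall_seconds, report).

    Hypothetical helper: the report lists the `top` entries sorted by
    cumulative time, which is usually enough to spot the dominant callee.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        result = fn()
    finally:
        # Always stop profiling, even if fn() raises.
        profiler.disable()
    elapsed = time.perf_counter() - start

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, elapsed, buf.getvalue()


def time_repeated(fn: Callable[[], Any], repeats: int = 3) -> List[float]:
    """Wall-clock seconds for each of `repeats` consecutive calls to fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings
```

Assuming a PyTorch build that ships `torch._inductor.cpu_vec_isa`, calling `profile_call(pick_vec_isa)` and printing the returned report should show which callees account for the ~9s. Running `time_repeated(pick_vec_isa)` afterwards in the same process shows whether subsequent calls are cheap, that is, whether the cost is a one-time per-process startup cost.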

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @ezyang @gchanan @zou3519 @msaroufim
