torch._inductor.cpu_vec_isa.pick_vec_isa takes ~9 seconds to run #140970

@bdhirsh

I noticed that when compiling a small microbenchmark (with a warm Inductor cache), end-to-end compile times were ~4s with CUDA tensors and ~15s with CPU tensors. It looks like most of the extra time on CPU is spent when Inductor runs pick_vec_isa:

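import time

import torch
from torch._inductor.cpu_vec_isa import pick_vec_isa

# Time a single call; perf_counter is a monotonic clock meant for measuring intervals.
start = time.perf_counter()
out = pick_vec_isa()
end = time.perf_counter()
# 9.106s on my machine
print(end - start)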
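
To narrow down where the time goes, something like the following sketch might help. It is not part of the original report: `time_calls` and `profile_call` are hypothetical helper names that use only the standard library (`time.perf_counter`, `cProfile`, `pstats`). The idea is to profile the first (cold) call to see which callees dominate, then time a few repeat calls to check whether the cost is paid every time or only once per process.

```python
import cProfile
import io
import pstats
import time


def time_calls(fn, repeats=3):
    """Return wall-clock seconds for each of `repeats` consecutive calls to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def profile_call(fn, top=10):
    """Run fn once under cProfile; return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    try:
        from torch._inductor.cpu_vec_isa import pick_vec_isa
    except ImportError:
        # torch is not installed; the helpers above still work on any callable.
        pick_vec_isa = None

    if pick_vec_isa is not None:
        # First call in this process: shows which callees account for the time.
        print(profile_call(pick_vec_isa))
        # Repeat calls: shows whether later calls in the same process are also slow.
        print(time_calls(pick_vec_isa))
```

If the profile is dominated by subprocess or compiler invocations, that would point at the ISA probing step itself rather than Python overhead, but that is something to confirm from the report rather than assume.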

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @ezyang @gchanan @zou3519 @msaroufim

Labels

module: cpu (CPU specific problem (e.g., perf, algorithm))
module: inductor
oncall: cpu inductor (CPU Inductor issues for Intel team to triage)
oncall: pt2
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
