Open
Labels
module: cpu (CPU specific problem, e.g., perf, algorithm)
module: inductor
oncall: cpu inductor (CPU Inductor issues for Intel team to triage)
oncall: pt2
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
When compiling a small microbenchmark with a warm Inductor cache, I noticed that end-to-end compile time was ~4s with CUDA tensors but ~15s with CPU tensors. Most of the extra time on CPU appears to be spent when Inductor runs pick_vec_isa:
import time
import torch
from torch._inductor.cpu_vec_isa import pick_vec_isa
# Time a single call to pick_vec_isa() using a monotonic clock
start = time.perf_counter()
out = pick_vec_isa()
end = time.perf_counter()
# 9.106s on my machine
print(end - start)
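To narrow down where those seconds go inside the call, one option is to run it under the standard-library profiler. The following is only a minimal sketch: `profile_call` is a hypothetical helper name introduced here for illustration, not part of PyTorch, and it makes no assumption about what pick_vec_isa does internally.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=15, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time. Hypothetical helper, for illustration.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)

    # Render the stats into a string instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

With PyTorch installed, calling `profile_call(pick_vec_isa)` and printing the returned report should list the functions that account for most of the cumulative time. Note that cProfile only sees Python frames, so any time spent in a child process would show up as time spent waiting in the Python function that launched it.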
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @aakhundov @ezyang @gchanan @zou3519 @msaroufim