Description
See:
- xpu device is not used running pipeline(device_map="auto") huggingface/transformers#31922
- get_max_memory() returns allocated memory for XPU instead of total device memory huggingface/accelerate#2929
As of 3477ee3, the XPU backend in PyTorch is missing `torch.xpu.mem_get_info()`. This function is required to support auto dispatch modes for running large models such as LLaMA 3 on systems whose devices don't have enough memory to fit the whole model. See [1] and [2] for details, and the sketch after the references below. The equivalent is already supported for CUDA: https://pytorch.org/docs/main/generated/torch.cuda.mem_get_info.html#torch.cuda.mem_get_info.
[1] https://huggingface.co/docs/accelerate/usage_guides/big_modeling
[2] https://huggingface.co/blog/accelerate-large-models
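A minimal sketch of how big-model dispatch could query device memory, assuming the XPU API mirrors the existing CUDA signature. `torch.cuda.mem_get_info()` exists today; the `torch.xpu.mem_get_info()` call below is the requested, not-yet-existing, function:

```python
import torch

def get_free_and_total_memory(device: torch.device) -> tuple[int, int]:
    """Return (free_bytes, total_bytes) for the given device."""
    if device.type == "cuda":
        # Existing CUDA API: returns (free, total) in bytes.
        return torch.cuda.mem_get_info(device.index)
    if device.type == "xpu":
        # Requested XPU API, assumed here to mirror the CUDA signature.
        return torch.xpu.mem_get_info(device.index)
    raise ValueError(f"unsupported device type: {device.type}")

if __name__ == "__main__":
    if torch.cuda.is_available():
        free, total = get_free_and_total_memory(torch.device("cuda", 0))
        print(f"cuda:0 free={free} total={total}")
```

With such a function in place, `accelerate.utils.get_max_memory()` could report total (rather than allocated) memory for XPU devices and build a correct `device_map="auto"` split, as it does for CUDA.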
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @yao-matrix
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @EikanWang @fengyuan14 @guangyey