Description
See:
- xpu device is not used running pipeline(device_map="auto") huggingface/transformers#31922
- get_max_memory() returns allocated memory for XPU instead of total device memory huggingface/accelerate#2929
As of 3477ee3, the XPU backend in PyTorch is missing `torch.xpu.mem_get_info()`. This function is required to support auto dispatch modes for running large models such as LLaMA 3 on systems whose devices don't have enough memory to fit the whole model. See [1] and [2] for details, and the sketch after the references below. The equivalent is already supported for CUDA: https://pytorch.org/docs/main/generated/torch.cuda.mem_get_info.html#torch.cuda.mem_get_info.
[1] https://huggingface.co/docs/accelerate/usage_guides/big_modeling
[2] https://huggingface.co/blog/accelerate-large-models
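A minimal sketch of how big-model dispatch could query device memory, assuming the XPU API mirrors the existing CUDA signature. `torch.cuda.mem_get_info()` exists today; the `torch.xpu.mem_get_info()` call below is the requested, not-yet-existing, function:

```python
import torch

def get_free_and_total_memory(device: torch.device) -> tuple[int, int]:
    """Return (free_bytes, total_bytes) for the given device."""
    if device.type == "cuda":
        # Existing CUDA API: returns (free, total) in bytes.
        return torch.cuda.mem_get_info(device.index)
    if device.type == "xpu":
        # Requested XPU API, assumed here to mirror the CUDA signature.
        return torch.xpu.mem_get_info(device.index)
    raise ValueError(f"unsupported device type: {device.type}")

if __name__ == "__main__":
    if torch.cuda.is_available():
        free, total = get_free_and_total_memory(torch.device("cuda", 0))
        print(f"cuda:0 free={free} total={total}")
```

With such a function in place, `accelerate.utils.get_max_memory()` could report total (rather than allocated) memory for XPU devices and build a correct `device_map="auto"` split, as it does for CUDA.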
CC: @gujinghui @EikanWang @fengyuan14 @guangyey @jgong5 @sywangyi @yao-matrix
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @EikanWang @fengyuan14 @guangyey