Skip to content

Conversation

@ckl117
Copy link
Collaborator

@ckl117 ckl117 commented Sep 5, 2025

Support load pytorch model zai-org/GLM-4.5-Air model and inference.

model_path=./torch_models/GLM-4.5-Air

python -m fastdeploy.entrypoints.openai.api_server \
    --model ${model_path} \
    --max-model-len 32768 \
    --max-num-seqs 18 \
    --tensor-parallel-size 4 \
    --port 8112 \
    --graph-optimization-config '{"use_cudagraph":true, "graph_opt_level":0}' \
    --load_choices "default_v1" \

@paddle-bot
Copy link

paddle-bot bot commented Sep 5, 2025

Thanks for your contribution!

@ckl117 ckl117 changed the title [Feature] Support zai-org/GLM-4.5-Air model [Feature] Support zai-org/GLM-4.5-Air BF16 model Sep 8, 2025
Copy link
Collaborator

@YuanRisheng YuanRisheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要补充单测

return rot_emb


class GlmRotaryEmbedding:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

能想办法把ErnieRotaryEmbedding改成一个基类,这里继承一下吗,重复代码太多了

@ckl117 ckl117 merged commit 637d96c into PaddlePaddle:develop Sep 10, 2025
15 of 17 checks passed
ckl117 added a commit to ckl117/FastDeploy that referenced this pull request Sep 10, 2025
self.model.clear_grpah_opt_backend(fd_config=self.fd_config)


class Glm4MoePretrainedModel(PretrainedModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个类可以删掉了

ckl117 added a commit to ckl117/FastDeploy that referenced this pull request Sep 11, 2025
@ckl117 ckl117 mentioned this pull request Sep 12, 2025
qingqing01 pushed a commit that referenced this pull request Sep 15, 2025
* [Feature] Support zai-org/GLM-4.5-Air BF16 model (#3928)

* support glm45_air

* [Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint8 triton_moe_backend) (#4051)

* check

* fix v1 load for mix and wint8

* check --quantizations 'None'

* check

* support RL rollout

* check v1 loader

* check glm rollout_model, change wfp8afp8 per_token_cast_to_fp8 to native impl

* check rollout moe gate begin layer_id

* check rollout e_score_correction_bias

* delete infer_to_train_mapping={}

* code check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants