support qk norm for append attn #3145
Conversation
Thanks for your contribution!
    metadata.kv_signal_data_list[layer.layer_id],
    getattr(layer, "q_norm_weight", None),
    getattr(layer, "k_norm_weight", None),
    getattr(layer, "rms_norm_eps"),
Doesn't this need a default value set here?
done
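For context, a minimal sketch of what the reviewer is suggesting, assuming the attention layer may not always define `rms_norm_eps`. The attribute names mirror the excerpt above, but the fallback value shown is a common RMSNorm default and a placeholder, not necessarily the value the PR ended up using:

```python
# Hypothetical sketch: supply a fallback epsilon when the layer does not define one.
q_norm_weight = getattr(layer, "q_norm_weight", None)
k_norm_weight = getattr(layer, "k_norm_weight", None)
rms_norm_eps = getattr(layer, "rms_norm_eps", 1e-6)  # 1e-6 is a placeholder default
```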
    const uint32_t elem_nums =
        use_neox_style ? bsz * (num_heads + 2 * kv_num_heads) * dim_head / 2
                       : bsz * (num_heads + 2 * kv_num_heads) * dim_head;
    constexpr int HEAD_DIM = 128;
Shouldn't a check be added here? dim_head values other than 128 are not supported.
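A hedged sketch of the kind of guard being asked for, written at the Python call site rather than inside the CUDA kernel. The function and constant names are illustrative, not the PR's actual signatures; the real check could equally be a C++-side enforce before launching the kernel:

```python
SUPPORTED_HEAD_DIM = 128  # the excerpt above hard-codes HEAD_DIM = 128

def check_qk_norm_head_dim(dim_head: int) -> None:
    """Reject head dims the fused qk-norm path cannot handle (illustrative only)."""
    if dim_head != SUPPORTED_HEAD_DIM:
        raise NotImplementedError(
            f"qk norm in append attention only supports dim_head == {SUPPORTED_HEAD_DIM}, "
            f"got {dim_head}"
        )
```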
fastdeploy/utils.py (outdated)
    instance_key = (cls, frozenset(kwargs.items()))
    if instance_key not in instances:
        instances[instance_key] = cls(*args, **kwargs)
    return instances[instance_key]
This change doesn't fall within the scope of qk norm, and the current ep engine implementation doesn't need this modified.
Done
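For readers without the full diff, the fragment in fastdeploy/utils.py looks like part of a keyword-aware singleton helper; the reconstruction below is purely illustrative of the pattern under discussion, and its name and structure are assumptions rather than the repository's actual code:

```python
def singleton(cls):
    """Cache one instance per (class, kwargs) combination -- illustrative sketch only."""
    instances = {}

    def get_instance(*args, **kwargs):
        # frozenset(kwargs.items()) makes the keyword arguments hashable, so calls
        # with different (hashable) kwargs get different cached instances.
        # Note that positional args are not part of the key, matching the excerpt.
        instance_key = (cls, frozenset(kwargs.items()))
        if instance_key not in instances:
            instances[instance_key] = cls(*args, **kwargs)
        return instances[instance_key]

    return get_instance
```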
Force-pushed from b09f64e to ee996e3.
Please add a comment in the attention layer explaining that use_qk_norm applies qk_norm after RoPE. Other open-source models appear to apply qk_norm before RoPE, so this difference needs to be called out explicitly.
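To make the ordering difference concrete, a schematic sketch is shown below. `apply_rope` and `rms_norm` are placeholders for the real kernels; only the order of operations matters here, not the actual FastDeploy API:

```python
# Convention used by this PR (per the review comment): normalize AFTER RoPE.
def qk_norm_after_rope(q, k, q_w, k_w, eps, apply_rope, rms_norm):
    q, k = apply_rope(q, k)
    return rms_norm(q, q_w, eps), rms_norm(k, k_w, eps)

# Convention reportedly used by most open-source models: normalize BEFORE RoPE.
def qk_norm_before_rope(q, k, q_w, k_w, eps, apply_rope, rms_norm):
    q, k = rms_norm(q, q_w, eps), rms_norm(k, k_w, eps)
    return apply_rope(q, k)
```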
Force-pushed from 37b6618 to ab37724.
done
Support RMS norm computation on q and k after RoPE in append attention.
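As a reference for what the fused kernel computes, here is a plain NumPy sketch of per-head RMS norm applied to q and k after RoPE. It follows the standard RMSNorm formula and is an approximation for illustration, not a copy of the CUDA implementation; the tensor shapes and helper names in the usage comments are assumptions:

```python
import numpy as np

def rms_norm_per_head(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMS-normalize the last (head_dim) axis: x / sqrt(mean(x^2) + eps) * weight."""
    variance = np.mean(np.square(x), axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight

# Usage sketch (shapes assumed): q is [tokens, num_heads, head_dim],
# k is [tokens, kv_num_heads, head_dim]; both are normalized after RoPE.
# q, k = apply_rope(q, k)                          # RoPE first (this PR's convention)
# q = rms_norm_per_head(q, q_norm_weight, eps)     # then RMS norm on q
# k = rms_norm_per_head(k, k_norm_weight, eps)     # and on k
```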