-
Notifications
You must be signed in to change notification settings - Fork 3.8k
model: support Apertus #9774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
model: support Apertus #9774
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @EduardDurech, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the Apertus model, a pre-release from the Swiss AI Initiative, into the system. The Apertus model is a derivative of the Llama architecture, featuring two primary modifications: the novel xIELU activation function and QK-norm in its attention mechanism. This integration expands the supported model architectures and prepares for future advancements.
Highlights
- Introduction of xIELU Activation Function: A new XIELU class is added to python/sglang/srt/layers/activation.py. This activation function is described as being introduced in a specific arXiv paper and includes both a Python fallback and an experimental CUDA implementation.
- Integration of Apertus Model: A new file python/sglang/srt/models/apertus.py is added, defining the Apertus model architecture. This model incorporates the xIELU activation function in its MLP and uses QK-norm (specifically q_norm and k_norm RMSNorm layers) in its attention mechanism, as stated in the PR description.
- Test Case Addition: A new ModelCase for "swiss-ai/Apertus-8B" is added to test/srt/models/test_generation_models.py to include the new model in testing.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for the Apertus model, including the custom xIELU activation function. The changes look good overall, but I've found a few critical issues in the new Apertus model implementation that would cause runtime errors, such as a typo in a class name and an incorrect tuple unpacking. I've also included some suggestions to improve the xIELU implementation by using more specific exception handling and a more efficient tensor reshaping logic. Please review the comments for details.
0491a28 to
9db47a4
Compare
9db47a4 to
ed2ca82
Compare
|
Could you also share the accuracy test results against vllm/transformers? |
|
@mickqian hf_outputs.output_strs=[' a city of contrasts. The city is a mix of old and new, with a rich history and a vibrant present. London is a city of culture, with', ' to go out and play. I like to play with my friends. I like to play with my family. I like to play with my dog. I like', ' creating intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI systems are']
srt_outputs.output_strs=[' a city of contrasts. The city is a mix of old and new, with a rich history and a vibrant present. London is a city of culture, with', ' to go out and play. I like to play with my friends. I like to play with my family. I like to play with my dog. I like', ' creating intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI systems are']
rouge_l_scores=[1.0, 1.0, 1.0] |
|
@mickqian can this be merged |
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative Main modifications from Llama - xIELU Activation - QK-norm Associated Transformers PR huggingface/transformers#39381 Associated vLLM PR vllm-project/vllm#23068 Associated SGLang PR sgl-project/sglang#9774 GSM8K <img width="430" height="262" alt="image" src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed" /> <img width="436" height="266" alt="image" src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08" />
Pre-release of Apertus from the Swiss AI Initiative
Main modifications from Llama
Associated Transformers PR huggingface/transformers#39381
Co-author: @xzyaoi