@EduardDurech
Contributor

Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

  • xIELU Activation
  • QK-norm

Associated Transformers PR huggingface/transformers#39381

Co-author: @xzyaoi

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @EduardDurech, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds the Apertus model, a pre-release from the Swiss AI Initiative, to SGLang. Apertus is a derivative of the Llama architecture with two primary modifications: the xIELU activation function in its MLP and QK-norm in its attention mechanism.
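For readers unfamiliar with QK-norm, the idea can be sketched in plain Python: each head's query and key vectors are RMS-normalized over the head dimension before the attention dot product. This is a minimal illustration, not the code in this PR. The helper names `rms_norm` and `qk_norm_scores` and the `eps` default are assumptions made for the example, and rotary embeddings and the softmax are left out.

```python
import math


def rms_norm(vec, weight, eps=1e-6):
    """Scale vec to unit root-mean-square, then apply a per-dimension weight."""
    mean_sq = sum(v * v for v in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps value is an assumption
    return [v * inv_rms * w for v, w in zip(vec, weight)]


def qk_norm_scores(q_heads, k_heads, q_weight, k_weight):
    """Normalize each head's q and k over head_dim, then take scaled dot products."""
    scores = []
    for q, k in zip(q_heads, k_heads):
        qn = rms_norm(q, q_weight)
        kn = rms_norm(k, k_weight)
        head_dim = len(q)
        scores.append(sum(a * b for a, b in zip(qn, kn)) / math.sqrt(head_dim))
    return scores


if __name__ == "__main__":
    ones = [1.0, 1.0]
    # Rescaling q by 100x leaves the score essentially unchanged after normalization.
    print(qk_norm_scores([[3.0, 4.0]], [[3.0, 4.0]], ones, ones))
    print(qk_norm_scores([[300.0, 400.0]], [[3.0, 4.0]], ones, ones))
```

Because both vectors are normalized, the attention logit depends on the direction of q and k and the learned weights rather than on their raw magnitudes.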

Highlights

  • Introduction of xIELU Activation Function: A new XIELU class is added to python/sglang/srt/layers/activation.py. This activation function is described as being introduced in a specific arXiv paper and includes both a Python fallback and an experimental CUDA implementation.
  • Integration of Apertus Model: A new file python/sglang/srt/models/apertus.py is added, defining the Apertus model architecture. This model incorporates the xIELU activation function in its MLP and uses QK-norm (specifically q_norm and k_norm RMSNorm layers) in its attention mechanism, as stated in the PR description.
  • Test Case Addition: A new ModelCase for "swiss-ai/Apertus-8B" is added to test/srt/models/test_generation_models.py to include the new model in testing.
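As a rough illustration of what the xIELU activation computes, here is a scalar sketch based on my reading of the formulation used in the associated Transformers PR: quadratic-plus-linear for positive inputs and an ELU-like branch for non-positive inputs. It is not the `XIELU` class added in this PR. The parameter names and default values (`alpha_p`, `alpha_n`, `beta`, `eps`) are assumptions for the example, and the real module learns its alpha parameters rather than fixing them.

```python
import math


def xielu(x, alpha_p=0.8, alpha_n=0.8, beta=0.5, eps=-1e-6):
    """Scalar xIELU sketch; all default constants are assumed, not taken from the PR."""
    if x > 0:
        # Positive branch: quadratic term plus a linear term.
        return alpha_p * x * x + beta * x
    # Non-positive branch: clamp the expm1 argument at a small negative eps.
    return alpha_n * (math.expm1(min(x, eps)) - x) + beta * x


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(value, xielu(value))
```

A tensor version would apply the same two branches elementwise, for example with a `where`-style select.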

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the Apertus model, including the custom xIELU activation function. The changes look good overall, but I've found a few critical issues in the new Apertus model implementation that would cause runtime errors, such as a typo in a class name and an incorrect tuple unpacking. I've also included some suggestions to improve the xIELU implementation by using more specific exception handling and a more efficient tensor reshaping logic. Please review the comments for details.

@mickqian
Collaborator

Could you also share the accuracy test results against vllm/transformers?

@EduardDurech
Contributor Author

@mickqian
test_generation_models

hf_outputs.output_strs=[' a city of contrasts. The city is a mix of old and new, with a rich history and a vibrant present. London is a city of culture, with', ' to go out and play. I like to play with my friends. I like to play with my family. I like to play with my dog. I like', ' creating intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI systems are']
srt_outputs.output_strs=[' a city of contrasts. The city is a mix of old and new, with a rich history and a vibrant present. London is a city of culture, with', ' to go out and play. I like to play with my friends. I like to play with my family. I like to play with my dog. I like', ' creating intelligent machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. AI systems are']
rouge_l_scores=[1.0, 1.0, 1.0]
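
For context on the `rouge_l_scores` above: ROUGE-L is based on the longest common subsequence (LCS) between a reference and a candidate, so identical strings score 1.0. The sketch below is a minimal whitespace-tokenized F-measure. The function names are hypothetical, and the test harness may use a different tokenizer or library, so treat this as an illustration of the metric rather than the code that produced these numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F-measure over whitespace tokens (assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    hf = " a city of contrasts. The city is a mix of old and new"
    srt = " a city of contrasts. The city is a mix of old and new"
    print(rouge_l_f1(hf, srt))
```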

@EduardDurech
Contributor Author

@mickqian can this be merged?

@mickqian changed the title from "[MODEL] Apertus and XIELU" to "model: support Apertus" on Sep 12, 2025
@zhyncs merged commit 46d8fb1 into sgl-project:main on Sep 12, 2025
5 of 50 checks passed
vermouth1992 pushed a commit to volcengine/verl that referenced this pull request Sep 13, 2025
Pre-release of Apertus from the Swiss AI Initiative

Main modifications from Llama

- xIELU Activation
- QK-norm

Associated Transformers PR
huggingface/transformers#39381
Associated vLLM PR vllm-project/vllm#23068
Associated SGLang PR sgl-project/sglang#9774

GSM8K
<img width="430" height="262" alt="image"
src="https://github.com/user-attachments/assets/8b2d5188-834b-4a8c-828e-2d0aa2ccffed"
/>
<img width="436" height="266" alt="image"
src="https://github.com/user-attachments/assets/57241a73-3150-474a-a4fb-222e33a0de08"
/>
wlf-darkmatter pushed a commit to wlf-darkmatter/verl that referenced this pull request Sep 13, 2025
VocabVictor pushed a commit to VocabVictor/verl-plus that referenced this pull request Sep 24, 2025
masoudhashemi pushed a commit to masoudhashemi/verl that referenced this pull request Oct 19, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025
mtian8 pushed a commit to mtian8/verl that referenced this pull request Nov 1, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
NenoL2001 pushed a commit to NenoL2001/verl that referenced this pull request Nov 26, 2025
paolo328 added a commit to paolo328/Verl that referenced this pull request Nov 27, 2025