Fix exaone4 layer_types ZeroDivision/TypeError when sliding_window_pattern is None/"LLLG" #39698
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: exaone4
This PR infers and implements the intended behavior from LG AI Research's existing code and PR discussion for EXAONE-4.0. It may differ slightly from the original developer's intent, so any feedback is greatly appreciated. We also verified that inference works correctly with the following script:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name = "LGAI-EXAONE/EXAONE-4.0-1.2B"  # same result
model_name = "LGAI-EXAONE/EXAONE-4.0-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="bfloat16", device_map=None
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# choose your prompt ("Explain how great you are")
prompt = "너가 얼마나 대단한지 설명해 봐"
messages = [{"role": "user", "content": prompt}]

input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(
    input_ids.to(model.device),
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
Hello, @wheeze01. Thank you for your attention and contribution! Your PR appears to align with our intentions, except for one point:

By the way, we will update the models' configuration with proper `layer_types`.
ArthurZucker left a comment:
Yes, as @lgai-exaone mentioned, the best is to align with `layer_types`, which should be explicit!
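For illustration, an explicit `layer_types` entry for a hypothetical 8-layer checkpoint following the `"LLLG"` pattern might look like this (the value names come from this PR; the layer count is made up):

```python
# Hypothetical 8-layer example of an explicit layer_types entry
# (value names taken from this PR; the layer count is illustrative).
layer_types = [
    "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
    "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
]
```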
@ArthurZucker, thank you for your response! Apart from handling …
Hey! I am wondering what `attn_implementation="hybrid"` would refer to? Currently, all attention implementations in transformers support both sliding and non-sliding.
@ArthurZucker, the current EXAONE 4.0 configuration includes this implementation at `src/transformers/models/exaone4/modular_exaone4.py`, lines 248 to 250 (commit 551a89a).
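A paraphrased, self-contained sketch of the check being discussed, based on this PR's description rather than the verbatim repository lines:

```python
# Paraphrased sketch of the removed check (not the verbatim lines 248-250).
# Per this PR, the key "sliding_window" never matches, since layer_types
# holds "sliding_attention"/"full_attention", so _attn_implementation
# could stay unset.
layer_types = ["sliding_attention", "full_attention"]
_attn_implementation = None
if "sliding_window" in layer_types:  # incorrect key -> never True
    _attn_implementation = "hybrid"
print(_attn_implementation)  # None
```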
Ah! Sorry, what I wanted to write is …
What does this PR do?

Fixes a crash in `Exaone4Config.__init__` when `sliding_window_pattern` is `None` (EXAONE-4.0-1.2B) or a string like `"LLLG"` (EXAONE-4.0-32B). The original code unconditionally performed a modulo operation on `sliding_window_pattern`, causing either a `ZeroDivisionError` or a `TypeError`, and contained an incorrect `"sliding_window"` key check that left `_attn_implementation` unset. Now:

We branch safely on three cases for `sliding_window_pattern` (see the sketch after this list):

- `None` or `0` → all layers use `"full_attention"`.
- `str` (e.g. `"LLLG"`) → map each character (`L` → `"sliding_attention"`, others → `"full_attention"`), repeat the pattern to cover all layers, and force the final layer to `"full_attention"`.
- `int` (e.g. `4`) → every n-th layer is `"full_attention"`, the others `"sliding_attention"`, with the final layer forced to `"full_attention"`.
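A minimal sketch of this branching, assuming a standalone helper (the function name and the demo calls are illustrative; the actual change lives inside `Exaone4Config.__init__`):

```python
# Minimal sketch of the three-way branching described above; the helper
# name and placement are assumptions, not the verbatim PR code.
def derive_layer_types(sliding_window_pattern, num_hidden_layers):
    if not sliding_window_pattern:  # None or 0 -> no sliding layers
        return ["full_attention"] * num_hidden_layers

    if isinstance(sliding_window_pattern, str):  # e.g. "LLLG"
        # L -> sliding_attention, any other char -> full_attention,
        # repeated until every layer is covered
        base = [
            "sliding_attention" if c == "L" else "full_attention"
            for c in sliding_window_pattern
        ]
        layer_types = [base[i % len(base)] for i in range(num_hidden_layers)]
    else:  # int, e.g. 4 -> every 4th layer uses full attention
        layer_types = [
            "full_attention" if (i + 1) % sliding_window_pattern == 0
            else "sliding_attention"
            for i in range(num_hidden_layers)
        ]

    layer_types[-1] = "full_attention"  # final layer always full attention
    return layer_types

print(derive_layer_types("LLLG", 8))
print(derive_layer_types(None, 4))
```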
"sliding_window"inlayer_typesand no longer force_attn_implementation="hybrid"; we let Hugging Face’s internal_check_and_adjust_attn_implementationdecide the proper backend (e.g.,"eager","sdpa","flash_attention_*").This resolves both the division/modulo crash and the risk of
_attn_implementationremainingNonedownstream.Fixes #39696
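As a quick sanity check, the previously failing constructions now succeed. This snippet is illustrative and assumes `Exaone4Config` is importable from the top-level `transformers` namespace:

```python
from transformers import Exaone4Config

# Previously raised TypeError (modulo on a str pattern)
cfg = Exaone4Config(sliding_window_pattern="LLLG")
# Previously raised ZeroDivisionError/TypeError (modulo on None)
cfg = Exaone4Config(sliding_window_pattern=None)
print(cfg.layer_types)  # all "full_attention" for the None case
```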
Before submitting

- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Models: