Gemma Attention #320
Closed
Description
Most models work well with sliding-window attention alone, but Gemma uses a mix of global attention and sliding-window attention. Cactus hard-caps the window size at 1024, which breaks Gemma once the context grows beyond that size: Gemma needs global attention to produce coherent text regardless of context size. Find a workaround.
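To make the failure mode concrete, here is a minimal Python sketch of how the two layer types differ and how a cap could exempt global layers. It is not Cactus code: `attention_span`, `layer_window`, `clamp_window`, and the `global_every` layer pattern are hypothetical names and parameters chosen for illustration, and the real layer layout should be read from the model config.

```python
from typing import Optional


def attention_span(query_pos: int, window: Optional[int]) -> range:
    """Key positions a causal query at `query_pos` may attend to.

    window=None means global (full causal) attention; otherwise only the
    last `window` positions, including the query itself, are visible.
    """
    start = 0 if window is None else max(0, query_pos - window + 1)
    return range(start, query_pos + 1)


def layer_window(layer_idx: int, sliding_window: int, global_every: int) -> Optional[int]:
    """Hypothetical layer pattern: every `global_every`-th layer is global."""
    is_global = (layer_idx + 1) % global_every == 0
    return None if is_global else sliding_window


def clamp_window(requested: Optional[int], max_window: int = 1024) -> Optional[int]:
    """Cap sliding windows at `max_window` but leave global layers uncapped."""
    return None if requested is None else min(requested, max_window)


if __name__ == "__main__":
    pos = 2000  # a query position beyond the 1024 cap
    for layer in range(6):
        window = clamp_window(layer_window(layer, sliding_window=1024, global_every=6))
        span = attention_span(pos, window)
        kind = "global" if window is None else "sliding"
        print(f"layer {layer}: {kind}, sees positions {span.start}..{span.stop - 1}")
```

With a cap applied to every layer, the global layers would see only the last 1024 positions as well, which matches the breakage described above. Exempting them, as `clamp_window` does here, is one possible direction for a workaround.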