Conversation
```python
attention = attention.masked_fill(torch.all(causal_mask, dim=-1, keepdim=True), 0.0)
# Some rows might be completely masked out so we fill them with zero instead of NaN
if window_size[0] >= 0 or window_size[1] >= 0:
    attention = attention.masked_fill(torch.all(local_mask, dim=-1, keepdim=True), 0.0)
```
Check failure (Code scanning / CodeQL): Potentially uninitialized local variable
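The CodeQL warning above most likely flags `local_mask` being assigned only inside the windowed branch earlier in the function. As a minimal sketch of the pattern the snippet implements (the function name, signature, and the explicit `local_mask` parameter are assumptions for illustration, not the PR's actual code), passing the mask in so it is always defined avoids the uninitialized-variable path:

```python
import torch

def masked_softmax(scores, causal_mask, window_size=(-1, -1), local_mask=None):
    # Masks are boolean: True where a query/key pair must NOT attend.
    scores = scores.masked_fill(causal_mask, float("-inf"))
    if window_size[0] >= 0 or window_size[1] >= 0:
        scores = scores.masked_fill(local_mask, float("-inf"))
    attention = torch.softmax(scores, dim=-1)
    # A fully masked row is all -inf before softmax and comes out as NaN;
    # overwrite such rows with zeros, mirroring the snippet above.
    attention = attention.masked_fill(torch.all(causal_mask, dim=-1, keepdim=True), 0.0)
    if window_size[0] >= 0 or window_size[1] >= 0:
        attention = attention.masked_fill(torch.all(local_mask, dim=-1, keepdim=True), 0.0)
    return attention
```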
tianleiwu reviewed Nov 16, 2023
tianleiwu reviewed Nov 16, 2023
yufenglee approved these changes Nov 16, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
Description
Implement a preliminary version of local (sliding window) attention. It is currently supported only by Flash Attention (sm >= 80, Linux), and only for sliding-window attention with a large cached KV.
Motivation and Context
This change makes it possible to run Mistral and other models that use sliding window attention.
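For context, sliding-window attention restricts each query to a fixed band of nearby keys instead of the full sequence. A minimal sketch of building such a boolean mask is below; the `(left, right)` `window_size` convention (with -1 meaning unbounded) mirrors Flash Attention's, but the helper itself is illustrative, not code from this PR:

```python
import torch

def sliding_window_mask(seq_len, window_size=(4095, 0)):
    # window_size = (left, right); -1 means unbounded on that side.
    # Returns a (seq_len, seq_len) bool mask, True where attention is disallowed.
    q_idx = torch.arange(seq_len).unsqueeze(-1)  # (seq_len, 1)
    k_idx = torch.arange(seq_len).unsqueeze(0)   # (1, seq_len)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    if window_size[0] >= 0:
        mask |= k_idx < q_idx - window_size[0]   # key too far in the past
    if window_size[1] >= 0:
        mask |= k_idx > q_idx + window_size[1]   # key too far in the future
    return mask

# Example: with window_size=(2, 0), each token may attend only to itself
# and the two preceding tokens (a causal window of width 3).
print(sliding_window_mask(5, window_size=(2, 0)))
```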