
Conversation

@jstjohn
Contributor

@jstjohn jstjohn commented May 15, 2022

Changes to AlibiPositionalBias

Alibi weights before the change were only appropriate if an upper-triangular (causal) mask is applied to the q·k dot product:
[screenshot: Alibi bias matrix before the change]

Alibi weights after the change are appropriate for bidirectional attention without a mask, and should be equivalent to the previous behavior once a causal mask is applied:
[screenshot: Alibi bias matrix after the change]
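As a minimal sketch of the post-change behavior (illustrative only; `alibi_bias` and `slopes` are not the library's actual names), the bias is built from `-|i - j|`, which is maximal (zero) on the diagonal and decays in both directions, so it is sensible even without a mask:

```python
import torch

def alibi_bias(i_len, j_len, slopes):
    # slopes: (heads,) per-head Alibi slopes (e.g. a geometric sequence 1/2, 1/4, ...)
    i_pos = torch.arange(i_len).unsqueeze(-1)       # (i, 1)
    j_pos = torch.arange(j_len).unsqueeze(0)        # (1, j)
    dist = -(j_pos - i_pos).abs()                   # (i, j): 0 on the diagonal, negative elsewhere
    return dist * slopes.view(-1, 1, 1)             # (heads, i, j), added to the q·k logits

bias = alibi_bias(4, 4, torch.tensor([0.5, 0.25]))  # (2, 4, 4)
```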

Changes to LearnedAlibiPositionalBias

Alibi weights before the change were not maximal on the diagonal when the full, unmasked attention matrix was presented, so they were only usable with an upper-triangular mask:
[screenshot: learned Alibi bias matrix before the change]
After the change, the Alibi weights in the learned module match those of the base module.
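A hypothetical sketch of the same idea for the learned variant (the class name and its interface are illustrative, not the module's real API): the only learned quantity is a per-head slope, and it is applied to the symmetric distance `-|i - j|`, so the bias stays maximal on the diagonal whether or not a mask is used:

```python
import torch
from torch import nn

class LearnedAlibiBiasSketch(nn.Module):
    # Illustrative sketch: one learned slope per head, applied to -|i - j|
    # so the learned bias has the same diagonal-maximal layout as the base module.
    def __init__(self, heads):
        super().__init__()
        self.slopes = nn.Parameter(torch.ones(heads))

    def forward(self, i_len, j_len):
        i_pos = torch.arange(i_len).unsqueeze(-1)       # (i, 1)
        j_pos = torch.arange(j_len).unsqueeze(0)        # (1, j)
        dist = -(j_pos - i_pos).abs().float()           # (i, j): 0 on the diagonal
        return dist * self.slopes.view(-1, 1, 1)        # (heads, i, j)
```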

@lucidrains
Owner

@jstjohn ohh yea, this makes sense, but i do already account for it at https://github.com/lucidrains/x-transformers/blob/main/x_transformers/x_transformers.py#L797

the main reason i kept the base Alibi the way it is is that Ofir's original code was written that way

@jstjohn
Contributor Author

jstjohn commented May 15, 2022 via email

@jstjohn
Contributor Author

jstjohn commented May 15, 2022

But then again, since the bidirectional and non-bidirectional versions should be equivalent once masking is applied, it seems easiest to drop that option and just implement it in the general way?
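The equivalence claim can be checked directly: on the positions a causal mask keeps (`j <= i`), the symmetric bias `-|i - j|` equals the one-directional bias `j - i`, so masking the general form recovers the causal behavior. A quick standalone check:

```python
import torch

i = torch.arange(6).unsqueeze(-1)       # (6, 1) query positions
j = torch.arange(6).unsqueeze(0)        # (1, 6) key positions
symmetric = -(j - i).abs()              # bidirectional bias
causal_style = j - i                    # one-directional bias
keep = j <= i                           # positions a causal mask allows
assert torch.equal(symmetric[keep], causal_style[keep])
```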

@lucidrains
Owner

@jstjohn yes that is true, let's go for your way, thank you for the PR!
