Closed
Labels
module: flex attention, module: higher order operators (torch.cond and similar), module: nestedtensor (NestedTensor tag, see issue #25032), module: pt2-dispatcher (PT2 dispatcher-related issues, e.g. aotdispatch, functionalization, faketensor, custom-op), oncall: pt2, triaged (this issue has been looked at by a team member and triaged and prioritized into an appropriate module)
Description
Issues
I'm working on this support now in #136792. Here's a list of the current issues:
- The notebook (internal only) demonstrates the usage of a `(1, 1, sum(seqlen), sum(seqlen))` block mask for NJT. Is this inefficient? I'd expect `(1, 1, max_seqlen, max_seqlen)` as an analogue to what is done for dense, but it's a bit tricky to implement the NJT adapter for this.
  - From offline discussion: if `_compile=True` is used for `create_block_mask()`, the full `(1, 1, sum(seqlen), sum(seqlen))` mask tensor isn't materialized; this is good and recommended.
  - Still some exploration to be done to see if this is the most efficient way to handle NJTs.
- "[FlexAttention] Adjust BlockMask if reusing the one created at larger seqlen" (#137255) adds some logic that assumes a constant seqlen. This will have to be hacked around some for NJT.
- The notebook example builds a `seq_idx` to map an index within `sum(seqlen)` -> the associated beginning offset. It doesn't account for the fact that `Q_LEN` / `KV_LEN` are rounded up to the nearest block size multiple, so out-of-bounds access occurs if `sum(seqlen)` is not a multiple of the block size.
  - Turns out this is a real bug in `create_block_mask()`; should do the mod trick to avoid this ("FlexAttention: create_block_mask() passes out-of-range indices to mask_mod", #137801).
- `create_block_mask(..., _compile=True)` throws `torch._dynamo.exc.Unsupported: Unexpected type in sourceless builder torch.Tensor` (investigating).
  - Fixed this by changing the NJT wrapper generator to close over `seq_idx` implicitly instead of explicitly.
- The way `seq_idx` is built assumes `offsets[0] == 0`. This may not be the case for some non-standard NJT views.
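To make the `seq_idx` and rounding issues above concrete, here is a minimal sketch of a document mask over packed ragged sequences, including the "mod trick" to guard against the rounded-up `Q_LEN`/`KV_LEN` indices. The construction details (building `seq_idx` from per-sequence lengths, the wrap-around guard) are illustrative assumptions, not the actual notebook or PyTorch implementation:

```python
import torch

# Ragged "batch": three sequences of lengths 3, 5, 2 packed along one dim,
# described by NJT-style offsets. offsets[0] != 0 here to show the
# non-standard-view case called out above.
offsets = torch.tensor([2, 5, 10, 12])
lengths = offsets[1:] - offsets[:-1]       # tensor([3, 5, 2])
total_len = int(offsets[-1] - offsets[0])  # sum(seqlen) == 10

# seq_idx[i] = which sequence token i (within sum(seqlen)) belongs to.
# Building it from lengths rather than raw offsets keeps it correct even
# when offsets[0] != 0.
seq_idx = torch.repeat_interleave(torch.arange(len(lengths)), lengths)

def document_mask_mod(b, h, q_idx, kv_idx):
    # Q_LEN/KV_LEN are rounded up to a block-size multiple inside
    # create_block_mask(), so q_idx/kv_idx can exceed sum(seqlen).
    # The "mod trick": wrap indices before indexing seq_idx, avoiding
    # out-of-bounds access (the bug tracked in #137801).
    q = q_idx % total_len
    kv = kv_idx % total_len
    return seq_idx[q] == seq_idx[kv]

# On a recent PyTorch, this mask_mod would then be handed to
# torch.nn.attention.flex_attention.create_block_mask(
#     document_mask_mod, B=1, H=1, Q_LEN=total_len, KV_LEN=total_len).
```

Attention is allowed only within a sequence: tokens 0-2 attend to each other, 3-7 to each other, and so on, and indices past `sum(seqlen)` wrap instead of faulting.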
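On the `sourceless builder` Dynamo error: one plausible reading of "close over `seq_idx` implicitly instead of explicitly" is the difference between passing the tensor into the generator as an argument versus capturing it as a closure cell. The helper names below are hypothetical and this is only a sketch of that pattern, not the actual fix in #136792:

```python
import torch

seq_idx = torch.tensor([0, 0, 1, 1, 1])

# Explicit style: the tensor is threaded through as a parameter of the
# generator, so the traced mask_mod references a locally bound tensor.
# A pattern like this is the sort of thing that can trip Dynamo's
# "Unexpected type in sourceless builder torch.Tensor" path.
def make_mask_mod_explicit(idx_tensor=seq_idx):
    def mask_mod(b, h, q_idx, kv_idx):
        return idx_tensor[q_idx] == idx_tensor[kv_idx]
    return mask_mod

# Implicit style: the inner function captures the module-level tensor as a
# closure variable, which Dynamo handles via its closure support.
def make_mask_mod_implicit():
    def mask_mod(b, h, q_idx, kv_idx):
        return seq_idx[q_idx] == seq_idx[kv_idx]
    return mask_mod
```

Both variants compute the same mask eagerly; the difference only matters to how `torch.compile` sources the captured tensor.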