@typedfemale
Second thought: oh cool, they're already using FlexAttention!
it's a nice usage of the `or_masks` and `and_masks` API - I think they do (causal & sliding_window) | (register_mask)
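
rough sketch of what that composition looks like, in plain Python so it runs without torch. `and_masks` / `or_masks` here are local stand-ins that mimic the combinators in `torch.nn.attention.flex_attention`; the window size, the register count, and the idea that "register" means "the first few kv positions are visible to everyone" are my assumptions, not something I've confirmed from their code.

```python
# Pure-Python stand-ins for the FlexAttention mask combinators.
# A mask_mod takes (b, h, q_idx, kv_idx) and returns True where attention is allowed.

def and_masks(*mask_mods):
    """Allow attention only where every mask_mod allows it."""
    def combined(b, h, q_idx, kv_idx):
        return all(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined

def or_masks(*mask_mods):
    """Allow attention where at least one mask_mod allows it."""
    def combined(b, h, q_idx, kv_idx):
        return any(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined

WINDOW = 3         # hypothetical sliding-window size
NUM_REGISTERS = 2  # hypothetical number of register tokens at the start

def causal(b, h, q_idx, kv_idx):
    # A query may only look at itself and earlier positions.
    return q_idx >= kv_idx

def sliding_window(b, h, q_idx, kv_idx):
    # A query may only look at keys within WINDOW positions behind it.
    return q_idx - kv_idx <= WINDOW

def register_mask(b, h, q_idx, kv_idx):
    # Assumption: register tokens occupy the first NUM_REGISTERS kv slots
    # and are visible to every query.
    return kv_idx < NUM_REGISTERS

# (causal & sliding_window) | (register_mask)
mask_mod = or_masks(and_masks(causal, sliding_window), register_mask)

def dense_mask(seq_len):
    """Materialize the mask as a list of rows, for eyeballing only."""
    return [[mask_mod(0, 0, q, k) for k in range(seq_len)] for q in range(seq_len)]

if __name__ == "__main__":
    for row in dense_mask(8):
        print("".join("x" if allowed else "." for allowed in row))
```

with actual PyTorch you'd write the three mask_mods with tensor ops and hand the combined one to `create_block_mask`, so the nice part is that the composition line stays the same.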