Horace He
chhillee.bsky.social
@PyTorch "My learning style is Horace twitter threads" - @typedfemale
First thought: Seems kinda "FlexAttention-y": https://bsky.app/profile/sungkim.bsky.social/post/3lbjbfmyqts27

Second thought: oh cool, they're already using FlexAttention!

It's a nice use of the `or_masks` and `and_masks` API - I think they do `(causal & sliding_window) | register_mask`
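To make that guess concrete, here is a rough sketch of the composition in plain Python. The `and_masks` and `or_masks` functions below are local stand-ins that mimic how the real helpers in `torch.nn.attention.flex_attention` combine mask functions of the form `(b, h, q_idx, kv_idx) -> bool`. The window size, the number of register tokens, and the idea that registers sit at the first few key positions are all assumptions made for illustration, not details from the linked post.

```python
# Plain-Python sketch of (causal & sliding_window) | register_mask.
# and_masks / or_masks here are local stand-ins, NOT the PyTorch functions;
# they only mirror the combinator semantics on integer indices.

WINDOW = 3         # hypothetical sliding-window size
NUM_REGISTERS = 2  # hypothetical: first 2 key positions are register tokens


def and_masks(*mask_mods):
    """Allow attention only where every mask allows it."""
    def combined(b, h, q_idx, kv_idx):
        return all(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


def or_masks(*mask_mods):
    """Allow attention where at least one mask allows it."""
    def combined(b, h, q_idx, kv_idx):
        return any(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


def causal(b, h, q_idx, kv_idx):
    # A query may only look at itself and earlier positions.
    return q_idx >= kv_idx


def sliding_window(b, h, q_idx, kv_idx):
    # A query may only look at keys at most WINDOW positions behind it.
    return q_idx - kv_idx <= WINDOW


def register_mask(b, h, q_idx, kv_idx):
    # Assumption: every query may always attend to the register tokens.
    return kv_idx < NUM_REGISTERS


mask_mod = or_masks(and_masks(causal, sliding_window), register_mask)


def render(seq_len):
    """Return one string per query row: 'x' = attend, '.' = masked."""
    return [
        "".join("x" if mask_mod(0, 0, q, kv) else "." for kv in range(seq_len))
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    print("\n".join(render(8)))
```

With PyTorch itself, the same structure would presumably be written with the library's own `and_masks` and `or_masks`, and the resulting mask function passed to `create_block_mask` before calling `flex_attention`; the stand-ins above only show what the combined mask lets through.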
November 23, 2024 at 1:55 AM