Raphael Schumann
@schumann.bsky.social
Natural Language Processing PhD Student @ Heidelberg University.

https://schumann.pub

#NLP #NLProc #ML #AI
Same boat as your AC
March 2, 2025 at 11:13 AM
Could you add me please?
January 14, 2025 at 6:31 PM
CBOW vs. Skip-gram
December 20, 2024 at 11:59 AM
Great work! Are you going to release the models?
December 14, 2024 at 11:16 AM
This helped a lot!
November 7, 2024 at 9:27 PM
I even make sure to delete paths containing my username from the code in the supplementary material
January 5, 2024 at 3:49 PM
It also works with Flash Attention 2, although I don't see additional speedups. I don't think Flash Attention is optimized for generation.
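For reference, a minimal sketch of enabling Flash Attention 2 in recent transformers versions (assumptions: the flash-attn package is installed, a CUDA GPU is available, and the chosen model architecture supports FA2); the padded prefill call itself stays unchanged:

import torch
from transformers import AutoModelForCausalLM

# assumption: any FA2-supported causal LM works here; the model name is only an example
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
).to("cuda")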
October 13, 2023 at 11:35 AM
Conceptually it is clear that this works, but I wasn't aware that huggingface transformers passes this through correctly.
Github Gist to reproduce:
gist.github.com/raphael-sch/...
Using padding and prefill during inference in huggingface transformers - run_padding_prefill.py
October 13, 2023 at 11:35 AM
You have to place the padding tokens in between the prefill and input tokens (example with 3 prefilled tokens):
input_ids: [0, 0, X, X, X, X]
position_ids: [0, 0, 3, 4, 5, 6]
attn_mask: [1, 1, 1, 0, 0, 1, 1, 1, 1]
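A minimal sketch of the same idea (not the author's gist): the model name, the prompt strings, and the eos-token-as-padding choice are assumptions; the tensor layout mirrors the example above.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM that accepts position_ids works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# 1) Prefill a shared prefix and keep its KV cache (here: 3 tokens, positions 0..2).
prefill_ids = tokenizer("The quick brown", return_tensors="pt").input_ids
with torch.no_grad():
    prefill_out = model(input_ids=prefill_ids, use_cache=True)
past_key_values = prefill_out.past_key_values
n_prefill = prefill_ids.shape[1]

# 2) New input: padding tokens sit *between* the cached prefill and the real tokens.
pad_id = tokenizer.eos_token_id  # assumption: use eos as the padding token
new_ids = tokenizer(" fox jumps over the", return_tensors="pt").input_ids
n_pad = 2
input_ids = torch.cat([torch.full((1, n_pad), pad_id), new_ids], dim=1)

# Padding positions are arbitrary (they are masked out); the real tokens continue at
# n_prefill, n_prefill + 1, ... as in the example above ([0, 0, 3, 4, 5, 6]).
position_ids = torch.cat(
    [torch.zeros(1, n_pad, dtype=torch.long),
     torch.arange(n_prefill, n_prefill + new_ids.shape[1]).unsqueeze(0)], dim=1)

# The mask covers prefill (1s) + padding (0s) + new tokens (1s),
# i.e. [1, 1, 1, 0, 0, 1, 1, 1, 1] for 3 prefill / 2 pad / 4 new tokens.
attention_mask = torch.cat(
    [torch.ones(1, n_prefill, dtype=torch.long),
     torch.zeros(1, n_pad, dtype=torch.long),
     torch.ones(1, new_ids.shape[1], dtype=torch.long)], dim=1)

with torch.no_grad():
    out = model(input_ids=input_ids,
                position_ids=position_ids,
                attention_mask=attention_mask,
                past_key_values=past_key_values,
                use_cache=True)
next_token = out.logits[0, -1].argmax()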
October 13, 2023 at 11:35 AM