zawor.bsky.social
@zawor.bsky.social
A long, long time ago... back when I was writing my thesis, a shit ton of tokenizers and taggers were built with XML/HTML input streams in mind... for instance the Wikipedia dumps, beloved by almost everyone... and my bet is that what we're observing here is the fallout of that
April 22, 2025 at 8:07 AM