anchu-cv.bsky.social
Reposted
This is the best explanation I've seen yet for _why_ language models prefer em-dashes (—). In brief: The tokenization scheme means em-dashes result in a smaller loss than other, equivalent punctuation options.
msukhareva.substack.com/p/the-myster...
The mystery of em‑dashes: part two with quantitative evidence
A couple of weeks ago I made an assumption: the rise of em‑dashes in AI‑generated text happened because model providers started scanning older, pre‑Kindle books.
msukhareva.substack.com
July 7, 2025 at 5:13 AM
Reposted
This is a nice and clear "overview of the state of RAG"

hamel.dev/notes/llm/ra...

(via @arnicas.bsky.social's wonderful newsletter)
P1: I don’t use RAG, I just retrieve documents – Hamel’s Blog
Ben Clavié’s introduction to advanced retrieval techniques
hamel.dev
July 3, 2025 at 6:17 AM