Poet @HugoHouse
I care about how AI can help readers and writers.
We did! Our findings on whitespace - how to measure/preserve it, how usage varies across form/time, how it affects LLMs - now in an #EMNLP2025 (main) paper: arxiv.org/abs/2510.16713
🧵👇
www.dbreunig.com/2025/07/31/h...
Today the words were from Pink Dust, by Ron Padgett.
But in poetry, whitespace matters!
Yet actually *preserving* that poetic whitespace is v tough. Its slipperiness points to bigger issues w/ text processing & LLMs.
New paper ⬜️ aclanthology.org/2025.emnlp-m...
While we've all been worrying about tokenizers, lurking in the background has been the preprocessing *before* tokenization. Poems break standard HTML-to-text linearization systems, and we find that multimodal models aren't a solution.
@profdownie.bsky.social, and I are excited to share our paper in @bigdatasoc.bsky.social "Who decides what is read on Goodreads?" on book review sponsorship, open access at doi.org/10.1177/2053....
arstechnica.com/tech-policy/...
Their argument is apparently that the legality of their business model should not be questioned because AI is too important.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
www.authorsalliance.org/2025/06/26/m...
Below we highlight some of the most important parts of the decision on the question of whether using books for AI training is fair use:
www.authorsalliance.org/2025/06/24/a...
www.authorsalliance.org/2025/07/26/b...
Using it for anything beyond "re-word this headline" = failure
poetrynorthwest.submittable.com/submit
buttondown.com/maiht3k/arch...
My tldr: Creative works are an expression of the creators' humanity - we need to understand, respect, and protect what that means in this age of LLMs.