Poet @HugoHouse
I care about how AI can help readers and writers.
w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social
Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com
1) a dataset of 2.8k public domain poems (source: Poetry Foundation) with preserved whitespace: github.com/darthbhyrava/wisp
2) an interactive public dashboard to visualize the distribution of whitespace across 19.4k poems from 4.3k poets! poetry.darthbhyrava.com (WIP)
#EMNLP2025
🧵👇
How does whitespace usage vary across sources, especially when it is explicitly mentioned in or excluded from prompts for LLMs? Does it matter for pretraining LLMs?
#EMNLP2025
🧵👇
More in our paper! arxiv.org/abs/2510.16713
#EMNLP2025
🧵👇
1) WISP, a practical typology of whitespace categories found in visually structured text, and
2) WISP-Bench, a benchmark for evaluating whitespace fidelity across linearization methods. On a small set of poems, we find that HTML->text slightly outperforms MLM OCR!
#EMNLP2025
🧵👇
Which one of these versions of CA Conrad's Mars.1 do you think is true to the original?
#EMNLP2025
🧵👇
Which one of these versions of CA Conrad's Mars.1 do you think is true to the original?
#EMNLP
🧵👇