bhyravajjula.bsky.social
@bhyravajjula.bsky.social
Applied Scientist @Outreach / ex {UW, IIIT-H}
Poet @HugoHouse

I care about how AI can help readers and writers.
Oh, dear. What are you drinking there, Alex? Is it Sanjeev-adjacent?
November 13, 2025 at 9:13 PM
Would be happy to volunteer! I've reached out via email, as well.
November 12, 2025 at 10:36 PM
Would be happy to volunteer! Followed up via email.
November 12, 2025 at 10:36 PM
I have been in "let's keep looking for more ideas before I tweak my outdated-by-3-years website" mode for weeks now. Thank you for putting this together!
November 4, 2025 at 6:51 PM
If you're attending #EMNLP2025, we'll be presenting virtually in Gather Session 1 on Nov 5 at 4pm PT. Come say hello!

w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social

Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com
October 31, 2025 at 3:36 PM
We release:

1) a dataset of 2.8k public domain poems (source: Poetry Foundation) with preserved whitespace: github.com/darthbhyrava/wisp

2) an interactive public dashboard to visualize the distribution of whitespace across 19.4k poems from 4.3k poets! poetry.darthbhyrava.com (WIP)

#EMNLP2025
🧵👇
October 31, 2025 at 3:28 PM
Poems come from different sources - published work, Reddit, generated by LLMs (why?) - often with names like Haiku/Sonnet.

How does whitespace usage vary across sources - especially when explicitly mentioned-in/excluded-from prompts for LLMs? Does it matter for pretraining LLMs?

#EMNLP2025
🧵👇
October 31, 2025 at 3:17 PM
How does whitespace usage in poems vary across poetic forms? Or across topics of the poems? Or across time?

More in our paper! arxiv.org/abs/2510.16713
#EMNLP2025
🧵👇
October 31, 2025 at 3:16 PM
We propose:

1) WISP, a practical typology of whitespace categories found in visually structured text, and

2) WISP-Bench, a benchmark for evaluating whitespace fidelity across linearization methods. On a small set of poems, we find that HTML->text slightly outperforms MLM OCR!

#EMNLP2025
🧵👇
October 31, 2025 at 3:14 PM
A poem travels a long way to meet its reader - with its whitespace going through multiple distortions as the text moves from thought to printed page to image to HTML to a dozen different formats.

Which one of these versions of CA Conrad's Mars.1 do you think is true to the original?

#EMNLP2025
🧵👇
October 31, 2025 at 3:11 PM
A poem travels a long way from thought to page - with its whitespace going through multiple distortions as the poem moves from printed page to image to HTML to PDF to another dozen different formats.

Which one of these versions of CA Conrad's Mars.1 do you think is true to the original?

#EMNLP
🧵👇
October 31, 2025 at 2:29 PM