bhyravajjula.bsky.social
@bhyravajjula.bsky.social
Applied Scientist @Outreach / ex {UW, IIIT-H}
Poet @HugoHouse

I care about how AI can help readers and writers.
Reposted
Some interesting stuff here on measuring writing quality and improving on qualitative tasks:
www.dbreunig.com/2025/07/31/h...
November 10, 2025 at 3:11 AM
I've never regretted reading poems while in the air. You get to look out the window while the words linger.

Today the words were from Pink Dust, by Ron Padgett.
November 5, 2025 at 6:40 PM
Reposted
Computationally, whitespace gets little attention—it’s usually standardized or stripped.

But in poetry, whitespace matters!

Yet actually *preserving* that poetic whitespace is v tough. Its slipperiness points to bigger issues w/ text processing & LLMs.

New paper ⬜️ aclanthology.org/2025.emnlp-m...
November 3, 2025 at 3:14 PM
Reposted
The project that started my whitespace obsession... #EMNLP2025

While we've all been worrying about tokenizers, lurking in the background has been the preprocessing *before* tokenization. Poems break standard HTML-to-text linearization systems, and we find that multimodal models aren't a solution.
October 31, 2025 at 4:53 PM
When you read a poem, do you wonder how the poet structures it through whitespace between/before words and lines?

We did! Our findings on whitespace - how to measure/preserve it, how usage varies across form/time, how it affects LLMs - now in an #EMNLP2025 (main) paper: arxiv.org/abs/2510.16713
🧵👇
October 31, 2025 at 3:08 PM
Reposted
“We may be left with an unintended consequence… that the very companies that have been most inconsiderate about using materials crafted by writers will be strengthened by a regime in which a billion-dollar check is the entry fee for developing the best AI.”
September 5, 2025 at 10:51 PM
Reposted
NEW: Anthropic has reached a preliminary settlement in a class action lawsuit brought by a group of prominent authors, marking a major turn in one of the most significant ongoing AI copyright lawsuits in history.
Anthropic Settles High-Profile AI Copyright Lawsuit Brought By Book Authors
Anthropic faced the prospect of more than $1 trillion in damages, a sum that could have threatened the company’s survival if the case went to trial.
www.wired.com
August 26, 2025 at 7:33 PM
Reposted
My co-authors, Jana Diesner, @tedunderwood.me, @zoeleblanc.bsky.social, @gworthey.bsky.social and
@profdownie.bsky.social, and I are excited to share our paper in @bigdatasoc.bsky.social "Who decides what is read on Goodreads?" on book review sponsorship, open access at doi.org/10.1177/2053....
August 18, 2025 at 7:55 PM
Reposted
Anthropic appeals ruling that would allow the copyright case against them to become a class action suit (with up to 7 million claimants)

arstechnica.com/tech-policy/...

Their argument is apparently that the legality of their business model should not be questioned because AI is too important.
AI industry horrified to face largest copyright class action ever certified
Copyright class actions could financially ruin AI industry, trade groups say.
arstechnica.com
August 10, 2025 at 2:08 AM
Reposted
The complete lack of regard for the Y-axis hurts me physically #GPT5
data visualization is my passion
August 7, 2025 at 5:32 PM
Reposted
I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
August 2, 2025 at 4:35 PM
Reposted
Judge Chhabria's “market dilution” theory suggests “using copyrighted books to train an LLM might harm the market...because it enables the rapid generation of countless works that compete with the originals, even if those works aren’t themselves infringing.”

www.authorsalliance.org/2025/06/26/m...
Meta Wins on Fair Use for Now, but Court Leaves Door Open for “Market Dilution”
“Market dilution” suggests that “using copyrighted books to train an LLM might harm the market for those works because it enables the rapid generation of countless works that compete with the origi…
www.authorsalliance.org
June 26, 2025 at 12:50 PM
Reposted
Judge Alsup's decision found that Anthropic's training AI on lawfully acquired copyrighted works was a fair use.

Below we highlight some of the most important parts of the decision on the question of whether using books for AI training is fair use:

www.authorsalliance.org/2025/06/24/a...
Anthropic wins on fair use for training its LLMs; loses on building a “central library” of pirated books
Yesterday, Judge Alsup released his decision on Anthropic’s motion for summary judgment in the fast-moving lawsuit it is defending, brought by three book authors on behalf of a class of millions ob…
www.authorsalliance.org
June 24, 2025 at 4:29 PM
Reposted
Additional analysis of the Bartz v. Anthropic decision, including some speculation as to where things might go from here, from our AI Legal Fellow, Justin Bonfiglio:

www.authorsalliance.org/2025/07/26/b...
Bartz v. Anthropic: What are some additional takeaways and where do things go from here?
AI generated image inspired by Alsup’s opinion. A judge reflects quietly as his mind overflows with the creative output of young students—books, art, and ideas born from learning, curiosity, and th…
www.authorsalliance.org
July 28, 2025 at 11:56 AM
Reposted
Word of this morning is ‘procaffeinate’: to put everything on hold until you’ve had sufficient amounts of coffee.
November 18, 2024 at 8:15 AM
Reposted
Your daily reminder: ChatGPT is not a search engine. It does not "know" anything: it smashes words together in a plausible-sounding way, but it does not have (and never will have) any ability to check whether something is true or false

Using it for anything beyond "re-word this headline" = failure
November 19, 2024 at 8:30 AM
Reposted
Number Shortage xkcd.com/3009
November 9, 2024 at 5:06 AM
First post here, thought I'd start with an article by @emilymbender.bsky.social.

buttondown.com/maiht3k/arch...

My tldr: Creative works are an expression of the creators' humanity - we need to understand, respect, and protect what that means in this age of LLMs.
"Virtual Employees" and Remixing Machines Devalue Human Work
Nadella's Arguments against Copyright Misrepresent both People and Computers By Emily According to Katie Prescott in The Times Microsoft CEO Satya Nadella is...
buttondown.com
November 21, 2024 at 9:57 PM