Gabriele Sarti
@gsarti.com
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc.

Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!

Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...
November 7, 2025 at 2:50 AM
The session ended with Claude committing harakiri by deleting all DOM elements (including the chatbox for interacting with it) except the two beautiful sticky notes I asked it to make. I consider this first playing session a success!
October 2, 2025 at 4:22 PM
Unforeseen development
October 2, 2025 at 4:16 PM
What could go wrong when asking Claude to make an Imagine demo within Claude Imagine and using it to play Tic Tac Toe? When notified about the error, the model promptly adds "Sorry about that. Continue playing..." to the interface 😂
October 2, 2025 at 4:15 PM
I picked this expecting something close to the familiar sci-fi shorts style of Ted Chiang, but I ended up enjoying Ken Liu even more! His combination of fantastic elements with Chinese and East Asian culture and history is quite unique. Top picks: State Change, The Literomancer, The Paper Menagerie.
September 23, 2025 at 12:38 PM
Now with sleek flyers to test your skills in Italian crossword solving! 🤗 Join our #EVALITA2026 task!
September 23, 2025 at 7:17 AM
It is that time of year again when I beg @aclmeeting.bsky.social execs to rethink the current streaming platform system. For my #EMNLP2025 submissions, I am *required* to upload 2 video recordings + 2 posters + 2 slide decks. Why force both posters and talks for all? Nonsense.
September 15, 2025 at 3:20 PM
TFW milk producers use semantic versioning better than LLM providers
August 26, 2025 at 10:56 AM
@zouharvi.bsky.social recommended this and I finally gave it a shot. Excellent read for all academics, especially early-career people, tracing many issues in the research landscape back to a misplaced system of incentives. Will be my go-to textbook if I ever teach a research practices 101 class!
August 20, 2025 at 5:47 AM
After a long streak of nonfiction, I landed on "Harry Potter meets the Roman Empire". Loved the flavorful worldbuilding, the mechanics of Will usage (the fantasy component, very coherent with the overall plot), and the charismatic characters. Looking forward to Part 2!
August 2, 2025 at 5:30 PM
Interesting stats for ACL first authors' country of affiliation! (2024 vs 2025)
July 28, 2025 at 11:33 AM
As a European I was very curious to read this book, which I saw heralded as the de-facto manifesto of US pro-growth libertarians. A lot of no-nonsense points, esp. on risk-taking in science, but I'm left a bit uneasy by the utopian techno-solutionism. The elephant in the room: abundance for whom?
July 21, 2025 at 7:34 AM
Empire of AI was a great overview of the modern history of AI and the challenges brought by the cutthroat competition among industry superpowers. Unlike other works toeing the "AI is all bullshit" line, Hao takes a critical but constructive stance that makes for a refreshing read. Highly recommended!
July 1, 2025 at 7:47 AM
Keeping up with the animal intelligence theme, Other Minds was an interesting read, although nothing special from a narrative standpoint
July 1, 2025 at 6:54 AM
Finally, we correlate metrics with the number of annotators marking each token as an error, as a more robust gold label. We find that >3 annotations are enough for robust metric rankings, with some margin still left to human-level performance → use multiple annotation sets for WQE eval! 7/
May 30, 2025 at 2:28 PM
XCOMET models underperform because they do not match translators' subjective error annotation propensity. Using XCOMET's granular p(error) values significantly boosts their performance when calibration is possible → desirable for a fair evaluation. 6/
May 30, 2025 at 2:28 PM
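As an aside, the calibration step above can be as simple as picking, on a held-out split, the p(error) cutoff that best matches a given annotator's flags. A toy sketch (data, sizes, and threshold grid are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Toy calibration split: per-token p(error) scores and one annotator's binary error flags.
p_error = rng.random(500)
gold = (p_error + rng.normal(0, 0.25, 500) > 0.7).astype(int)

# Pick the binarization threshold that best matches this annotator's flagging propensity.
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: f1_score(gold, (p_error > t).astype(int)))
print(f"Calibrated threshold: {best_t:.2f}")

# At evaluation time, binarize new p(error) scores with best_t instead of a fixed default,
# so the metric's error-flagging rate matches the annotator's.
```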
We test predictive uncertainty (entropy, MCD logprob avg/var), vocabulary projections (logit lens variants), and context mixing (attention entropy), plus XCOMET (@nunonmg.bsky.social) as a supervised baseline → while costly, MCD is competitive with the 11B trained model! 5/
May 30, 2025 at 2:28 PM
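Not the paper's implementation, just a minimal sketch of one of these unsupervised signals: per-token predictive entropy from a Hugging Face MT model, obtained by force-decoding the hypothesis (checkpoint name and error threshold are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Helsinki-NLP/opus-mt-en-it"  # illustrative checkpoint, not the paper's setup
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

src = "The committee postponed the vote until next week."
hyp = "Il comitato ha rinviato il voto alla prossima settimana."

enc = tok(src, return_tensors="pt")
labels = tok(text_target=hyp, return_tensors="pt").input_ids

# Force-decode the hypothesis: logits at position t are the model's output
# distribution when producing target token t.
with torch.no_grad():
    logits = model(**enc, labels=labels).logits

# Per-token entropy of the output distribution: high entropy means the model was
# uncertain at that position, a (noisy) proxy for a potential translation error.
probs = logits.softmax(-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).squeeze(0)

for tok_id, h in zip(labels.squeeze(0), entropy):
    flag = "⚠️" if h.item() > 2.0 else "  "  # illustrative threshold
    print(f"{flag} {tok.convert_ids_to_tokens(int(tok_id)):>12s}  H={h.item():.2f}")
```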
While most evals employ a single annotation set as reference, we use our recent QE4PE dataset (arxiv.org/abs/2503.03044) to obtain up to 6 word-level error annotations per segment from different post-edits. This allows us to set a "human-level agreement baseline" for the task. 4/
May 30, 2025 at 2:28 PM
📢 New paper: Can unsupervised metrics extracted from MT models detect their translation errors reliably? Do annotators even *agree* on what constitutes an error? 🧐

We compare uncertainty- and interp-based WQE metrics across 12 directions, with some surprising findings!

🧵 1/
May 30, 2025 at 2:28 PM
The concept-driven painting demo by Goodfire is very cool!
paint.goodfire.ai

Here's a manually drawn dragon with a lion head sitting atop a pyramid in the middle of the sea, with a planet in the top-right corner :)
May 28, 2025 at 8:28 AM
[🇮🇹 posting] Playing with Claude 4 on verbalized rebuses (aclanthology.org/2024.clicit-...): it's quite fascinating that in this case it thought the definition "Ora è compact" (Now it's compact) could be a wordplay, and considered compact versions of "Ora" (Now) as potential solutions 😄
May 27, 2025 at 5:38 PM
Excited to finally start using this gorgeous LLaMA-powered notebook I got in an indie shop in Singapore during EMNLP'23!
May 13, 2025 at 7:26 AM
Hats off to the GDM team for reporting negative results on SAEs while being heavily invested in that line of work! But I can't help feeling like...

www.alignmentforum.org/posts/4uXCAJ...
March 27, 2025 at 7:00 AM
Fictions was my first Borges (I know, better late than never), and it was definitely a trip. Some stories, like Pierre Menard, were quite forgettable. But most of them, like The Library of Babel, Funes, and Death and the Compass, were works of art. Planning to read The Aleph in the near future.
March 5, 2025 at 8:28 PM
Finally in Toulouse 🇫🇷 where I'll collaborate with @fannyjrd.bsky.social @antoninpoche.bsky.social and the DEEL/FOR teams at IRT & ANITI on an exciting interpretability project. Stay tuned! 🔍
February 25, 2025 at 5:06 PM