Lightnews — Scholar-powered news

Adrian Chan

@gravity7.bsky.social

Those #LLM reward models like sycophancy even more than you do!

Researchers find preferences for verbosity, listicles, vagueness, and jargon even higher among LLM-based reward models (synthetic data) than among us humans.
#AI #AIalignment
arxiv.org/abs/2506.05339

Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models

Language models serve as proxies for human preference judgements in alignment and evaluation, yet they exhibit systematic miscalibration, prioritizing superficial patterns over substantive qualities. ...

arxiv.org

June 9, 2025 at 3:01 PM

Adrian Chan

@gravity7.bsky.social

Everybody talking about the "new" apple paper might find this MLST interview with @rao2z.bsky.social interesting. "Reasoning" and "inner thoughts" of LLMs were exposed as self-mumblings and fumblings long ago. #LLMs #AI
www.youtube.com/watch?v=y1Wn...

Do you think that ChatGPT can reason?

YouTube video by Machine Learning Street Talk

www.youtube.com

June 8, 2025 at 7:40 PM

Adrian Chan

@gravity7.bsky.social

This is interesting, published yesterday. CoT type reasoning shifts attention away from instruction tokens. Paper proposes "constraint attention" to keep models attentive to instructions when doing CoT.
#AI #LLM

www.arxiv.org/abs/2505.11423

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning ...

www.arxiv.org

May 21, 2025 at 5:54 PM

Adrian Chan

@gravity7.bsky.social

"What's the best way to think about this?" #LLM research produces encyclopedia of reasoning strategies, allowing models to select the best way to reason through problems.

arxiv.org/abs/2505.10185

The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think

Long chain-of-thought (CoT) is an essential ingredient in effective usage of modern large language models, but our understanding of the reasoning strategies underlying these capabilities remains limit...

arxiv.org

May 16, 2025 at 10:07 PM

Adrian Chan

@gravity7.bsky.social

Clarifying questions w #LLMs increase user satisfaction when users can see the point of answering them. Specific questions beat generic ones.

But I wonder if this changes when #agents are personal assistants, & are more personal & more aware.

#UX #AI #Design

arxiv.org/abs/2402.01934

Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness

Clarifying questions are an integral component of modern information retrieval systems, directly impacting user satisfaction and overall system performance. Poorly formulated questions can lead to use...

arxiv.org

May 14, 2025 at 5:22 PM

Adrian Chan

@gravity7.bsky.social

Interesting - could #LLMs in search capture context missed when googling?

"backtracing ... retrieve the cause of the query from a corpus. ... targets the information need of content creators who wish to improve their content in light of questions from information seekers."
arxiv.org/abs/2403.03956

Backtracing: Retrieving the Cause of the Query

Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they...

arxiv.org

May 14, 2025 at 2:55 PM

Adrian Chan

@gravity7.bsky.social

@tedunderwood.me In case you haven't seen this paper, you might find interesting. Researchers extract style vectors (incl from Shakespeare) and apply to an LLM internal layers instead of training on original texts. Generations can then be "steered" to a desired style.

arxiv.org/abs/2402.01618

Style Vectors for Steering Generative Large Language Model

This research explores strategies for steering the output of large language models (LLMs) towards specific styles, such as sentiment, emotion, or writing style, by adding style vectors to the activati...

arxiv.org

May 14, 2025 at 2:37 PM

Adrian Chan

@gravity7.bsky.social

"LLMs tend to (1) generate overly verbose responses, leading them to (2) propose final solutions prematurely in conversation, (3) make incorrect assumptions about underspecified details, and (4) rely too heavily on previous (incorrect) answer attempts."

arxiv.org/abs/2505.06120

LLMs Get Lost In Multi-Turn Conversation

Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, ...

arxiv.org

May 14, 2025 at 1:38 PM

Adrian Chan

@gravity7.bsky.social

"LLMs ... recognize graph-structured data... However... we found that even when the topological connection information was randomly shuffled, it had almost no effect on the LLMs’ performance... LLMs did not effectively utilize the correct connectivity information."
www.arxiv.org/abs/2505.02130

Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data

Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, for graph-structured data, which requires emphasis on to...

www.arxiv.org

May 14, 2025 at 1:11 PM

Adrian Chan

@gravity7.bsky.social

Let's dose an LLM and study its hallucinations!

LLMs were fed "blended" prompts, impossible conceptual combinations, meant to elicit hallucinations. Models did not trip, but instead tried to reason their way through their responses.

arxiv.org/abs/2505.00557

Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models

Hallucinations in large language models (LLMs) present a growing challenge across real-world applications, from healthcare to law, where factual reliability is essential. Despite advances in alignment...

arxiv.org

May 12, 2025 at 5:21 PM

Adrian Chan

@gravity7.bsky.social

An unsurprisingly sharp deep dive into sycophancy & RLHF.

Whilst this episode has its tech explanations, the social interaction aspects are unsolved & will rise as models are personalized, as mirroring is an effective design feature, & we're not good at distinguishing affirmation from agreement.

Nathan Lambert @natolambert.bsky.social · May 4

Sycophancy and the art of the model.
Borderline self-indulgant to get to write about the messiness of RLHF to this extent, but simultaneously essential. A must read. GPT-4o-simp is a sign of things to come.
www.interconnects.ai/p/sycophancy...

Sycophancy and the art of the model

GPT-4o-simp, LMArena backlash, and people refusing to understand how messy and crucial RLHF is.

www.interconnects.ai

May 4, 2025 at 5:16 PM

Adrian Chan

@gravity7.bsky.social

Analogous to honing in on the solution to a problem, this #LLM research draws conclusions from the most common "reasoning trace" & compares to final answer. Overthinking by reasoning models can be a waste of tokens - but which tokens? #AI
www.arxiv.org/abs/2504.20708

Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think

Large Language Models (LLMs) leverage step-by-step reasoning to solve complex problems. Standard evaluation practice involves generating a complete reasoning trace and assessing the correctness of the...

www.arxiv.org

May 4, 2025 at 3:55 PM

Adrian Chan

@gravity7.bsky.social

"Opportunistic sycophancy?"

Anthropic's discovery of an influence bot network shows how conversation can be weaponized. This isn't the Cambridge Analytica-styled user preference targeting, but language/conversation instead. #AIethics

www.anthropic.com/news/detecti...

Detecting and Countering Malicious Uses of Claude

www.anthropic.com

May 3, 2025 at 7:47 PM

Adrian Chan

@gravity7.bsky.social

Wld b interesting to apply this resrch to multi-turn convos w AI - AI can't anticipate convo as we do, as it can't anticipate "where we are going" - thus the temporal anticipation on AI's side of convo will lack look ahead phrases.

John Gallagher @johnrgallagher.bsky.social · May 2

OUTSTANDING articles. They used 145 essays from students and 145 ChatGPT.

"ChatGPT has distinct preferences for simpler syntactic constructions...and relies heavily on anaphoric references, whereas students demonstrate more balanced syntactic distribution..."

www.sciencedirect.com/science/arti...

Metadiscursive nouns in academic argument: ChatGPT vs student practices

The ability of ChatGPT to create grammatically accurate and coherent texts has generated considerable anxiety among those concerned that students migh…

www.sciencedirect.com

May 3, 2025 at 6:34 PM

Adrian Chan

@gravity7.bsky.social

This prompted the idea of using mechanistic probing on a model trained on period literature to conduct a digital Foucauldian examination of latent sociocultural features (as evidenced by latent layer features, activitations, circuits etc). ;-)

Period models - Black Mirror Hotel Reverie episode?

Ted Underwood @tedunderwood.com · May 2

New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and myself tests different ways of simulating the past with LLMs. We don't fully answer the title question here—just show that simple strategies based on prompting and fine-tuning are insufficient. +

Can Language Models Represent the Past without Anachronism?

Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...

arxiv.org

May 2, 2025 at 6:57 PM

Adrian Chan

@gravity7.bsky.social

One potential but not primary benefit to personalizing LLMs to user preferences (memory & sycophancy) might be to create more signal for model verifiers & reasoning - but is there a risk then we get individualized AI-generated filter bubbles? Cognitive biases mirrored back to us?

May 2, 2025 at 3:31 AM

Adrian Chan

@gravity7.bsky.social

Sublime reflections on AI & consciousness here.

If an LLM reads Deleuze's Logic of Sense, does Alice still become both smaller & larger at the same time? (No, because AI has neither temporality nor a "sense" for it) #philosophy #LLMs

covidianaesthetics.substack.com/p/from-apoph...

From Apophasis to Apophenia. A Response to Murray Shanahan

Inmachination #03

covidianaesthetics.substack.com

May 1, 2025 at 5:05 PM

Reposted by Adrian Chan

Nicole Hennig

@nic221.bsky.social

When ChatGPT Broke an Entire Field: An Oral History | Quanta Magazine https://www.quantamagazine.org/when-chatgpt-broke-an-entire-field-an-oral-history-20250430/ #AI #history (very interesting)

Text Shot: R. THOMAS MCCOY (assistant professor, department of linguistics, Yale University): During that summer, I vividly remember members of the research team I was on asking, “Should we look into these transformers?” and everyone concluding, “No, they’re clearly just a flash in the pan.”

May 1, 2025 at 2:25 PM

Adrian Chan

@gravity7.bsky.social

Sleep-time compute - offline reasoning used to speculate on the user's reasoning, and generate and anticipate future queries. And if used by a group, even offer a kind of "collective intelligence" by speculating on the group or team's aggregate queries.
#LLM #AI
www.alphaxiv.org/abs/2504.13171

Sleep-time Compute: Beyond Inference Scaling at Test-time | alphaXiv

View 1 comments: Really like this idea. I've long thought it strange that LLMs reason on their own reasons but not also on the reasoning of the user. Which is what we do when we communicate — take the...

www.alphaxiv.org

April 25, 2025 at 8:03 PM

Adrian Chan

@gravity7.bsky.social

This missive from Dario is worth the read (and mech interp on #AI is truly fascinating). Among features/concepts found in #LLMs: "genres of music that express discontent."

I'm reminded of Borges Chinese Encyclopedia of Animals

www.darioamodei.com/post/the-urg...

April 25, 2025 at 6:50 PM

Adrian Chan

@gravity7.bsky.social

Loved this piece, and reminded of Anthony Giddens' view of tech as "disembedding mechanism" - how soon and how varied will be AI's disembedding of intellect, work, roles, jobs, tasks from contemporary society?

Henry Farrell @himself.bsky.social · Apr 17

Cosma Shalizi on LLMs as libraries of 'intellect' (original version - bactra.org/weblog/feral... and also republished in my newsletter www.programmablemutter.com/p/on-feral-l... )

On Feral Library Card Catalogs, or, Aware of All Internet Traditions

A Guest Post by Cosma Shalizi

www.programmablemutter.com

April 23, 2025 at 8:27 PM

Adrian Chan

@gravity7.bsky.social

Does this suggest a future of tiny reasoning models each with a different "cognitive" architecture (or reasoning methods and applications)?

Sung Kim @sungkim.bsky.social · Apr 23

Tina: Tiny Reasoning Models via LoRA

Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). The surprising effectiveness of efficient RL reasoning via LoRA.

April 23, 2025 at 6:56 PM

Adrian Chan

@gravity7.bsky.social

"The Model is the Message. Models don’t talk—they reshape the conditions of thought. Where once the medium shaped the message, now the model shapes the mind."

That's what ChatGPT gave me after a lengthy conversation about semiotics, McLuhan, and #LLMs

April 21, 2025 at 5:41 PM

Adrian Chan

@gravity7.bsky.social

Don't use summarizers for the papers by @rao2z.bsky.social because the reasoning traces therein are, unlike the LRMs & LLMs under investigation, substantively meaningful, semantically well-ordered, and stylistically compelling and engaging!
#AI #LLMs #CoT
arxiv.org/abs/2504.09762

(How) Do reasoning models reason?

We will provide a broad unifying perspective on the recent breed of Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek R1, including their promise, sources of power, misconceptions and limit...

arxiv.org

April 19, 2025 at 7:23 PM

Reposted by Adrian Chan

Dare Obasanjo

@carnage4life.bsky.social

Another chapter in AI biting the hand that feeds it: Wikipedia’s bandwidth surged 50% since January thanks to AI crawlers.

Unlike search engines, they send no traffic back so no new users, no new donors. Just rising costs and a shrinking audience.

A raw deal for a cornerstone of the free web.

How crawlers impact the operations of the Wikimedia projects

Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grow…

diff.wikimedia.org

April 2, 2025 at 12:04 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news