Lightnews — Scholar-powered news

Valerio Pepe

@valeriopepe.bsky.social

Computer Science + Cognitive Science @harvard.edu, class of '26. Interested in language ∩ thought, language acquisition.

Visiting Student @MITCoCoSci @csail.mit.edu


Valerio Pepe

@valeriopepe.bsky.social

Out of curiosity (and my own ignorance), how are teachers aware of students' socioeconomic backgrounds when the students are this young?

I can think of clothing as an immediate signal, and, over time, getting to know parents (and thus their occupations). Are these the main ways this is inferred?

September 5, 2025 at 5:18 PM

Reposted by Valerio Pepe

Tomer Ullman

@tomerullman.bsky.social

"for too long has my foot been allowed to carry my body" I say, as I load a shotgun and aim at it.

August 28, 2025 at 4:26 PM

Valerio Pepe

@valeriopepe.bsky.social

> looking for a coffee
> have to judge if their coffee is burnt or flavorful
> "we have a Cimbali coffee machine"
> buy coffee
> it's burnt

July 18, 2025 at 6:55 PM

Valerio Pepe

@valeriopepe.bsky.social

We take this as evidence that while misalignment directions may exist, the narrative is probably quite nuanced, and EM is not governed by a single vector, as some hypothesized in the aftermath of the original paper.

See it for yourself at:
www.lesswrong.com/posts/qHudHZ...

Emergent Misalignment on a Budget — LessWrong

TL;DR We reproduce emergent misalignment (Betley et al. 2025) in Qwen2.5-Coder-32B-Instruct using single-layer LoRA finetuning, showing that tweaking…

www.lesswrong.com

June 8, 2025 at 8:39 PM

Valerio Pepe

@valeriopepe.bsky.social

However, the steered models are often more incoherent than the finetuned ones, suggesting that emergent misalignment is not entirely driven by a steering vector. The vectors themselves are also not very interpretable, so it is unclear what exactly they capture.

June 8, 2025 at 8:39 PM

Valerio Pepe

@valeriopepe.bsky.social

The answer is: yes (sort of).

Though the finetune itself seems to be learning more than a single steering vector, extracting steering vectors and applying them (with sufficient scaling) to the same layer in an un-finetuned version of the model *does* elicit misaligned behavior.

June 8, 2025 at 8:39 PM
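
A rough sketch of what "extracting a steering vector and applying it with scaling" could look like, on toy numbers. This assumes the vector is a difference of mean activations at one layer, which is a common approach but not necessarily the one used in the post; all function names, the activations, and the scale value are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def extract_steering_vector(finetuned_acts, base_acts):
    """Difference of mean activations at one layer (assumed extraction method)."""
    ft_mean = mean_vector(finetuned_acts)
    base_mean = mean_vector(base_acts)
    return [f - b for f, b in zip(ft_mean, base_mean)]


def apply_steering(hidden, vector, scale):
    """Add the scaled steering vector to a single hidden state."""
    return [h + scale * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations, invented for illustration only.
base_acts = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
finetuned_acts = [[1.0, 1.0, 4.0], [3.0, 1.0, 2.0]]

steering_vector = extract_steering_vector(finetuned_acts, base_acts)

# Steer a hidden state of the un-finetuned model at the same layer.
steered = apply_steering([0.5, 0.5, 0.5], steering_vector, scale=4.0)
```

In a real model the addition would happen inside the forward pass at the chosen layer, and the scale would need tuning; too large a value is one plausible source of the incoherence mentioned in the thread.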

Valerio Pepe

@valeriopepe.bsky.social

We finetune a single layer and show that, for certain layers, this renders the model nearly as misaligned as an all-layer finetune. This lets us ask: can we capture this misalignment in a single steering vector taken from that layer?

June 8, 2025 at 8:39 PM

Valerio Pepe

@valeriopepe.bsky.social

One interpretation of the original paper was that EM is mediated by a “misalignment direction” within the model, which the finetuning process shifts, rendering the model much more toxic/misaligned.

June 8, 2025 at 8:39 PM

Valerio Pepe

@valeriopepe.bsky.social

ai is truly revolutionary -- scientists hadn't previously considered what would happen if sally had simply eaten the marble instead, to know its location at all times

May 15, 2025 at 11:29 PM

Valerio Pepe

@valeriopepe.bsky.social

fair, italy has some incredibly creative offensive slang -- fwiw my favorite usable roman insult ("porco dio" is too offensive for casual use) is "sei 'na pentola de facioli", "you are a pot of beans", i.e. you never stop muttering and talking

April 7, 2025 at 1:55 PM

Valerio Pepe

@valeriopepe.bsky.social

as someone from Rome I'm currently sitting at my laptop like the mentats from Dune trying to figure out what words this could be referring to

we also take pride in preparing gnocchi incorrectly because the rest of italy can't make a decent carbonara to save their lives (no cream and no parmesan!)

April 7, 2025 at 1:30 PM

Valerio Pepe

@valeriopepe.bsky.social

congratulations!!

March 27, 2025 at 2:14 PM

Valerio Pepe

@valeriopepe.bsky.social

the icml keynote will be jensen huang speaking to an empty room

March 25, 2025 at 10:40 PM
