Bruce (Zhi) Wen
zhi-bruce-wen.bsky.social
Senior applied research scientist at Mila. NLP. ML for healthcare/bio. Football. Pink Floyd. Post-rock. Montreal bagel ambassador 🇨🇦.

https://zhi-wen.net/
Reposted by Bruce (Zhi) Wen
We present Olmo 3, our next family of fully open, leading language models.
This family of 7B and 32B models represents:

1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.
November 20, 2025 at 2:32 PM
Reposted by Bruce (Zhi) Wen
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
November 18, 2025 at 3:31 PM
Reposted by Bruce (Zhi) Wen
COLM is going to San Francisco for 2026!

🗓️Dates: October 6-9, 2026
🏨Venue: Hilton San Francisco Union Square

Website and CFPs for papers and workshops coming up soon!
November 11, 2025 at 7:30 PM
Reposted by Bruce (Zhi) Wen
If you're a student in need of a personal website (and if you're doing research, yes, you need a website!), I keep a list of nice examples here, most of which are reusable: www.are.na/maria-antoni...

For example, I just spotted this beautiful website by Catherine Yeh: github.com/catherinesye...
November 3, 2025 at 8:11 PM
Reposted by Bruce (Zhi) Wen
A new essay on the crazy, all or nothing approach to work happening in AI today, the looming human costs, and the lack of a finish line.

I wouldn't say it's okay, but I'm not sure how to fix it.
www.interconnects.ai/p/burning-out
Burning out
The international AI industry's collective risk.
www.interconnects.ai
October 25, 2025 at 2:35 PM
Reposted by Bruce (Zhi) Wen
LawZero is growing fast, and we're always looking for dedicated people to join our team.
If you're interested in working on technical safeguards to create safe-by-design AI systems, check out the openings on our website and don't hesitate to reach out to our team!
job-boards.greenhouse.io/lawzero
LawZero
About LawZero: LawZero is a non-profit organization committed to advancing research and creating technical solutions that enable safe-by-design AI systems. Its ...
job-boards.greenhouse.io
October 24, 2025 at 2:58 PM
Reposted by Bruce (Zhi) Wen
Are you a PhD student interested in ML and biology or health? Come do an internship with me, @avapamini.bsky.social, Alex Lu, @lcrawford.bsky.social, or Kristen Severson at MSRNE!

Applications are due Dec 1: make sure you include a research statement!

jobs.careers.microsoft.com/global/en/jo...
Search Jobs | Microsoft Careers
jobs.careers.microsoft.com
October 21, 2025 at 7:32 PM
Reposted by Bruce (Zhi) Wen
entire article
October 20, 2025 at 11:00 PM
Reposted by Bruce (Zhi) Wen
We discovered that language models leave a natural "signature" on their API outputs that's extremely hard to fake. Here's how it works 🔍

📄 arxiv.org/abs/2510.14086 1/
Every Language Model Has a Forgery-Resistant Signature
The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying...
arxiv.org
October 17, 2025 at 5:59 PM
Reposted by Bruce (Zhi) Wen
I am recruiting PhD students to start in 2026! If you are interested in robustness, training dynamics, interpretability for scientific understanding, or the science of LLM analysis you should apply. BU is building a huge LLM analysis/interp group and you’ll be joining at the ground floor.
Life update: I'm starting as faculty at Boston University
@bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!
October 16, 2025 at 3:45 PM
Clever sampling from base model > GRPO post-training.

One of the coolest papers I've read recently (in addition to QAlign/QUEST which has similar approaches).

arxiv.org/abs/2510.14901
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Frontier reasoning models have exhibited incredible capabilities across a wide array of disciplines, driven by posttraining large language models (LLMs) with reinforcement learning (RL). However, desp...
arxiv.org
October 17, 2025 at 3:59 PM
Reposted by Bruce (Zhi) Wen
This is so cool. When you look at representational geometry, it seems intuitive that models are combining convex regions of "concepts", but I wouldn't have expected that this is PROVABLY true for attention or that there was such a rich theory for this kind of geometry.
🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:

the Minkowski Representation Hypothesis
October 16, 2025 at 6:33 PM
Reposted by Bruce (Zhi) Wen
Keynote at #COLM2025: Nicholas Carlini from Anthropic

"Are language models worth it?"

Explains that the prior decade of his work on adversarial images, while it taught us a lot, isn't very applied; it's unlikely anyone is actually altering images of cats in scary ways.
October 9, 2025 at 1:12 PM
When I said "data poisoning" instead of "food poisoning" in a completely non-ML context, I knew I probably needed a break.
October 6, 2025 at 11:19 PM
Reposted by Bruce (Zhi) Wen
Here’s a #COLM2025 feed!

Pin it 📌 to follow along with the conference this week!
October 6, 2025 at 8:26 PM
On my way (back) to Montreal for #COLM2025 🥯.

Looking forward to seeing what people are thinking about controllable/safe generation, eval, diffusion LMs, interpretability, etc.

And we’re still hiring applied research scientists at Mila! ⬇️
October 6, 2025 at 12:41 PM
Reposted by Bruce (Zhi) Wen
[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.
April 7, 2025 at 1:54 PM
This is a great distinction to make and the characterization is very accurate.
I wish students understood that in most empirical AI research there’s a huge scientific advantage from being constitutionally excited by math vs. intimidated, but very little additional gain from being actually “good” at math. Maybe they’d be less intimidated if they didn’t feel they had to be “good”.
September 19, 2025 at 7:39 PM
Reposted by Bruce (Zhi) Wen
If you're an undergrad and want to intern with me, this is where you need to apply!
The Microsoft Research Undergraduate Internship Program offers 12-week internships in our Redmond, NYC, or New England labs for rising juniors and seniors who are passionate about technology. Apply by October 6: msft.it/6015scgSJ
September 13, 2025 at 10:42 AM
Reposted by Bruce (Zhi) Wen
amen

those with savior and superiority complex, obsessed with sci fi, blinded by dollar signs, devoid of empathy, and severely gullible.
August 28, 2025 at 3:50 PM
Reposted by Bruce (Zhi) Wen
We were delighted to welcome the Honorable @mark-carney.bsky.social and Evan Solomon to Mila today for a rich discussion on AI's potential to drive innovation, social progress, and economic resilience in the country, alongside key players from our ecosystem.
August 20, 2025 at 10:08 PM
Reposted by Bruce (Zhi) Wen
**Please repost** If you're enjoying Paper Skygest -- our personalized feed of academic content on Bluesky -- we'd appreciate you reposting this! We’ve found that the most effective way for us to reach new users and communities is through users sharing it with their network
August 19, 2025 at 5:15 PM
Why I took a 4.5 hr $90 flixbus to Ottawa instead of a 4.5 hr $500 train when Air Canada messed up ⬇️
I feel the same way traveling between Toronto and Montreal too. The worst part is that the existing train is slower AND more expensive than flying (and has the same luggage restriction).
August 18, 2025 at 2:01 AM
Reposted by Bruce (Zhi) Wen
With fresh support of $75M from NSF and $77M from NVIDIA, we’re set to scale our open model ecosystem, bolster the infrastructure behind it, and fast‑track reproducible AI research to unlock the next wave of scientific discovery. 💡
August 14, 2025 at 12:16 PM