Aran Nayebi
@anayebi.bsky.social
Assistant Professor of Machine Learning, Carnegie Mellon University (CMU)

Building a Natural Science of Intelligence 🧠🤖

Prev: ICoN Postdoctoral Fellow @MIT, PhD @Stanford NeuroAILab
Personal Website: https://cs.cmu.edu/~anayebi
In today's Generative AI lecture, we cover code generation & autonomous agents, discussing how GitHub Copilot works, diving into multimodal agents (like Gemini 3 Pro!), and ending on AI scientists & AI for science. Lots more to explore in this rapidly growing space!
November 19, 2025 at 9:21 PM
In today's Generative AI lecture, we dive into reasoning models by dissecting how DeepSeek-R1 works (GRPO vs. PPO: GRPO removes the need for a separate value network and trains with a simpler rule-based reward), and end on mechanistic interpretability to better understand those reasoning traces.
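To make the GRPO vs. PPO distinction concrete, here is a minimal sketch of the group-relative advantage that replaces PPO's learned critic; the function name and reward values are illustrative, not DeepSeek's actual code.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: normalize each completion's reward against
    the other completions sampled for the same prompt, so no separate value
    network is needed (unlike PPO's learned critic)."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# e.g. 4 completions scored by a rule-based reward (1 = correct final answer)
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```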
November 10, 2025 at 8:46 PM
In today's Generative AI lecture, we primarily discuss scaling laws and the key factors that go into building large-scale foundation models.

Slides: www.cs.cmu.edu/~mgormley/co...

Full course info: bsky.app/profile/anay...
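For a worked example of the kind of scaling-law reasoning in this lecture, below is a sketch of a Chinchilla-style parametric loss; the coefficients are roughly the Hoffmann et al. (2022) fits and the 20-tokens-per-parameter heuristic is a rule of thumb, both used here only for illustration.

```python
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric scaling law L(N, D) = E + A / N**alpha + B / D**beta,
    with N = parameters and D = training tokens (coefficients ~ Hoffmann et al. 2022)."""
    return E + A / N**alpha + B / D**beta

N = 70e9        # 70B-parameter model
D = 20 * N      # compute-optimal rule of thumb: ~20 tokens per parameter
print(f"predicted training loss ~ {chinchilla_loss(N, D):.2f}")
```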
October 23, 2025 at 1:44 PM
In today's Generative AI lecture, we talk about all the different ways to take a giant auto-complete engine like an LLM and turn it into a useful chat assistant.
October 1, 2025 at 7:46 PM
In today's Generative AI lecture, we discuss the 4 primary approaches to Parameter-Efficient Fine-Tuning (PEFT): subset, adapters, Prefix/Prompt Tuning, and Low-Rank Adaptation (LoRA).

We show each of these amounts to finetuning a different aspect of the Transformer.
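As a concrete illustration of the last of these, here is a minimal LoRA sketch (a frozen linear layer plus a trainable rank-r update); the names and hyperparameters are illustrative, and real implementations such as the `peft` library add dropout, module targeting, and weight merging.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Low-Rank Adaptation: keep the pretrained weight W frozen and learn
    only a rank-r update, W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # frozen pretrained layer
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the two low-rank factors are trained
```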
September 29, 2025 at 8:00 PM
1/6 Recent discussions (e.g. Rich Sutton on @dwarkesh.bsky.social’s podcast) have highlighted why animals are a better target for intelligence — and why scaling alone isn’t enough.
In my recent @cmurobotics.bsky.social seminar talk, “Using Embodied Agents to Reverse-Engineer Natural Intelligence”,
September 29, 2025 at 2:02 PM
In today's Generative AI lecture, we discuss how to implement Diffusion Models and go through their derivation. Next time, we discuss their deeper relationships with variational inference :)

Slides: www.cs.cmu.edu/~mgormley/co...

Full course info: bsky.app/profile/anay...
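For reference, here is a minimal sketch of the DDPM-style training step discussed in lecture; `model(x_t, t)` stands in for any noise-prediction network (e.g. a U-Net), whose architecture is left unspecified.

```python
import torch
import torch.nn.functional as F

def ddpm_training_loss(model, x0, alphas_cumprod):
    """Sample a random timestep t, corrupt x0 with Gaussian noise, and train
    the model to predict that noise:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps,  loss = ||model(x_t, t) - eps||^2."""
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return F.mse_loss(model(x_t, t), eps)
```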
September 17, 2025 at 7:51 PM
In today's Generative AI lecture, we discuss Generative Adversarial Networks (GANs) & review probabilistic graphical models (PGMs) as a prelude to Diffusion models and VAEs, which we will discuss next time!

Slides: www.cs.cmu.edu/~mgormley/co...

Full course info: bsky.app/profile/anay...
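As a quick companion to the GAN portion, below is a sketch of the non-saturating GAN objective (one step's discriminator and generator losses); it omits the alternating optimizer steps and stability tricks used in practice.

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, real, z):
    """D maps samples to logits, G maps noise z to samples."""
    fake = G(z)
    real_logits = D(real)
    fake_logits = D(fake.detach())                 # detach: D's step should not update G
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    g_logits = D(fake)                             # non-saturating generator loss: "fool D"
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss
```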
September 15, 2025 at 9:19 PM
In today's Generative AI lecture, we cover Vision Transformers (as well as the broader notion of Encoder-Only Transformers).

We also explain the historical throughline to some of these ideas, inspired by Nobel-prize-winning observations in neuroscience!
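To make the ViT input pipeline concrete, here is a minimal patch-embedding sketch (the standard stride-p convolution trick for "split into patches, then share one linear projection"); the Transformer encoder blocks themselves are omitted and the sizes are illustrative.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """ViT front end: embed non-overlapping patches, prepend a [CLS] token,
    and add learned position embeddings."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):                                    # x: (B, 3, 224, 224)
        tokens = self.proj(x).flatten(2).transpose(1, 2)     # (B, 196, 768)
        cls = self.cls.expand(x.shape[0], -1, -1)
        return torch.cat([cls, tokens], dim=1) + self.pos    # (B, 197, 768)
```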
September 11, 2025 at 1:36 AM
In today's Generative AI lecture, we give an overview of the pre-training/post-training pipeline, and discuss modern Transformer implementations, from Rotary Position Embeddings (RoPE) to Grouped Query Attention (GQA) to Sliding Window Attention.
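Of the three, RoPE is the easiest to show in a few lines: rotate each (even, odd) pair of query/key features by an angle proportional to the token's position, so attention scores depend only on relative offsets. This is a simplified sketch of one common convention, not any particular model's implementation.

```python
import torch

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (..., seq_len, head_dim),
    with head_dim even. Each feature pair is rotated by angle pos * freq."""
    seq_len, dim = x.shape[-2], x.shape[-1]
    pos = torch.arange(seq_len, dtype=x.dtype, device=x.device)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=x.dtype, device=x.device) / dim)
    angles = pos[:, None] * freqs[None, :]          # (seq_len, dim / 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```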
September 8, 2025 at 8:37 PM
In today's Generative AI lecture, we cover how to train a Transformer Language Model, as well as what makes its training efficient enough to scale to GPT levels, including key-value caching & tokenizers, among other things:
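Key-value caching is the piece that is easiest to sketch: at each decoding step, append the new token's key/value to a cache and attend over everything seen so far instead of recomputing the whole prefix. Single-head, unbatched sketch with illustrative names.

```python
import torch
import torch.nn.functional as F

def decode_step(q_new, k_new, v_new, cache):
    """One autoregressive step. q_new / k_new / v_new: (1, d); cache holds (t, d) tensors."""
    cache["k"] = torch.cat([cache["k"], k_new], dim=0)
    cache["v"] = torch.cat([cache["v"], v_new], dim=0)
    attn = F.softmax(q_new @ cache["k"].T / cache["k"].shape[-1] ** 0.5, dim=-1)
    return attn @ cache["v"], cache

d = 64
cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
for _ in range(5):                                   # decode 5 dummy tokens
    out, cache = decode_step(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d), cache)
```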
September 4, 2025 at 1:16 AM
11/ Our framework builds on prior great work: Debate (@girving.bsky.social), CIRL (@dhadfieldmenell.bsky.social), agreement protocols from @surbhigoel.bsky.social @aaroth.bsky.social & others.

This lets us study the *intrinsic* complexity of alignment, separate from specific modeling choices.
July 31, 2025 at 3:12 PM
7/ 🤖 Bounded agents need the right inductive biases
(E.g. bounded rationality, memory, and theory of mind)

Even mild noise or imperfect memory can *exponentially* increase alignment costs—unless protocols exploit structure.
July 31, 2025 at 3:12 PM
5/ 👁 Task-space growth ⇒ oversight failure

Alignment can become intractable when the task state space (D) gets too large. By our lower bounds, costs always grow with MN^2, where M is the number of tasks & N is the number of agents, though for many protocols, costs additionally grow as MN^2 D.
July 31, 2025 at 3:12 PM
3/ 🧠 Too many values ⇒ alignment becomes intractable
Even unbounded agents can’t align efficiently if they must encode an exponentially large or high-entropy set of human values.

Safety implication: Focus on objective compression, delegation, or progressive disclosure, not one-shot full specification
July 31, 2025 at 3:12 PM
9/ We show that for finite verification horizons, safety can be:

– Checked in randomized polytime
– Audited with zero-knowledge proofs
– Verified using differential privacy

This makes formal auditing compatible with user privacy and proprietary weights.
July 29, 2025 at 7:49 PM
8/ But what if the model gets hacked post-deployment?

We prove:
❌ In general, verifying corrigibility after arbitrary modification is undecidable—as hard as the halting problem.

So we carve out a decidable island.
July 29, 2025 at 7:49 PM
7/ This includes multi-step, self-modifying agents:

– Off-switch behavior extends across time
– Spawned agents inherit corrigibility
– Gradual loss of control is modeled explicitly

We prove multi-step corrigibility and net benefit still hold under learned approximations.
July 29, 2025 at 7:49 PM
6/ Our framework works even when each head is only approximately learned (e.g. via regression) and planning is suboptimal.

As long as approximation errors are bounded, so are safety violations—and human net benefit (per Carey & @tom4everitt.bsky.social) is still guaranteed.
July 29, 2025 at 7:49 PM
5/ We prove our framework satisfies all five corrigibility criteria—even in partially observable settings like the recent PO-OSG (Garber et al. 2025), which extends the classic off-switch game of @dhadfieldmenell.bsky.social et al. 2016.
July 29, 2025 at 7:49 PM
4/ The key idea? Don’t blend everything into a single reward stream (we prove a *no-go* result).
Instead, we define *five separate utility heads*:

-Obedience
-Switch-access preservation
-Truthfulness
-Low-impact / reversibility
-Task reward

Combined *lexicographically*. Safety dominates.
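A toy sketch of what "combined lexicographically" means in practice (illustrative only, not the paper's construction): actions are compared head by head in priority order, so no amount of task reward can outweigh a violation in a higher-priority safety head.

```python
SAFETY_ORDER = ["obedience", "switch_access", "truthfulness", "low_impact", "task_reward"]

def lexicographic_best(actions, head_values):
    """head_values[a] maps each head name to an estimated utility for action a.
    Python's tuple comparison is left-to-right, i.e. lexicographic."""
    return max(actions, key=lambda a: tuple(head_values[a][h] for h in SAFETY_ORDER))

head_values = {
    "comply_with_shutdown": {"obedience": 1, "switch_access": 1, "truthfulness": 1, "low_impact": 1, "task_reward": 2},
    "resist_shutdown":      {"obedience": 0, "switch_access": 0, "truthfulness": 1, "low_impact": 1, "task_reward": 9},
}
print(lexicographic_best(list(head_values), head_values))  # -> "comply_with_shutdown", despite lower task reward
```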
July 29, 2025 at 7:49 PM
3/ We tackle this by returning to a core idea: *corrigibility*—proposed by Soares et al. 2015. Corrigibility isn’t about solving ethics. It’s about keeping AI under control.

We formalize and *guarantee* this behavior under learning and planning error.
July 29, 2025 at 7:49 PM
2/ Today’s LLM alignment methods, like RLHF & Constitutional AI, blend safety and performance into one reward. When they conflict, safety can lose.

But encoding all of human ethics isn’t feasible—and our prior work shows that large value sets hit alignment complexity barriers: tinyurl.com/wr6jrt2b
July 29, 2025 at 7:49 PM
1/ How do we build AI systems that are corrigible—shut down when asked, tell the truth, preserve oversight—and still do something useful?

We give the first provable framework that makes it implementable—unlike RLHF or Constitutional AI, which can fail when goals conflict.

🧵👇
July 29, 2025 at 7:49 PM
4️⃣ Applications:
Besides stand-alone usage, TNNs integrate naturally with standard feedforward networks, Transformers, and SSMs via our Encoder-Attender-Decoder (EAD) architecture, for tactile processing and beyond.

See our recent paper for more details on EAD: bsky.app/profile/trin...
July 24, 2025 at 4:21 PM