neurosock
@neurosock.bsky.social
#BrainChips monthly recap. I make #neuro papers easy to understand. To make #Neuralink possible. Neuro PhD. AI🤖ML👾Data Sci 📊 Monkeys🐵Future🚀Cyberpunk⚡🦾🌌
DA is an adaptive signal that switches its function in real-time.
Upon reward delivery, it stops predicting force and starts predicting the *rate of licking* (Fig 9h, 9j).
October 30, 2025 at 11:36 AM
Optogenetically stimulating DA neurons in place of reward was *not sufficient* for learning.
Inhibiting DA neurons during the task *did not impair* learning (Fig 10b, 10i).
October 30, 2025 at 11:36 AM
Changes in DA firing due to reward magnitude, probability, and omission were all explained by parallel changes in the *force* the mice exerted (Figs 4, 5).
October 30, 2025 at 11:36 AM
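To make "explained by" concrete, here is a minimal sketch of the kind of check that implies, on made-up numbers (not the paper's actual analysis): if a single DA ~ force regression soaks up the differences between reward sizes, probabilities, and omissions, the force term is doing the explanatory work.

```python
# Toy sketch of what "explained by force" means (made-up numbers, not the paper's pipeline):
# if one DA ~ force regression absorbs the condition differences, force does the work.
import numpy as np

rng = np.random.default_rng(0)
mean_force = {"large_reward": 1.0, "small_reward": 0.5, "omission": -0.3}  # fake per-condition force

force, da = [], []
for cond, mu in mean_force.items():
    f = mu + 0.1 * rng.standard_normal(200)       # trial-by-trial force; condition shifts the mean
    d = 2.0 * f + 0.2 * rng.standard_normal(200)  # simulated DA is just a gain on force
    force.append(f)
    da.append(d)
force, da = np.concatenate(force), np.concatenate(da)

slope, offset = np.polyfit(force, da, 1)          # one fit across all trials, condition labels ignored
pred = slope * force + offset
r2 = 1 - np.sum((da - pred) ** 2) / np.sum((da - da.mean()) ** 2)
print(f"DA ~ force: slope={slope:.2f}, R^2={r2:.2f}")  # high R^2 by construction in this toy
```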
This force-tuning was independent of reward, appearing during spontaneous movements.
It even held true during an aversive air puff, proving it's not about "reward" (Fig 3e, 3h).
October 30, 2025 at 11:36 AM
Here's how we get there from their results:
They identified two distinct DA neuron types: "Forward" and "Backward" populations.
These cells fire to drive movement in a specific direction (Fig 1e, 1h, 1k).
October 30, 2025 at 11:36 AM
How does reward probability change the DA signal?
RPE view: DA activity scales with reward probability, encoding the *strength* of the prediction.
New view: Probability changes the animal's *effort*, and the DA signal simply tracks that performance.
October 30, 2025 at 11:36 AM
Why does a bigger reward cause a bigger DA spike?
RPE view: A larger reward causes a larger DA spike because it's a bigger "positive error."
New view: A larger reward makes the mouse push *harder*, and the DA spike just tracks that *vigor*.
October 30, 2025 at 11:36 AM
Why does DA firing "dip" when a reward is omitted?
RPE view: The famous "dip" in DA when a reward is omitted is a negative prediction error.
New view: The dip simply reflects the animal *abruptly stopping* its forward movement.
October 30, 2025 at 11:36 AM
In RPE, the DA signal shifts from reward to cue 🔔 over time.
Why does the DA signal "move" to the cue?
RPE view: This helps an animal learn what things are important.
New view: The signal shifts because the animal's *action* (pushing forward) shifts to the cue.
October 30, 2025 at 11:36 AM
Let's unpack this:
The classical RPE model says DA neurons encode the *difference* between expected and actual rewards.
A surprise spikes DA activity, while disappointment causes a dip.
This helps a mouse 🐭 learn what to pay attention to.
October 30, 2025 at 11:36 AM
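Here's my minimal TD(0) toy of that textbook story (made-up trial structure, not the paper's code). The TD error stands in for DA: it spikes at a surprising reward, migrates to the cue once the reward is predicted, and dips when an expected reward is omitted.

```python
# Toy TD(0) model of the classic RPE story (my sketch, not the paper's code).
import numpy as np

T, cue_t, rew_t = 12, 2, 7           # trial length, cue time, reward time (arbitrary)
alpha = 0.1                          # learning rate
V = np.zeros(T + 1)                  # V[t]: learned value of time bin t within the trial

def run_trial(rewarded, learn=True):
    """One trial; returns the TD error (DA proxy) at each time bin."""
    deltas = np.zeros(T + 1)
    for t in range(T):
        r = 1.0 if (rewarded and t + 1 == rew_t) else 0.0
        delta = r + V[t + 1] - V[t]              # surprise = outcome + new value - old value
        if learn and t >= cue_t:                 # pre-cue bins stay unpredictable (V = 0)
            V[t] += alpha * delta
        deltas[t + 1] = delta                    # log the error when the outcome arrives
    return deltas

early = run_trial(rewarded=True)                 # naive animal: spike at reward time
for _ in range(500):
    run_trial(rewarded=True)                     # many predictable rewards
late = run_trial(rewarded=True, learn=False)     # spike has migrated to the cue
omit = run_trial(rewarded=False, learn=False)    # omission: dip at the expected reward time

print(f"early spike at t={int(np.argmax(early))} (reward arrives at t={rew_t})")
print(f"late  spike at t={int(np.argmax(late))} (cue is at t={cue_t})")
print(f"omission dip at t={int(np.argmin(omit))}, value {omit.min():.2f}")
```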
Dopamine ≠ reward, and it turns out it may not be the learning molecule we thought, either.
If DA RPE is the emperor, this work SCREAMS that he's been naked all along.
This paper has gotten quite a bit of attention recently. Let's simplify it a bit.
A🧵with my toy model and notes:
#neuroskyence #compneuro #NeuroAI
October 30, 2025 at 11:36 AM
New habits move from the Prefrontal Cortex (conscious effort) to the Basal Ganglia (habit loop). A trick is to automate the first 30 seconds of a habit, minimizing the PFC's required 'energy' for initiation until the Basal Ganglia takes over. #LifeHacks #SelfImprovement #Brain
October 25, 2025 at 3:32 PM
This work unifies planning with sequence working memory.
The same neural architecture used to *remember* a sequence can be used to *infer* a plan (Fig 2B).
October 22, 2025 at 9:16 PM
For new mazes, the RNNs adapted by learning *transitions* instead of fixed locations (Fig 6C).
Sensory input about a wall specifically inhibited the neural representation for that impossible transition (Fig 6H).
October 22, 2025 at 9:16 PM
The trained RNNs explicitly learned the maze rules—the 'world model'—in their synaptic weights (Fig 5B).
The connections between subspaces representing future steps perfectly matched the maze's adjacency matrix (Fig 5E).
October 22, 2025 at 9:16 PM
Recurrent Neural Networks (RNNs) trained to solve these problems learned the STA solution on their own (Fig 4C).
They developed the same "conveyor belt" dynamics, suggesting it's an efficient solution (Fig 4E).
October 22, 2025 at 9:16 PM
Here's how we get there from their results:
The Spacetime Attractor (STA) model successfully solves complex, dynamic tasks where other models fail (Fig 3D-E).
It excels when rewards change *within* a trial, like intercepting a moving target.
October 22, 2025 at 9:16 PM
This completes a beautiful picture of how learning/adaptation could happen in the 🧠:
Striatum: temporal-difference learning. Slow, habits🐢
Hippocampus: successor-representation learning. OK fast🐇, memories.
PFC: space-time attractor. Very fast🚀, reconfigures brain.
October 22, 2025 at 9:16 PM
But how does this adapt to new mazes?
Training nets in challenging environments, they found the nets learn a scaffold of all possible actions instead of fixed locations.
When the net sees a new wall, sensory input just "blocks" (inhibits) that specific action, keeping the plan flexible.
October 22, 2025 at 9:16 PM
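Here's a hedged sketch of that scaffold-plus-inhibition idea on a made-up 2x2 grid maze, with plain breadth-first search standing in for the network's relaxation: the wall input only removes one transition, and the plan reroutes around it.

```python
# Hedged sketch of the scaffold-plus-inhibition idea (my toy, not the paper's code).
import numpy as np

#  0 - 1
#  |   |
#  2 - 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)      # scaffold: every physically possible move

wall = np.zeros_like(A)
wall[1, 3] = wall[3, 1] = 1.0            # sensory input: a new wall between 1 and 3
effective = A * (1 - wall)               # the inhibited transition drops out of planning

def plan(adj, start, goal):
    """Breadth-first search, standing in for the attractor relaxation."""
    frontier, parent = [start], {start: None}
    while frontier:
        node = frontier.pop(0)
        for nxt in map(int, np.flatnonzero(adj[node])):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    if goal not in parent:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

print("no wall  :", plan(A, 0, 3))           # [0, 1, 3]
print("with wall:", plan(effective, 0, 3))   # [0, 2, 3]: same scaffold, one move inhibited
```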
As the agent takes the first action, the entire plan representation shifts forward in time.
What was in the "NEXT" 🟦 subspace moves to the "NOW" 🟩 subspace, readying the next action without re-planning.
Conveyor belt dynamics, they call it.
October 22, 2025 at 9:16 PM
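A minimal sketch of that shift, in my own notation (rows = time-step subspaces, columns = one-hot locations; not the paper's code):

```python
# Minimal sketch of the "conveyor belt" shift (assumed notation, not the paper's code).
import numpy as np

n_locations = 4
plan = np.zeros((3, n_locations))        # subspaces: row 0 = NOW, 1 = NEXT, 2 = LATER
for step, loc in enumerate([0, 1, 3]):   # planned path 0 -> 1 -> 3
    plan[step, loc] = 1.0

def take_step(plan):
    """After acting, everything shifts one subspace toward NOW; the far-future slot empties."""
    shifted = np.roll(plan, -1, axis=0)
    shifted[-1] = 0.0                    # LATER is vacated, to be refilled by ongoing planning
    return shifted

print("before step:\n", plan)
print("after step:\n", take_step(plan))  # what was in NEXT now sits in NOW; no re-planning needed
```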
Planning happens when inputs, like the 'start' and 'goal', are applied to this network.
The circuit activity then rapidly "relaxes" into the most stable pattern, which *is* the optimal path.
In each subspace, only the units representing the locations in the plan are active.
October 22, 2025 at 9:16 PM
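Here's a relaxation toy under my own simplifications (not the paper's equations): clamp the start into NOW and the goal into LATER, couple consecutive subspaces through the maze adjacency, and let a soft winner-take-all settle. Whatever stays active in each subspace reads out as the path.

```python
# Relaxation toy under my own simplifications (not the paper's equations).
import numpy as np

A = np.array([[0, 1, 1, 0],      # 2x2 grid maze: edges 0-1, 0-2, 1-3, 2-3
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)
n_steps, n_loc = 3, 4
start, goal = 0, 3

inputs = np.zeros((n_steps, n_loc))
inputs[0, start] = 5.0           # clamp the start into subspace 0 (NOW)
inputs[-1, goal] = 5.0           # clamp the goal into the last subspace (LATER)

act = np.full((n_steps, n_loc), 1.0 / n_loc)      # begin from uniform activity
for _ in range(20):                               # relax toward a stable pattern
    u = inputs.copy()
    for s in range(n_steps):
        if s > 0:
            u[s] += A @ act[s - 1]                # support from the previous step's subspace
        if s < n_steps - 1:
            u[s] += A @ act[s + 1]                # support from the next step's subspace
    act = np.exp(2.0 * u)
    act /= act.sum(axis=1, keepdims=True)         # soft winner-take-all within each subspace

plan = act.argmax(axis=1)
print("relaxed plan:", plan.tolist())             # [0, 1, 3] here; [0, 2, 3] is the equally good route
```

The softmax per subspace is just a convenient stand-in for the winner-take-all competition the post describes.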
These subspaces are connected based on the environment's rules, like a 'world model' etched into the wiring.
Synapses from "NOW" 🟩 to "NEXT" 🟦 only exist if the agent can *actually* move between those two locations.
Possible move: excitatory
Impossible move: inhibitory
October 22, 2025 at 9:16 PM
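A tiny sketch of that wiring rule, in my notation (the maze and weights are assumptions, not the paper's): possible moves get +1, impossible ones get -1, so the NOW location only excites its legal successors in NEXT.

```python
# "World model etched into the wiring" as a toy weight matrix (my construction).
import numpy as np

A = np.array([[0, 1, 1, 0],      # 2x2 grid maze: edges 0-1, 0-2, 1-3, 2-3
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)

W_now_to_next = np.where(A > 0, 1.0, -1.0)   # excitatory if reachable, inhibitory if not

now = np.zeros(4)
now[0] = 1.0                                 # the NOW subspace currently represents location 0
drive = W_now_to_next.T @ now                # input this sends into the NEXT subspace
print("drive onto NEXT units:", drive)       # [-1.  1.  1. -1.]: only legal successors are excited
```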
Let's unpack this:
The model assumes the PFC is split into different "subspaces," each representing a different point in the future.
One group of neurons maps "NOW (0)," 🟩 another maps "NEXT (1)," 🟦 and another maps "LATER (2),"🟪 all active simultaneously.
October 22, 2025 at 9:16 PM
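My toy encoding of that layout (notation assumed, not taken from the paper): one population vector holds a one-hot location code per subspace, so NOW, NEXT, and LATER are all represented at the same time.

```python
# Toy encoding of the subspace idea: one state vector, one location slot per planning step.
import numpy as np

n_loc = 4
labels = ["NOW(0)", "NEXT(1)", "LATER(2)"]

def encode(plan):
    """Concatenate one one-hot location code per planning step into a single state."""
    state = np.zeros(len(plan) * n_loc)
    for step, loc in enumerate(plan):
        state[step * n_loc + loc] = 1.0
    return state

state = encode([0, 1, 3])                    # plan: at 0 now, 1 next, 3 later
for step, name in enumerate(labels):
    chunk = state[step * n_loc:(step + 1) * n_loc]
    print(name, "subspace:", chunk)          # each subspace carries its own location, simultaneously
```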
To plan the future, the PFC represents a step-by-step map of actions, and at every step the whole plan shifts one slot toward the present, like a conveyor belt.
The paper proposes a simple subspace architecture for planning.
A toy environment can clarify it.
A🧵with my toy model and notes:
#neuroskyence #compneuro #NeuroAI
October 22, 2025 at 9:16 PM
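Since a toy environment helps, here's the one I assume for the sketches in this thread (mine, not the paper's): a 2x2 grid maze with four locations and its adjacency matrix.

```python
# The toy environment used in my sketches: A[i, j] = 1 if the agent can step from i to j.
import numpy as np

#  0 - 1
#  |   |
#  2 - 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)

print("legal moves from 0:", np.flatnonzero(A[0]).tolist())        # [1, 2]
print("moves are reversible:", bool((A == A.T).all()))             # symmetric adjacency
```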
To stop unwanted thoughts, the PFC sends a control signal to the hippocampus (the brain's memory retrieval center). This activates local inhibitory (GABAergic) interneurons, which then suppress retrieval. Difficulties in this pathway may underlie #PTSD, #OCD, anxiety, and #depression.
October 18, 2025 at 9:49 AM