www.tuckute.com
AuriStream shows that causal prediction over short audio chunks (cochlear tokens) is enough to generate meaningful sentence continuations!
1️⃣ WavCoch: a small model that transforms raw audio into a cochlea-like time-frequency representation, from which we extract discrete “cochlear tokens”.
2️⃣ AuriStream: an autoregressive model over the cochlear tokens.
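The two-step pipeline can be sketched roughly as follows. This is a toy illustration with made-up function names and a trivial bigram predictor standing in for the actual WavCoch/AuriStream models; only the overall structure (quantize to discrete tokens, then predict causally) reflects the paper.

```python
import numpy as np

def wavcoch_tokenize(audio: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Step 1 (WavCoch, sketched): frame the signal as a stand-in for the
    learned cochlea-like time-frequency transform, then quantize each frame
    to its nearest codebook entry, yielding discrete 'cochlear tokens'."""
    hop, dim = 160, codebook.shape[1]
    n_frames = len(audio) // hop
    frames = audio[: n_frames * hop].reshape(n_frames, hop)[:, :dim]
    # Nearest-neighbor quantization against the codebook.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

def autoregressive_next_token(tokens: np.ndarray, n_codes: int) -> int:
    """Step 2 (AuriStream, sketched): predict the next token from the causal
    prefix. A smoothed bigram count stands in for the Transformer."""
    counts = np.ones(n_codes)  # add-one smoothing
    prev = tokens[-1]
    for a, b in zip(tokens[:-1], tokens[1:]):
        if a == prev:
            counts[b] += 1
    return int(counts.argmax())
```

Generating a continuation then amounts to repeatedly appending the predicted token and re-running the predictor on the growing prefix.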
In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech.
Poster P6, tomorrow (Aug 19) at 1:30 pm, Foyer 2.2!
...however, finer-grained substructure exists within and across areas: for instance, temporal areas are more tuned to abstract meanings than frontal ones.
We analyzed the voxel-wise Sentence PC weights and found that the Sentence PCs are systematically distributed across the lateral and ventral surface, forming a large-scale topography.
We collected a set of linguistic/semantic features and ran targeted experiments to characterize the Sentence PCs. Based on these analyses, our candidate explanations are that PC 1 corresponds to processing difficulty and PC 2 to meaning abstractness.
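The feature analysis amounts to correlating each sentence's PC score with per-sentence feature values. A toy sketch (the feature values here are synthetic placeholders, not the paper's actual annotations):

```python
import numpy as np

# Sketch: characterize a Sentence PC by correlating per-sentence PC scores
# with candidate linguistic/semantic features (synthetic data).
rng = np.random.default_rng(2)
n_sentences = 200
pc1_scores = rng.standard_normal(n_sentences)

features = {
    # A feature that tracks the PC (plus noise) vs. one unrelated to it.
    "surprisal": pc1_scores + 0.8 * rng.standard_normal(n_sentences),
    "concreteness": rng.standard_normal(n_sentences),
}

for name, values in features.items():
    r = np.corrcoef(pc1_scores, values)[0, 1]
    print(f"{name}: r = {r:.2f}")
```

A feature that correlates strongly with the PC scores (here, the synthetic "surprisal") is a candidate explanation for what that PC encodes.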
We applied decomposition methods to 7T fMRI data from 8 participants listening to 200 diverse sentences. The resulting components—"Sentence PCs"—indicate how much each sentence drives variance along a given direction in voxel space.
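The decomposition step can be sketched as PCA (via SVD) on a sentences × voxels response matrix. This is a minimal sketch with synthetic data; the paper's exact preprocessing and decomposition choices differ.

```python
import numpy as np

# Sketch: PCA on a (sentences x voxels) fMRI response matrix.
# Synthetic data; the study used 7T responses to 200 sentences.
rng = np.random.default_rng(0)
n_sentences, n_voxels = 200, 5000
responses = rng.standard_normal((n_sentences, n_voxels))

# Center each voxel's responses across sentences.
centered = responses - responses.mean(axis=0)

# SVD: (U * S)[:, k] gives each sentence's score on "Sentence PC" k,
# and Vt[k] gives the voxel-wise weights that form the topography.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
sentence_pcs = U * S   # (sentences x components) scores
voxel_weights = Vt     # (components x voxels) weight maps

# Fraction of response variance along the first two components.
var_explained = (S[:2] ** 2).sum() / (S ** 2).sum()
```

Each Sentence PC thus pairs a per-sentence score (how strongly a sentence drives that direction) with a per-voxel weight map (where that direction lives on the cortical surface).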
We show that voxel responses during comprehension are organized along 2 main axes: processing difficulty & meaning abstractness—revealing an interpretable, topographic representational basis for language processing shared across individuals.
Moreover, this table shows the effect of ablations on next-word prediction for a few sample models:
Highlighting one key finding:
Surprising sentences with unusual grammar and/or meaning elicit the highest activity in the language network.
In other words, the language network responds strongly to sentences that are “normal” enough to engage it, but unusual enough to tax it.
How accurate were the predictions at the single-sentence level? r=0.43 (predictive performance for new stimuli AND participants).
The model predictions captured most (69%) of the explainable variance (variance not attributed to inter-participant variability / measurement noise).
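The 69% figure is a fraction of *explainable* variance, i.e., model fit normalized by a noise ceiling. One common recipe (a generic sketch with synthetic data, not necessarily the paper's exact estimator) is:

```python
import numpy as np

# Sketch: fraction of explainable variance captured by model predictions.
# The ceiling is estimated from split-half reliability across participant
# groups; all data here are synthetic.
rng = np.random.default_rng(1)
n = 200  # sentences

signal = rng.standard_normal(n)
half1 = signal + 0.5 * rng.standard_normal(n)  # participant group 1
half2 = signal + 0.5 * rng.standard_normal(n)  # participant group 2
pred = signal + 0.7 * rng.standard_normal(n)   # model predictions

data = 0.5 * (half1 + half2)  # group-averaged responses

# Reliability of the averaged data (Spearman-Brown corrected split-half r):
r_half = np.corrcoef(half1, half2)[0, 1]
ceiling = 2 * r_half / (1 + r_half)

# Model R^2 as a fraction of the reliable (explainable) variance.
r2 = np.corrcoef(pred, data)[0, 1] ** 2
explained_fraction = r2 / ceiling
```

The intuition: variance that does not replicate across participant groups cannot be predicted by any model, so the raw R² is divided by the reliability ceiling before reporting the fraction captured.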