www.tuckute.com
AuriStream shows that causal prediction over short audio chunks (cochlear tokens) is enough to generate meaningful sentence continuations!
1️⃣ WavCoch: a small model that transforms raw audio into a cochlea-like time-frequency representation, from which we extract discrete “cochlear tokens”.
2️⃣ AuriStream: an autoregressive model over the cochlear tokens.
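The two-step pipeline can be sketched roughly as follows. This is a toy illustration with made-up function names and a trivial bigram predictor standing in for the actual WavCoch/AuriStream models; only the overall structure (quantize to discrete tokens, then predict causally) reflects the paper.

```python
import numpy as np

def wavcoch_tokenize(audio: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Step 1 (WavCoch, sketched): frame the signal as a stand-in for the
    learned cochlea-like time-frequency transform, then quantize each frame
    to its nearest codebook entry, yielding discrete 'cochlear tokens'."""
    hop, dim = 160, codebook.shape[1]
    n_frames = len(audio) // hop
    frames = audio[: n_frames * hop].reshape(n_frames, hop)[:, :dim]
    # Nearest-neighbor quantization against the codebook.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

def autoregressive_next_token(tokens: np.ndarray, n_codes: int) -> int:
    """Step 2 (AuriStream, sketched): predict the next token from the causal
    prefix. A smoothed bigram count stands in for the Transformer."""
    counts = np.ones(n_codes)  # add-one smoothing
    prev = tokens[-1]
    for a, b in zip(tokens[:-1], tokens[1:]):
        if a == prev:
            counts[b] += 1
    return int(counts.argmax())
```

Generating a continuation then amounts to repeatedly appending the predicted token and re-running the predictor on the growing prefix.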
In our #Interspeech2025 paper, we introduce AuriStream: a simple, causal model that learns phoneme, word & semantic information from speech.
Poster P6, tomorrow (Aug 19) at 1:30 pm, Foyer 2.2!
...however, finer-grained substructure exists within and across areas: for instance, temporal areas are more tuned to abstract meanings than frontal ones.
We analyzed the voxel-wise Sentence PC weights and found that the Sentence PCs are systematically distributed across the lateral and ventral surface, forming a large-scale topography.
We collected a set of linguistic/semantic features and ran targeted experiments to characterize the Sentence PCs. Based on these analyses, our candidate explanations are that PC 1 corresponds to processing difficulty and PC 2 to meaning abstractness.
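The feature analysis amounts to correlating each sentence's PC score with per-sentence feature values. A toy sketch (the feature values here are synthetic placeholders, not the paper's actual annotations):

```python
import numpy as np

# Sketch: characterize a Sentence PC by correlating per-sentence PC scores
# with candidate linguistic/semantic features (synthetic data).
rng = np.random.default_rng(2)
n_sentences = 200
pc1_scores = rng.standard_normal(n_sentences)

features = {
    # A feature that tracks the PC (plus noise) vs. one unrelated to it.
    "surprisal": pc1_scores + 0.8 * rng.standard_normal(n_sentences),
    "concreteness": rng.standard_normal(n_sentences),
}

for name, values in features.items():
    r = np.corrcoef(pc1_scores, values)[0, 1]
    print(f"{name}: r = {r:.2f}")
```

A feature that correlates strongly with the PC scores (here, the synthetic "surprisal") is a candidate explanation for what that PC encodes.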
We applied decomposition methods to 7T fMRI data from 8 participants listening to 200 diverse sentences. The resulting components—"Sentence PCs"—indicate how much each sentence drives variance along a given direction in voxel space.
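The decomposition step can be sketched as PCA (via SVD) on a sentences × voxels response matrix. This is a minimal sketch with synthetic data; the paper's exact preprocessing and decomposition choices differ.

```python
import numpy as np

# Sketch: PCA on a (sentences x voxels) fMRI response matrix.
# Synthetic data; the study used 7T responses to 200 sentences.
rng = np.random.default_rng(0)
n_sentences, n_voxels = 200, 5000
responses = rng.standard_normal((n_sentences, n_voxels))

# Center each voxel's responses across sentences.
centered = responses - responses.mean(axis=0)

# SVD: (U * S)[:, k] gives each sentence's score on "Sentence PC" k,
# and Vt[k] gives the voxel-wise weights that form the topography.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
sentence_pcs = U * S   # (sentences x components) scores
voxel_weights = Vt     # (components x voxels) weight maps

# Fraction of response variance along the first two components.
var_explained = (S[:2] ** 2).sum() / (S ** 2).sum()
```

Each Sentence PC thus pairs a per-sentence score (how strongly a sentence drives that direction) with a per-voxel weight map (where that direction lives on the cortical surface).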
We show that voxel responses during comprehension are organized along 2 main axes: processing difficulty & meaning abstractness—revealing an interpretable, topographic representational basis for language processing shared across individuals.
Moreover, this table shows the effect of ablations on next-word prediction for a few sample models:
Highlighting one key finding:
Surprising sentences with unusual grammar and/or meaning elicit the highest activity in the language network.
In other words, the language network responds strongly to sentences that are “normal” enough to engage it, but unusual enough to tax it.
How accurate were the predictions at the single-sentence level? r=0.43 (predictive performance for new stimuli AND participants).
The model predictions captured most (69%) of the explainable variance (variance not attributed to inter-participant variability / measurement noise).
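The 69% figure is a fraction of *explainable* variance, i.e., model fit normalized by a noise ceiling. One common recipe (a generic sketch with synthetic data, not necessarily the paper's exact estimator) is:

```python
import numpy as np

# Sketch: fraction of explainable variance captured by model predictions.
# The ceiling is estimated from split-half reliability across participant
# groups; all data here are synthetic.
rng = np.random.default_rng(1)
n = 200  # sentences

signal = rng.standard_normal(n)
half1 = signal + 0.5 * rng.standard_normal(n)  # participant group 1
half2 = signal + 0.5 * rng.standard_normal(n)  # participant group 2
pred = signal + 0.7 * rng.standard_normal(n)   # model predictions

data = 0.5 * (half1 + half2)  # group-averaged responses

# Reliability of the averaged data (Spearman-Brown corrected split-half r):
r_half = np.corrcoef(half1, half2)[0, 1]
ceiling = 2 * r_half / (1 + r_half)

# Model R^2 as a fraction of the reliable (explainable) variance.
r2 = np.corrcoef(pred, data)[0, 1] ** 2
explained_fraction = r2 / ceiling
```

The intuition: variance that does not replicate across participant groups cannot be predicted by any model, so the raw R² is divided by the reliability ceiling before reporting the fraction captured.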