Christoph Minixhofer
cdminix.bsky.social
Post-doc @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment.
🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz
Pinned
🧪 My paper on Text-to-Speech evaluation using distributional measures has been accepted to ICLR 2026! 🎉
openreview.net/forum?id=uGa...
In my opinion, we should focus much more on the distributions of synthetically generated speech - we showed that such distributional measures correlate highly with human ratings.
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text...
Evaluation of Text to Speech (TTS) systems is challenging and resource-intensive. Subjective metrics such as Mean Opinion Score (MOS) are not easily comparable between works. Objective metrics are...
openreview.net
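As a toy illustration of the kind of correlation claimed above (the numbers here are entirely made up, not from the paper): rank correlation between a metric's scores and human MOS ratings can be checked with `scipy.stats.spearmanr`.

```python
# Hypothetical illustration (fabricated scores, not the paper's data):
# how well does a distributional metric track human MOS ratings, in
# terms of Spearman rank correlation?
import numpy as np
from scipy.stats import spearmanr

# Made-up scores for six imaginary TTS systems.
metric_score = np.array([0.91, 0.85, 0.78, 0.70, 0.62, 0.50])
human_mos = np.array([4.5, 4.3, 4.0, 3.6, 3.1, 2.8])

rho, p = spearmanr(metric_score, human_mos)
print(rho)  # 1.0 here, since the made-up rankings agree perfectly
```

With real systems the rankings never agree perfectly, so rho lands somewhere below 1; the point is only that rank correlation is the standard way such agreement is quantified.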
Currently working on three different papers using BWS (best-worst scaling), with three different methods for running the listening tests due to different universities/first authors - it’s time we had an open-source framework for listening tests that is well maintained and easy to use. If you know of any, let me know!
February 10, 2026 at 8:21 PM
A pre-release of *ttsdb*, my collection of SOTA TTS models, is out now - github.com/ttsds/ttsdb

The aim is to provide a simple CLI and a collection of Python packages that make it easy to synthesise speech across a variety of models. Docs and website coming soon!
GitHub - ttsds/ttsdb: A database for modern, open-source TTS systems.
A database for modern, open-source TTS systems. Contribute to ttsds/ttsdb development by creating an account on GitHub.
github.com
February 4, 2026 at 6:25 AM
Just came across this wonderful blogpost on pangrams in many languages. If only there were a similar collection full of phonetic pangrams!
clagnut.com/blog/2380/#P...
List of pangrams
There used to be a page on Wikipedia listing pangrams in various languages. This was deleted yesterday. Pangrams can be occasionally useful for designers, so I’ve resurrected the page here, pretty ...
clagnut.com
January 25, 2026 at 1:10 PM
Reposted by Christoph Minixhofer
www.technomoralfutures.uk/news-databas...
Happy Monday! Here's me thinking about speech tech, voices, and death thanks to the lovely @technomoralfutures.bsky.social

content notes: discussion of death, grief, online abuse
Text-to-speech voices as human remains — Centre for Technomoral Futures
Speech technology researchers worldwide are working on improving the smoothness and fidelity of text-to-speech towards the goal of accessible communication for all. However, TTS models are also being ...
www.technomoralfutures.uk
November 24, 2025 at 12:24 PM
Passed my viva yesterday 🥳
Here's the pre-viva talk if anyone's interested; my work was/is about quantifying the distributional distance between real and synthetic speech.
youtu.be/Ii-6buwAoCg
Quantifying the Distributional Distance between Synthetic and Real Speech (Pre-Viva Talk)
YouTube video by Christoph Minixhofer
youtu.be
November 19, 2025 at 11:18 AM
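For anyone unfamiliar with the idea, here is a minimal sketch of one such distributional distance - the 1-D Wasserstein (earth mover's) distance via `scipy` on synthetic toy data. This is just an illustration of the concept, not the actual metric or features used in the thesis.

```python
# Illustrative sketch: comparing two 1-D feature distributions, e.g.
# pitch values extracted from real vs. synthetic speech, using the
# 1-D Wasserstein distance. The data here is randomly generated.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
real = rng.normal(loc=120.0, scale=20.0, size=5000)       # "real" pitch values (Hz)
synthetic = rng.normal(loc=120.0, scale=10.0, size=5000)  # narrower "synthetic" distribution

# Identical means, but the synthetic distribution is less varied --
# a distributional distance picks this up where a mean comparison would not.
print(abs(real.mean() - synthetic.mean()))  # near zero
print(wasserstein_distance(real, synthetic))  # clearly nonzero
```

The design point: comparing only summary statistics (means of pitch, energy, etc.) can declare two systems equivalent even when one produces much less varied speech; comparing full distributions does not.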
First time going to a big gym in the UK, and somehow the practice of saying a little “sorry” as you go past someone cracks me up in that setting.
November 11, 2025 at 10:23 AM
Reposted by Christoph Minixhofer
Fill in the blank:

"My p-value is smaller than 0.05, so..."

Wrong answers only.
November 4, 2025 at 12:24 PM
I don't download new HF models often, but when I do, it's during the 0.008% of downtime :(
October 20, 2025 at 9:04 AM
TTSDS2 is one of the papers accepted by the @neuripsconf.bsky.social area chairs but rejected by the senior area chairs with no explanation as to why. A bit frustrating after the long review process.
September 20, 2025 at 8:28 AM
Accents are also best seen as a distribution, not a group of labels imo. We tried to incorporate some proxy of accent in TTSDS2, but a simple phone distribution did not work all that well, probably because it’s hard to disentangle from lexical content…
in honour of interspeech this week, i’d like to issue a reminder that everyone has an accent, and that’s beautiful, actually: www.isca-archive.org/interspeech_...

August 24, 2025 at 4:41 PM
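To make the phone-distribution idea concrete, here is a hedged sketch comparing two phone frequency distributions with Jensen-Shannon divergence. The phone inventory and counts are made up for illustration; TTSDS2's actual features are more involved, and (as noted above) such counts remain entangled with lexical content.

```python
# Hedged sketch: comparing phone frequency distributions between two
# systems with the Jensen-Shannon distance. Counts are hypothetical,
# e.g. as they might come out of a forced aligner.
import numpy as np
from scipy.spatial.distance import jensenshannon

phones = ["a", "e", "i", "t", "s", "r"]
counts_a = np.array([40, 30, 10, 35, 20, 15], dtype=float)
counts_b = np.array([38, 28, 12, 34, 22, 16], dtype=float)

# Normalise counts into probability distributions.
p = counts_a / counts_a.sum()
q = counts_b / counts_b.sum()

# jensenshannon returns the square root of the JS divergence,
# a symmetric, bounded distance between the two distributions.
print(jensenshannon(p, q))
```

A small distance here only says the two systems use phones at similar rates - it cannot tell whether that is an accent effect or just similar text, which is exactly the disentanglement problem mentioned above.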
It's been a great #interspeech2025!
I presented a TTS-for-ASR paper:
www.isca-archive.org/interspeech_...
And one on prosody reps: www.isca-archive.org/interspeech_...
There were many interesting questions & comments - if you have more and didn't get the chance feel free to send me a message.
August 21, 2025 at 4:47 PM
I’ll be presenting this tomorrow at 8.50 at #interspeech2025, come by if you’re interested in prosodic representations!
🧪 SSL (self-supervised learning) models can produce very useful speech representations, but what if we limit their input to prosodic correlates (Pitch, Energy, Voice Activity)? Sarenne Wallbridge and I explored what these representations do (and don’t) encode: arxiv.org/abs/2506.02584 1/2
Prosodic Structure Beyond Lexical Content: A Study of Self-Supervised Learning
People exploit the predictability of lexical structures during text comprehension. Though predictable structure is also present in speech, the degree to which prosody, e.g. intonation, tempo, and loud...
arxiv.org
August 20, 2025 at 8:48 PM
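A numpy-only sketch of the three prosodic correlates named above (pitch, energy, voice activity), extracted from a synthetic test tone. The paper's actual extraction pipeline is not shown here and likely differs; this is only meant to show what those three signals look like computationally.

```python
# Hedged sketch: frame-level pitch, energy, and a crude voice-activity
# proxy, computed with numpy only, on a 1-second 220 Hz test tone.
import numpy as np

sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # 1 s, 220 Hz test tone

frame, hop = 1024, 256
frames = np.lib.stride_tricks.sliding_window_view(y, frame)[::hop]

# Energy: per-frame RMS.
energy = np.sqrt((frames ** 2).mean(axis=1))

# Voice activity: a crude energy threshold (real VAD is more robust).
vad = energy > 0.1 * energy.max()

# Pitch: autocorrelation peak within a plausible f0 search range.
def f0_autocorr(x, sr, fmin=80.0, fmax=400.0):
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

f0 = np.array([f0_autocorr(fr, sr) for fr in frames[vad]])
print(f0.mean(), energy.mean(), vad.mean())
```

Feeding representations like these (instead of raw waveforms or spectrograms) to an SSL model is the spirit of the setup described above: the model only ever sees prosodic correlates, never the lexical content.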
In other news — if you’re an early bird and at #interspeech, feel free to drop by my poster presentation on scaling synthetic data tomorrow - who doesn’t want to chat about neural scaling laws early in the morning!
App: interspeech.app.link?event=687602...
Paper: www.isca-archive.org/interspeech_...
August 19, 2025 at 9:24 PM
A highlight at #interspeech so far: the “hear me out” show & tell, in which you can check how the spoken language model Moshi responds depending on whether it hears your voice or a version of it voice-converted to the opposite gender.
Check it out here shreeharsha-bs.github.io/Hear-Me-Out/
1/2
Hear Me Out
Interactive evaluation and bias discovery platform for speech-to-speech conversational AI
shreeharsha-bs.github.io
August 19, 2025 at 9:14 PM
If you’re interested in ASR for low resource languages, come by at 14.30 in Poster Area 09 at #interspeech today! I’ll be presenting this paper by Ondrej Klejch et al. arxiv.org/abs/2506.04915
A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic
An effective approach to the development of ASR systems for low-resource languages is to fine-tune an existing multilingual end-to-end model. When the original model has been trained on large quantiti...
arxiv.org
August 18, 2025 at 9:59 AM
Looking forward to presenting a bunch of things at #INTERSPEECH and #SSW - I’ll put the details here once my thesis final draft is done, which will probably be on the plane to Rotterdam.
August 11, 2025 at 9:34 PM
One day until the Q2 ttsdsbenchmark.com update. We’ll see which TTS system tops the leaderboard this time - some new ones have been added that could shake things up.
July 4, 2025 at 6:29 AM
Followed your advice and can confirm “Ughaaaghaghaa” was my reaction as well.
You ever watch a film and just know it’s a seminal, medium-defining work of peak interdisciplinary storytelling, and all you can say in the moment is “Ughaaaghaghaa” and then cry?

So yeah everyone watch K-Pop Demon Hunters.
July 2, 2025 at 11:29 AM
This figure motivated a lot of my PhD (or at least nudged me into a direction) -- check out arxiv.org/abs/2110.11479 (Hu et al.) if you haven't come across it before, it really frames the problem of synthetic/real speech distributions well.
June 30, 2025 at 6:40 PM
Spotted a Norwegian flag across the Firth of Forth, didn’t know Norwegians had hytte on this side of the North Sea as well!
June 29, 2025 at 12:42 PM
More details on this soon! Also this weekend is the last chance to submit your TTS system for the next round of evaluation (Q2 2025) by either messaging me at christoph.minixhofer@ed.ac.uk or requesting a model here: huggingface.co/spaces/ttsds...
Christoph Minixhofer, Ondrej Klejch, Peter Bell: TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems https://arxiv.org/abs/2506.19441 https://arxiv.org/pdf/2506.19441 https://arxiv.org/html/2506.19441
June 27, 2025 at 8:09 AM
It’s amazing how a day’s work can sometimes stretch out over a fortnight, and a week of work can be compressed into 24 hours…
June 27, 2025 at 2:53 AM
I wonder if there are naturally left-curling and right-curling cats, or if all cats curl both ways.
Not Japan-related, but since we all need a distraction from The Horrors, Takaya Suzuki points out a study that examined 408 sleeping cats and found the majority (65%) curl leftwards.

I'm not sure how useful this information is, but...it's yours now.
June 25, 2025 at 11:57 PM
I’ve only really encountered people trying to avoid sounding like AI… but it makes sense that it would alter how people speak if they interact with it a lot. Makes me sad though since it pushes people towards the mean, which is always the most boring.
June 23, 2025 at 9:22 PM