Christoph Minixhofer
cdminix.bsky.social
Post-doc @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment.
🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz
I don't download new HF models often, but when I do, it's during the 0.008% of downtime :(
October 20, 2025 at 9:04 AM
It's been a great #interspeech2025!
I presented a TTS-for-ASR paper:
www.isca-archive.org/interspeech_...
And one on prosody reps: www.isca-archive.org/interspeech_...
There were many interesting questions & comments - if you have more and didn't get the chance, feel free to send me a message.
August 21, 2025 at 4:47 PM
Thank you to everyone who stopped by, I’m grateful for all the feedback and interesting questions #interspeech2025
August 20, 2025 at 12:42 PM
One day until the Q2 ttsdsbenchmark.com update. We'll see which TTS system tops the leaderboard this time - some new ones have been added that could shake things up.
July 4, 2025 at 6:29 AM
This figure motivated a lot of my PhD (or at least nudged me into a direction) -- check out arxiv.org/abs/2110.11479 (Hu et al.) if you haven't come across it before, it really frames the problem of synthetic/real speech distributions well.
June 30, 2025 at 6:40 PM
Spotted a Norwegian flag across the Firth of Forth, didn’t know Norwegians had hytte on this side of the North Sea as well!
June 29, 2025 at 12:42 PM
When future archeologists dig up the remains of my thesis in 3,000 years.
May 13, 2025 at 4:56 PM
I’m told it is mandatory in Norway to leave the city and go to a hytte in the woods on the weekend, so doing my best.
December 14, 2024 at 1:55 PM
Nice, good to know. Do you mean what happens to the reprs after fine-tuning? I'd guess the more different the downstream task the bigger a jump you'd see in the last layer(s). It's already visible in the paper I linked (phone identity and word identity) - although idk why word meaning improved!
December 12, 2024 at 7:15 PM
Yes, the energy requirements (especially for training) are not transparent enough and a lot of AI use is frivolous. At the moment a ChatGPT query takes about 15x the energy of a g. search. Yet no one is telling me to go to the library and read through conference proceedings to avoid 15+ g. searches.
December 11, 2024 at 10:31 AM
So if we look at Google Scholar results for both, it looks like SMOS is on the rise, but it has actually been used at least as long as CMOS for speech synthesis evaluation.
CMOS has a history in evaluation standards, just like MOS. But recently it's all about speech synth.

(7/9)
December 10, 2024 at 9:37 AM
Presented my poster on TTSDS, a benchmark for Text-to-Speech at #slt2024 yesterday.

We found that our zero-shot distribution distance (similar to FID across several factors like prosody, speaker, etc.) correlated well with subjective evaluation for TTS systems from 2008 to 2024.

ttsdsbenchmark.com
December 4, 2024 at 6:48 AM
If people had cheered for Elon at that Dave Chappelle gig ages ago, could we have avoided this entire timeline?
November 21, 2024 at 6:15 AM
As part of some ongoing work, I'm releasing what is currently the biggest collection of Docker containers for state-of-the-art #voicecloning #tts systems. github.com/ttsds/datasets
Alongside it, there is also a nice overview of all systems (see below)
November 19, 2024 at 11:19 AM