#diarization
I may also have made backup copies on cassette of the audio streams from the DASH sources of the show in question, purely for private enjoyment.

With diarization, one could build a searchable archive of all interviews with politicians...
December 17, 2024 at 5:34 PM
In case you were wondering, even some of the far better (at least among those I can afford) speech-to-text and post-transcript analysis tools (entities, topics, sentiment, etc.) cannot handle an audio file of a four-person panel live-watching a debate. Diarization fails almost immediately (and rly gets […]
Original post on mastodon.social
September 11, 2024 at 11:21 AM
Amazon Bedrock Data Automation now provides support for enhancing transcriptions

Amazon Bedrock Data Automation now supports enhanced audio transcription with speaker diarization and channel identification, enabling separate processing of multi-party conversations. Available in 7 AWS regions.
October 1, 2025 at 5:09 PM
Lian Remme, Kevin Tang
Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
https://arxiv.org/abs/2502.12714
February 19, 2025 at 9:16 AM
A spatio-spectral diarization pipeline combines TDOA-based segmentation and embedding-based clustering, outperforming single-channel methods and tracking speakers when they move.
Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering
Tobias Cord-Landwehr, Tobias Gburrek, Marc Deegen, Reinhold Haeb-Umbach
arxiv.org
June 23, 2025 at 8:46 AM
Diarization is hard.
February 17, 2025 at 8:05 AM
Gao, Wu, Chen, Du, Lee, Watanabe, Chen, Marco, Scharenborg: The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition https://arxiv.org/abs/2505.13971 https://arxiv.org/pdf/2505.13971 https://arxiv.org/html/2505.13971
May 21, 2025 at 6:00 AM
results show that denoising significantly improves the Diarization Error Rate (DER) by reducing the rate of missed speech. Additionally, training on both denoised and noisy datasets leads to substantial performance gains in noisy conditions. The [5/7 of https://arxiv.org/abs/2505.10879v1]
May 19, 2025 at 5:59 AM
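For readers unfamiliar with the metric, here is a minimal sketch of how DER is commonly computed: missed speech, false alarms, and speaker confusion, divided by the total reference speech. It assumes fixed-size frames, at most one speaker per frame (no overlap), and hypothesis labels that have already been mapped to the reference speaker names; real scoring tools handle that mapping, overlap, and collars themselves. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def diarization_error_rate(
    reference: Sequence[Optional[str]],
    hypothesis: Sequence[Optional[str]],
) -> float:
    """Frame-level DER; None marks a non-speech frame (simplifying assumption)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = total_speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            total_speech += 1
            if hyp is None:
                missed += 1        # speech in reference, silence in hypothesis
            elif hyp != ref:
                confusion += 1     # speech in both, but wrong speaker
        elif hyp is not None:
            false_alarm += 1       # silence in reference, speech in hypothesis

    if total_speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / total_speech


# Toy example: 6 reference speech frames, 1 miss, 1 false alarm, 1 confusion.
ref = ["A", "A", "A", "B", "B", None, None, "B"]
hyp = ["A", "A", None, "B", "A", None, "B", "B"]
print(diarization_error_rate(ref, hyp))  # 0.5
```

In this framing, the post's point is that denoising mainly shrinks the `missed` term.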
Were you able to get the diarization in R? I know it’s straightforward in Python, but I’ve never even done so much as a “Hello, World” in Python, so I’d like to stick to R if I can. I can’t find a way to do it though.
April 17, 2024 at 10:38 PM
The model, 'pyannote/speaker-diarization', runs on the CPU by default, so I just spent a morning figuring out how to move it to the GPU. I was expecting an improvement, but not this!
November 7, 2024 at 12:47 PM
Playing with speaker diarization/segmentation. Any suggestions for good reading and libraries?
November 14, 2024 at 10:17 AM
Playing with PyCasp (https://github.com/egonina/pycasp/wiki) for speaker diarization. Any tips out there?
November 14, 2024 at 9:05 AM
arXiv:2505.16387v1 Announce Type: new
Abstract: This paper describes the speaker diarization system developed for the Multimodal Information-Based Speech Processing (MISP) 2025 Challenge. First, we utilize the Sequence-to-Sequence Neural Diarization [1/3 of https://arxiv.org/abs/2505.16387v1]
May 23, 2025 at 6:01 AM
I was blown away by the diarization too 🤩

Here it is: dev.to/zenika/rendr...
Making your podcast accessible with AI in the service of transcription
🇬🇧 An English version is now available:...
dev.to
August 28, 2025 at 12:12 PM
GitHub - QuentinFuxa/WhisperLiveKit: Python package for Real-time, Local Speech-to-Text and Speaker Diarization. FastAPI Server & Web Interface
https://github.com/QuentinFuxa/WhisperLiveKit
github.com
August 28, 2025 at 1:29 AM
EEND-TA achieved a DER of 14.49% on DIHARD III, with fast non‑autoregressive inference that processes recordings in parallel; the model was presented at Interspeech 2025. https://getnews.me/new-state-of-the-art-results-for-end-to-end-speaker-diarization/ #speakerdiarisation #eendta
September 19, 2025 at 10:21 PM
Backbone: Omnilingual w2v 2.0 (7B).
A multilingual speech representation you can fine-tune for ASR or repurpose for tasks like diarization, keyword spotting, or alignment.
November 11, 2025 at 2:35 PM
Analysis of End-to-End Neural Diarization reveals that a fine-tuned WavLM-based encoder achieves the best performance, the LSTM decoder is outclassed, and a multiclass loss is generally superior; newer architectures handle longer chunks.
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, Shoko Araki, Hervé Bredin
arxiv.org
June 16, 2025 at 10:43 AM
arXiv:2505.24111v1 Announce Type: new
Abstract: Self-supervised learning (SSL) models like WavLM can be effectively utilized when building speaker diarization systems but are often large and slow, limiting their use in resource constrained scenarios. [1/4 of https://arxiv.org/abs/2505.24111v1]
June 2, 2025 at 6:01 AM
📰 Show HN: Python Audio Transcription: Convert Speech to Text Locally

💬 Hacker News community praises audio-to-text projects & shares tools—support for diarization debated. 📈

https://news.ycombinator.com/item?id=45337400
September 22, 2025 at 7:30 PM
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network. (arXiv:2309.08489v1 [eess.AS])
http://arxiv.org/abs/2309.08489
September 19, 2023 at 12:00 AM
If you need real-time voice transcription and can live without speaker diarization and with English-only support, you will be hard-pressed to beat AssemblyAI's new "Universal Streaming" API.

Incredible latency & accuracy *and* only $0.15 per hour. Insane.

www.assemblyai.com/blog/introdu...
Speech-to-Text for voice agents - Universal-Streaming
Universal-Streaming delivers the streaming speech-to-text voice agents have been missing: fast immutable transcripts, higher accuracy, built-in endpointing, and pricing that scales with you.
www.assemblyai.com
July 25, 2025 at 11:17 AM
Xinlu He, Yiwen Guan, Badrivishal Paurana, Zilin Dai, Jacob Whitehill: Interactive Real-Time Speaker Diarization Correction with Human Feedback https://arxiv.org/abs/2509.18377 https://arxiv.org/pdf/2509.18377 https://arxiv.org/html/2509.18377
September 24, 2025 at 6:30 AM
OWhisper, leveraging Ollama, is a new tool for real-time speech-to-text. The Hacker News discussion explores its features, potential uses, and future, with focus on streaming, diarization, and API integration. #OWhisper 1/6
August 16, 2025 at 1:00 AM