Lightnews — Scholar-powered news

Mario Zechner

@mariozechner.at

I may also have made backup copies of the audio streams from the DASH sources of each broadcast onto cassettes, purely for private enjoyment.

With diarization, you could build a searchable archive of all the interviews with politicians...

December 17, 2024 at 5:34 PM

hrbrmstr 🇺🇦 🇬🇱 🇨🇦 🏳️‍🌈

@hrbrmstr.mastodon.social.ap.brid.gy

In case you were wondering, even some of the way better (at least that I can afford) speech-to-text + post-transcript analysis tools (entities/topics/sentiment/etc) cannot handle an audio-file of a four-person panel live-watching a debate. Diarization fails almost immediately (and rly gets […]

Original post on mastodon.social

mastodon.social

September 11, 2024 at 11:21 AM

🆃op🅽ews EN 🍋‍🟩 Verified. Not Amplified.

@toppnews.bsky.social

🌐 Smart AI Transcriptions

#transcription #gdpr #speakerrecognition #fast&fair #diarization #ncaa #multilingual

💬 Perfect for interviews, meetings & podcasts!

explicare.de

March 26, 2025 at 8:44 PM

AWS What's New Skeetbot

@aws-skeetbot.lastweekinaws.com

Amazon Bedrock Data Automation now provides support for enhancing transcriptions

Amazon Bedrock Data Automation now supports enhanced audio transcription with speaker diarization and channel identification, enabling separate processing of multi-party conversations. Available in 7 AWS regions.

October 1, 2025 at 5:09 PM

arxiv cs.CL

@arxiv-cs-cl.bsky.social

Lian Remme, Kevin Tang
Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
https://arxiv.org/abs/2502.12714

February 19, 2025 at 9:16 AM

arXiv Sound

@arxiv-sound.bsky.social

A spatio-spectral diarization pipeline combines TDOA-based segmentation and embedding-based clustering, outperforming single-channel methods and tracking speakers when they move.

Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering

Tobias Cord-Landwehr, Tobias Gburrek, Marc Deegen, Reinhold Haeb-Umbach

arxiv.org

June 23, 2025 at 8:46 AM
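
As a rough illustration of the TDOA (time difference of arrival) idea named in the post above: the delay between two microphone channels can be estimated as the lag that maximizes their cross-correlation. This is a plain cross-correlation sketch, not the paper's pipeline; the function name, the toy signals, and the 16 kHz sample rate are assumptions made for the example.

```python
def estimate_tdoa(sig_a, sig_b, sample_rate, max_lag):
    """Estimate how many seconds sig_b lags behind sig_a.

    Tries every integer lag in [-max_lag, max_lag] (in samples) and keeps
    the one with the largest cross-correlation score. A positive result
    means the sound reached microphone B later than microphone A.
    """
    n = min(len(sig_a), len(sig_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Toy signals (hypothetical): one click, arriving 3 samples later on channel B.
channel_a = [0.0] * 32
channel_b = [0.0] * 32
channel_a[10] = 1.0
channel_b[13] = 1.0

delay_seconds = estimate_tdoa(channel_a, channel_b, sample_rate=16000, max_lag=5)
```

A speaker who moves changes this delay over time, which is why a TDOA track can help segment a multi-channel recording; real systems typically use more robust estimators than this brute-force search.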

Erik Rasmussen

@erikras.com

Diarization is hard.

February 17, 2025 at 8:05 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Gao, Wu, Chen, Du, Lee, Watanabe, Chen, Marco, Scharenborg: The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition https://arxiv.org/abs/2505.13971 https://arxiv.org/pdf/2505.13971 https://arxiv.org/html/2505.13971

May 21, 2025 at 6:00 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

results show that denoising significantly improves the Diarization Error Rate (DER) by reducing the rate of missed speech. Additionally, training on both denoised and noisy datasets leads to substantial performance gains in noisy conditions. The [5/7 of https://arxiv.org/abs/2505.10879v1]

May 19, 2025 at 5:59 AM
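
For readers unfamiliar with the metric in the post above: DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below shows why cutting missed speech lowers DER. The function name and all durations are hypothetical and are not figures from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Return DER as a fraction of total reference speech time.

    All arguments are durations in seconds:
      missed          - reference speech the system labeled as non-speech
      false_alarm     - non-speech the system labeled as speech
      confusion       - speech attributed to the wrong speaker
      total_reference - total speech time in the reference annotation
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Hypothetical 100 s of reference speech, before and after denoising.
# Only the missed-speech term changes between the two cases.
der_noisy = diarization_error_rate(missed=12.0, false_alarm=3.0,
                                   confusion=1.0, total_reference=100.0)
der_denoised = diarization_error_rate(missed=6.0, false_alarm=3.0,
                                      confusion=1.0, total_reference=100.0)
```

Because the three error terms are summed, DER can exceed 1.0 when a system produces a lot of false-alarm speech.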

Joey Stanley

@joeystanley.com

Were you able to get the diarization in R? I know it’s straightforward in Python, but I’ve never even done so much as a “Hello, World” in Python, so I’d like to stick to R if I can. I can’t find a way to do it though.

April 17, 2024 at 10:38 PM

Stacy Cashmore

@stacy-clouds.net

The model, 'pyannote/speaker-diarization', runs on the CPU as standard so I just spent a morning figuring out how to move it to the GPU. I was expecting an improvement, but not this!

November 7, 2024 at 12:47 PM

Riccardo Fusaroli

@fusaroli.bsky.social

playing with speaker diarization/segmentation. Any suggestion on good readings and libraries?

November 14, 2024 at 10:17 AM

Riccardo Fusaroli

@fusaroli.bsky.social

Playing with PyCasp (https://github.com/egonina/pycasp/wiki) for speaker diarization. Any tip out there?

November 14, 2024 at 9:05 AM

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

arXiv:2505.16387v1 Announce Type: new
Abstract: This paper describes the speaker diarization system developed for the Multimodal Information-Based Speech Processing (MISP) 2025 Challenge. First, we utilize the Sequence-to-Sequence Neural Diarization [1/3 of https://arxiv.org/abs/2505.16387v1]

May 23, 2025 at 6:01 AM

Jean-Phi Baconnais 🦎

@jeanphi-baconnais.gitlab.io

I was blown away by the diarization too 🤩

Here it is: dev.to/zenika/rendr...

Making your podcast accessible with AI-powered transcription

🇬🇧 An English version is now available:...

dev.to

August 28, 2025 at 12:12 PM

Tech Trending

@tech-trending.bsky.social

GitHub - QuentinFuxa/WhisperLiveKit: Python package for Real-time, Local Speech-to-Text and Speaker Diarization. FastAPI Server & Web Interface
https://github.com/QuentinFuxa/WhisperLiveKit

github.com

August 28, 2025 at 1:29 AM

GetNews.me

@getnews-me.bsky.social

EEND-TA achieved a DER of 14.49% on DIHARD III, with fast non‑autoregressive inference that processes recordings in parallel; the model was presented at Interspeech 2025. https://getnews.me/new-state-of-the-art-results-for-end-to-end-speaker-diarization/ #speakerdiarisation #eendta

New State‑of‑the‑Art Results for End‑to‑End Speaker Diarization

September 19, 2025 at 10:21 PM

luokai

@luok.ai

Backbone: Omnilingual w2v 2.0 (7B).
A multilingual speech representation you can fine-tune for ASR or repurpose for tasks like diarization, keyword spotting, or alignment.

November 11, 2025 at 2:35 PM

arXiv Sound

@arxiv-sound.bsky.social

Analysis of End-to-End Neural Diarization reveals that finetuned WavLM-based encoder achieves best performance, LSTM decoder is outclassed, and multiclass loss is generally superior; newer architectures handle longer chunks.

Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering

Alexis Plaquet, Naohiro Tawara, Marc Delcroix, Shota Horiguchi, Atsushi Ando, Shoko Araki, Hervé Bredin

arxiv.org

June 16, 2025 at 10:43 AM

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

arXiv:2505.24111v1 Announce Type: new
Abstract: Self-supervised learning (SSL) models like WavLM can be effectively utilized when building speaker diarization systems but are often large and slow, limiting their use in resource constrained scenarios. [1/4 of https://arxiv.org/abs/2505.24111v1]

June 2, 2025 at 6:01 AM

HN Link Bot

@hnews.southla.social

📰 Show HN: Python Audio Transcription: Convert Speech to Text Locally

💬 Hacker News community praises audio-to-text projects & shares tools—support for diarization debated. 📈

https://news.ycombinator.com/item?id=45337400

September 22, 2025 at 7:30 PM

arxiv stat.ML

@arxiv-stat-ml.bsky.social

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network. (arXiv:2309.08489v1 [eess.AS])
http://arxiv.org/abs/2309.08489

September 19, 2023 at 12:00 AM

Nick Payne

@makeusabrew.bsky.social

If you need real-time voice transcription and you can live without speaker diarization + English-only, you will be hard-pressed to beat AssemblyAI's new "Universal Streaming" API.

Incredible latency & accuracy *and* only $0.15 per hour. Insane.

www.assemblyai.com/blog/introdu...

Speech-to-Text for voice agents - Universal-Streaming

Universal-Streaming delivers the streaming speech-to-text voice agents have been missing: fast immutable transcripts, higher accuracy, built-in endpointing, and pricing that scales with you.

www.assemblyai.com

July 25, 2025 at 11:17 AM

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Xinlu He, Yiwen Guan, Badrivishal Paurana, Zilin Dai, Jacob Whitehill: Interactive Real-Time Speaker Diarization Correction with Human Feedback https://arxiv.org/abs/2509.18377 https://arxiv.org/pdf/2509.18377 https://arxiv.org/html/2509.18377

September 24, 2025 at 6:30 AM

Hacker News Companion

@hncompanion.com

OWhisper, leveraging Ollama, is a new tool for real-time speech-to-text. The Hacker News discussion explores its features, potential uses, and future, with focus on streaming, diarization, and API integration. #OWhisper 1/6

August 16, 2025 at 1:00 AM
