Dominik Klement
@dklement.bsky.social
Speech Researcher @ BUT SPEECH
Visiting student @ CLSP Johns Hopkins University

GitHub: https://github.com/domklement
LinkedIn: https://www.linkedin.com/in/dominik-klement/
🤝 Collaboration and Feedback Welcome
We’re open to feedback, discussions, and collaborations. Let’s work together to shape the future of ASR and diarization technology!

[14/14]
January 11, 2025 at 7:30 PM
🌟 Kudos to CHiME-8 NOTSOFAR-1 Organizers
Thanks to Alon Vinnikov, Amir Ivry, and Eyal Krupka (Microsoft) for organizing the CHiME-8 NOTSOFAR-1 Challenge, and to the CHiME-8 Steering Committee for their dedication to advancing speech recognition research!

[13/14]
💻Gradio-powered Demo pccnect.fit.vutbr.cz/gradio-demo - Test our DiCoW model to transcribe your own meetings! The demo is live for 72 hours only, so don’t miss this chance.
[12/14]
🔗DiCoW Inference Demo Pipeline github.com/BUTSpeechFIT...
[11/14]
🔗Target-Speaker Whisper Source Code github.com/BUTSpeechFIT...
[10/14]
🌟Open-Source Tools and Demos
We’re making our research accessible by open-sourcing training and inference codebases, and providing interactive demos:
🔗DiariZen Source Code github.com/BUTSpeechFIT...

[9/14]
🌟 4. Leveraging Self-Supervised Learning for Speaker Diarization - Accepted to ICASSP 2025. This paper introduces DiariZen - our state-of-the-art diarization model and toolkit.
arxiv.org/abs/2409.09408
[8/14]
🌟 3. BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge - The work earned the 🏆 Jury Prize for being one of the most practical, efficient, and novel systems. Our robust diarization-ASR integration is capable of tackling overlapped speech.
isca-archive.org/chime_2024/p...

[7/14]
🌟 2. Target Speaker ASR with Whisper arxiv.org/abs/2409.09543 - Accepted to ICASSP 2025. This work enhances the Whisper ASR model for target-speaker recognition, demonstrating its applicability in complex acoustic scenarios.
[6/14]
By directly conditioning the ASR model on diarization outputs, we simplify the workflow for multi-speaker and target-speaker scenarios. Importantly, DiCoW maintains Whisper’s performance on single-speaker transcription, ensuring robustness across diverse use cases.
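A minimal sketch of what "conditioning on diarization outputs" can look like (an illustrative approximation, not the exact DiCoW implementation; the shapes, class count, and identity initialization are assumptions): each encoder frame is passed through per-diarization-class affine transforms, mixed by that frame's diarization probabilities, so the conditioned model initially behaves exactly like plain Whisper and can learn to specialize from there.

```python
import numpy as np

T, D, K = 100, 8, 4  # frames, hidden size, diarization classes per frame

rng = np.random.default_rng(0)
h = rng.standard_normal((T, D))      # encoder hidden states, one row per frame
p = rng.random((T, K))
p /= p.sum(axis=1, keepdims=True)    # per-frame diarization class probabilities

# One affine transform per class, identity-initialized so the conditioned
# model starts out as a no-op on top of the original single-speaker model.
W = np.stack([np.eye(D) for _ in range(K)])  # (K, D, D)
b = np.zeros((K, D))                         # (K, D)

# h'[t] = sum_k p[t, k] * (W_k @ h[t] + b_k)
h_cond = np.einsum("tk,kde,te->td", p, W, h) + p @ b
```

With identity weights and zero biases, `h_cond` equals `h` exactly; training then moves the per-class transforms away from identity only where the diarization signal helps.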
[5/14]
🌟Recent Papers
1. DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition - Submitted to CSL. Our diarization-conditioned approach eliminates the need for speaker enrollment or source separation.
arxiv.org/abs/2501.00114
[4/14]
- Versatile and Robust: Despite all improvements, our systems retain high performance on single-speaker transcription tasks, ensuring broad applicability across use cases.
[3/14]
🌟Key Innovations
- Simplifying Multi-Speaker ASR: Our models directly use diarization outputs as conditioning signals, bypassing the need for enrollment data or complex source separation techniques.
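To make the conditioning signal concrete, here is a hedged sketch (function name, frame rate, and segment format are illustrative assumptions, not our actual codebase) of turning diarization segments into per-frame silence/target/non-target/overlap (STNO) masks that an ASR model can be conditioned on:

```python
import numpy as np

FRAME_RATE = 50  # frames per second; assumed to match the encoder resolution

def stno_mask(segments, target, duration):
    """segments: list of (speaker, start_sec, end_sec) from a diarizer."""
    n = int(duration * FRAME_RATE)
    tgt = np.zeros(n, dtype=bool)   # frames where the target speaker talks
    oth = np.zeros(n, dtype=bool)   # frames where any other speaker talks
    for spk, start, end in segments:
        lo, hi = int(start * FRAME_RATE), int(end * FRAME_RATE)
        (tgt if spk == target else oth)[lo:hi] = True
    # One-hot STNO per frame: [silence, target-only, non-target-only, overlap]
    mask = np.stack([~tgt & ~oth, tgt & ~oth, ~tgt & oth, tgt & oth], axis=1)
    return mask.astype(np.float32)

# Speaker A talks 0-2 s, speaker B talks 1.5-3 s; transcribe A.
segs = [("A", 0.0, 2.0), ("B", 1.5, 3.0)]
m = stno_mask(segs, target="A", duration=3.0)  # shape (150, 4)
```

Running the same utterance once per target speaker with its own mask yields speaker-attributed transcripts, with no enrollment audio or separation front-end required.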

[2/14]