arXiv eess.AS Audio and Speech Processing
eessas-bot.bsky.social
Reposted by arXiv eess.AS Audio and Speech Processing
Ellinas, Vioni, Kakoulidis, Vamvoukakis, Christidou, Markopoulos, Oh, Jho, Hwang, Chalamandaris, Tsiakoulis: Pseudo-Cepstrum: Pitch Modification for Mel-Based Neural Vocoders https://arxiv.org/abs/2512.16519 https://arxiv.org/pdf/2512.16519 https://arxiv.org/html/2512.16519
December 19, 2025 at 6:34 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Dong, Xu, He, Han, Christofferson, Chen, Taya, Nishiyama, Sezaki: Poster: Recognizing Hidden-in-the-Ear Private Key for Reliable Silent Speech Interface Using Multi-Task Learning https://arxiv.org/abs/2512.16518 https://arxiv.org/pdf/2512.16518 https://arxiv.org/html/2512.16518
December 19, 2025 at 6:32 AM
Anup Singh, Kris Demuynck, Vipul Arora: BEST-STD2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection https://arxiv.org/abs/2512.16395 https://arxiv.org/pdf/2512.16395 https://arxiv.org/html/2512.16395
December 19, 2025 at 6:35 AM
Gloria Dal Santo, Karolina Prawda, Sebastian J. Schlecht, Vesa Välimäki: Learning Recursive Attenuation Filters Under Noisy Conditions https://arxiv.org/abs/2512.16318 https://arxiv.org/pdf/2512.16318 https://arxiv.org/html/2512.16318
December 19, 2025 at 6:35 AM
[2025-12-19 Fri (UTC), 2 new articles found for eessAS Audio and Speech Processing]
December 19, 2025 at 6:35 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Ken O'Hanlon, Basil Woods, Lin Wang, Mark Sandler: A Conditioned UNet for Music Source Separation https://arxiv.org/abs/2512.15532 https://arxiv.org/pdf/2512.15532 https://arxiv.org/html/2512.15532
December 18, 2025 at 6:34 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Aref Farhadipour, Teodora Vukovic, Volker Dellwo, Petr Motlicek, Srikanth Madikeri: Adaptive Multimodal Person Recognition: A Robust Framework for Handling Missing Modalities https://arxiv.org/abs/2512.14961 https://arxiv.org/pdf/2512.14961 https://arxiv.org/html/2512.14961
December 18, 2025 at 6:30 AM
Séverin Baroudi, Hervé Bredin, Joseph Razik, Ricard Marxer: On the Use of Self-Supervised Representation Learning for Speaker Diarization and Separation https://arxiv.org/abs/2512.15224 https://arxiv.org/pdf/2512.15224 https://arxiv.org/html/2512.15224
December 18, 2025 at 6:35 AM
[2025-12-18 Thu (UTC), 1 new article found for eessAS Audio and Speech Processing]
December 18, 2025 at 6:35 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Lu, Gao, Liang, Wang, Thebaud, Moro-Velazquez, Dehak, Villalba: Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization https://arxiv.org/abs/2512.14687 https://arxiv.org/pdf/2512.14687 https://arxiv.org/html/2512.14687
December 17, 2025 at 6:30 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Marianne de Heer Kloots, Paul Boersma, Willem Zuidema: Linguists should learn to love speech-based deep learning models https://arxiv.org/abs/2512.14506 https://arxiv.org/pdf/2512.14506 https://arxiv.org/html/2512.14506
December 17, 2025 at 6:30 AM
Pawel Swietojanski, Xinwei Li, Mingbin Xu, Takaaki Hori, Dogan Can, Xiaodan Zhuang: Segmental Attention Decoding With Long Form Acoustic Encodings https://arxiv.org/abs/2512.14652 https://arxiv.org/pdf/2512.14652 https://arxiv.org/html/2512.14652
December 17, 2025 at 6:35 AM
Dick, Thompson, Wu, Delgado, Williams, Torcoli: Investigating the impact of stereo processing -- a study for extending the Open Dataset of Audio Quality (ODAQ) https://arxiv.org/abs/2512.14259 https://arxiv.org/pdf/2512.14259 https://arxiv.org/html/2512.14259
December 17, 2025 at 6:35 AM
Sungnyun Kim: Scalable Frameworks for Real-World Audio-Visual Speech Recognition https://arxiv.org/abs/2512.14083 https://arxiv.org/pdf/2512.14083 https://arxiv.org/html/2512.14083
December 17, 2025 at 6:35 AM
[2025-12-17 Wed (UTC), 3 new articles found for eessAS Audio and Speech Processing]
December 17, 2025 at 6:35 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Tang, Lei, Zhu, Chen, Yuan, Li, Oh, Zhang, Huang, Benetos, Liu, Liu, Ma: AutoMV: An Automatic Multi-Agent System for Music Video Generation https://arxiv.org/abs/2512.12196 https://arxiv.org/pdf/2512.12196 https://arxiv.org/html/2512.12196
December 16, 2025 at 6:33 AM
Sathwika Peechara, Rajeev Sahay: REVERB-FL: Server-Side Adversarial and Reserve-Enhanced Federated Learning for Robust Audio Classification https://arxiv.org/abs/2512.13647 https://arxiv.org/pdf/2512.13647 https://arxiv.org/html/2512.13647
December 16, 2025 at 6:35 AM
Junyi Peng, Jin Li, Johan Rohdin, Lin Zhang, Miroslav Hlaváček, Oldrich Plchot: BUT Systems for WildSpoof Challenge: SASV in the Wild https://arxiv.org/abs/2512.12851 https://arxiv.org/pdf/2512.12851 https://arxiv.org/html/2512.12851
December 16, 2025 at 6:35 AM
[2025-12-16 Tue (UTC), 2 new articles found for eessAS Audio and Speech Processing]
December 16, 2025 at 6:35 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Cang, Liu, Sheng, Cui, Li, Fa, Chen, Yang, Yang: Robust Detection of Underwater Target Against Non-Uniform Noise With Optical Fiber DAS Array https://arxiv.org/abs/2512.11231 https://arxiv.org/pdf/2512.11231 https://arxiv.org/html/2512.11231
December 15, 2025 at 6:36 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Nahabwe, Kagumire, Musinguzi, Beijuka, Kyagaba, Nabende, Katumba, Nakatumba-Nabende: Benchmarking Automatic Speech Recognition Models for African Languages https://arxiv.org/abs/2512.10968 https://arxiv.org/pdf/2512.10968 https://arxiv.org/html/2512.10968
December 15, 2025 at 6:29 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Kumar, Shivaprakash, Manoharan, Kurariya, Mukherjee, Shukla, Mukherjee, Chand, Murthy: ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages https://arxiv.org/abs/2512.10967 https://arxiv.org/pdf/2512.10967 https://arxiv.org/html/2512.10967
December 15, 2025 at 6:29 AM
Takafumi Moriya, Masato Mimura, Tomohiro Tanaka, Hiroshi Sato, Ryo Masumura, Atsunori Ogawa: All-in-One ASR: Unifying Encoder-Decoder Models of CTC, Attention, and Transducer in Dual-Mode ASR https://arxiv.org/abs/2512.11543 https://arxiv.org/pdf/2512.11543 https://arxiv.org/html/2512.11543
December 15, 2025 at 6:35 AM
[2025-12-15 Mon (UTC), 1 new article found for eessAS Audio and Speech Processing]
December 15, 2025 at 6:35 AM
Reposted by arXiv eess.AS Audio and Speech Processing
Zitong Lan, Yiwei Tang, Yuhan Wang, Haowen Lai, Yido Hao, Mingmin Zhao: Building Audio-Visual Digital Twins with Smartphones https://arxiv.org/abs/2512.10778 https://arxiv.org/pdf/2512.10778 https://arxiv.org/html/2512.10778
December 12, 2025 at 6:34 AM