arXiv cs.SD Sound
cssd-bot.bsky.social
Xiaosha Li, Chun Liu, Ziyu Wang: When Noise Lowers The Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models https://arxiv.org/abs/2602.02738 https://arxiv.org/pdf/2602.02738 https://arxiv.org/html/2602.02738
February 4, 2026 at 8:10 AM
Chengyuan Ma, Jiawei Jin, Ruijie Xiong, Chunxiang Jin, Canxiang Yan, Wenming Yang: VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech Synthesis https://arxiv.org/abs/2602.02591 https://arxiv.org/pdf/2602.02591 https://arxiv.org/html/2602.02591
February 4, 2026 at 8:00 AM
[2026-02-04 Wed (UTC), 10 new articles found for cs.SD Sound]
February 4, 2026 at 7:57 AM
Reposted by arXiv cs.SD Sound
Wei, Liao, Chang, Huang, Chen: Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations https://arxiv.org/abs/2602.01030 https://arxiv.org/pdf/2602.01030 https://arxiv.org/html/2602.01030
February 3, 2026 at 6:30 AM
Reposted by arXiv cs.SD Sound
Yang Xiao, Eun-Jung Holden, Ting Dang: Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages https://arxiv.org/abs/2602.01008 https://arxiv.org/pdf/2602.01008 https://arxiv.org/html/2602.01008
February 3, 2026 at 6:35 AM
Reposted by arXiv cs.SD Sound
Hao Ma, Ruihao Jing, Shansong Liu, Cheng Gong, Chi Zhang, Xiao-Lei Zhang, Xuelong Li: High-Fidelity Generative Audio Compression at 0.275kbps https://arxiv.org/abs/2602.00648 https://arxiv.org/pdf/2602.00648 https://arxiv.org/html/2602.00648
February 3, 2026 at 6:35 AM
Reposted by arXiv cs.SD Sound
Zhou, Li, Lin, Huang, Zhou, Yuan, Lan, Zhou, Li, Xu, Liao, Cheng, Chen, Mao, Feng: MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-Talker Dialogue-Centric Audio-Video Generation https://arxiv.org/abs/2602.00607 https://arxiv.org/pdf/2602.00607 https://arxiv.org/html/2602.00607
February 3, 2026 at 6:33 AM
Reposted by arXiv cs.SD Sound
Zhijie Huang, Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu: Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling https://arxiv.org/abs/2602.00594 https://arxiv.org/pdf/2602.00594 https://arxiv.org/html/2602.00594
February 3, 2026 at 6:30 AM
Reposted by arXiv cs.SD Sound
Keisuke Kamahori, Wei-Tzu Lee, Atindra Jha, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci: VoxServe: Streaming-Centric Serving System for Speech Language Models https://arxiv.org/abs/2602.00269 https://arxiv.org/pdf/2602.00269 https://arxiv.org/html/2602.00269
February 3, 2026 at 6:33 AM
Rajalaxmi Rajagopalan, Ritwik Giri, Zhiqiang Tang, Kyu Han: Masked Autoencoders as Universal Speech Enhancer https://arxiv.org/abs/2602.02413 https://arxiv.org/pdf/2602.02413 https://arxiv.org/html/2602.02413
February 3, 2026 at 6:35 AM
Arnab Das, Yassine El Kheir, Enes Erdem Erdogan, Feidi Kallel, Tim Polzehl, Sebastian Moeller: DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild https://arxiv.org/abs/2602.02286 https://arxiv.org/pdf/2602.02286 https://arxiv.org/html/2602.02286
February 3, 2026 at 6:34 AM
Jaejun Lee, Yoori Oh, Kyogu Lee: LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency https://arxiv.org/abs/2602.01908 https://arxiv.org/pdf/2602.01908 https://arxiv.org/html/2602.01908
February 3, 2026 at 6:34 AM
Jaejun Lee, Yoori Oh, Kyogu Lee: Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only https://arxiv.org/abs/2602.01879 https://arxiv.org/pdf/2602.01879 https://arxiv.org/html/2602.01879
February 3, 2026 at 6:34 AM
Fei Liu, Yang Ai: ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-based Neural Speech Codec https://arxiv.org/abs/2602.01793 https://arxiv.org/pdf/2602.01793 https://arxiv.org/html/2602.01793
February 3, 2026 at 6:34 AM
Junya Koguchi, Tomoki Koriyama: Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection https://arxiv.org/abs/2602.01727 https://arxiv.org/pdf/2602.01727 https://arxiv.org/html/2602.01727
February 3, 2026 at 6:34 AM
Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Yizhou Tan, Yiqiang Cai, Shengchen Li: Membership Inference Attack Against Music Diffusion Models via Generative Manifold Perturbation https://arxiv.org/abs/2602.01645 https://arxiv.org/pdf/2602.01645 https://arxiv.org/html/2602.01645
February 3, 2026 at 6:34 AM
Yang, Zhao, Kang, Li, He, Liu, Zhang, Qu, Peng, Wang: Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition https://arxiv.org/abs/2602.01547 https://arxiv.org/pdf/2602.01547 https://arxiv.org/html/2602.01547
February 3, 2026 at 6:34 AM
Mariëtte Olijslager, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag: Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings https://arxiv.org/abs/2602.01363 https://arxiv.org/pdf/2602.01363 https://arxiv.org/html/2602.01363
February 3, 2026 at 6:34 AM
Chengyuan Ma, Peng Jia, Hongyue Guo, Wenming Yang: TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection https://arxiv.org/abs/2602.01060 https://arxiv.org/pdf/2602.01060 https://arxiv.org/html/2602.01060
February 3, 2026 at 6:34 AM
Zhili Nicholas Liang, Soyeon Caren Han, Qizhou Wang, Christopher Leckie: HierCon: Hierarchical Contrastive Attention for Audio Deepfake Detection https://arxiv.org/abs/2602.01032 https://arxiv.org/pdf/2602.01032 https://arxiv.org/html/2602.01032
February 3, 2026 at 6:34 AM
Junmin Gong, Yulin Song, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo: ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation https://arxiv.org/abs/2602.00744 https://arxiv.org/pdf/2602.00744 https://arxiv.org/html/2602.00744
February 3, 2026 at 6:34 AM
Moummad, Miron, Rauch, Robinson, Joly, Pietquin, Chemla, Geist: Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation https://arxiv.org/abs/2602.00681 https://arxiv.org/pdf/2602.00681 https://arxiv.org/html/2602.00681
February 3, 2026 at 6:34 AM
Ayuto Tsutsumi, Kohei Tanaka, Sayaka Shiota: The TMU System for the XACLE Challenge: Training Large Audio Language Models with CLAP Pseudo-Labels https://arxiv.org/abs/2602.00604 https://arxiv.org/pdf/2602.00604 https://arxiv.org/html/2602.00604
February 3, 2026 at 6:34 AM
Ke Xue, Rongfei Fan, Kai Li, Shanping Yu, Puning Zhao, Jianping An: Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy https://arxiv.org/abs/2602.00568 https://arxiv.org/pdf/2602.00568 https://arxiv.org/html/2602.00568
February 3, 2026 at 6:34 AM
Yong Ren, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Tao Wang: Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards https://arxiv.org/abs/2602.00560 https://arxiv.org/pdf/2602.00560 https://arxiv.org/html/2602.00560
February 3, 2026 at 6:34 AM