Lightnews — Scholar-powered news

arXiv cs.SD Sound

@cssd-bot.bsky.social

Xiaosha Li, Chun Liu, Ziyu Wang: When Noise Lowers The Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models https://arxiv.org/abs/2602.02738 https://arxiv.org/pdf/2602.02738 https://arxiv.org/html/2602.02738

February 4, 2026 at 8:10 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Chengyuan Ma, Jiawei Jin, Ruijie Xiong, Chunxiang Jin, Canxiang Yan, Wenming Yang: VividVoice: A Unified Framework for Scene-Aware Visually-Driven Speech Synthesis https://arxiv.org/abs/2602.02591 https://arxiv.org/pdf/2602.02591 https://arxiv.org/html/2602.02591

February 4, 2026 at 8:00 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

[2026-02-04 Wed (UTC), 10 new articles found for csSD Sound]

February 4, 2026 at 7:57 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Wei, Liao, Chang, Huang, Chen: Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations https://arxiv.org/abs/2602.01030 https://arxiv.org/pdf/2602.01030 https://arxiv.org/html/2602.01030

February 3, 2026 at 6:30 AM

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Yang Xiao, Eun-Jung Holden, Ting Dang: Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages https://arxiv.org/abs/2602.01008 https://arxiv.org/pdf/2602.01008 https://arxiv.org/html/2602.01008

February 3, 2026 at 6:35 AM

Reposted by arXiv cs.SD Sound

arXiv eess.AS Audio and Speech Processing

@eessas-bot.bsky.social

Hao Ma, Ruihao Jing, Shansong Liu, Cheng Gong, Chi Zhang, Xiao-Lei Zhang, Xuelong Li: High-Fidelity Generative Audio Compression at 0.275kbps https://arxiv.org/abs/2602.00648 https://arxiv.org/pdf/2602.00648 https://arxiv.org/html/2602.00648

February 3, 2026 at 6:35 AM

Reposted by arXiv cs.SD Sound

arXiv cs.MM Multimedia

@csmm-bot.bsky.social

Zhou, Li, Lin, Huang, Zhou, Yuan, Lan, Zhou, Li, Xu, Liao, Cheng, Chen, Mao, Feng: MTAVG-Bench: A Comprehensive Benchmark for Evaluating Multi-Talker Dialogue-Centric Audio-Video Generation https://arxiv.org/abs/2602.00607 https://arxiv.org/pdf/2602.00607 https://arxiv.org/html/2602.00607

February 3, 2026 at 6:33 AM

Reposted by arXiv cs.SD Sound

arXiv cs.CL Computation and Language

@cscl-bot.bsky.social

Zhijie Huang, Stephen McIntosh, Daisuke Saito, Nobuaki Minematsu: Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling https://arxiv.org/abs/2602.00594 https://arxiv.org/pdf/2602.00594 https://arxiv.org/html/2602.00594

February 3, 2026 at 6:30 AM

Reposted by arXiv cs.SD Sound

arXiv cs.LG Machine Learning

@cslg-bot.bsky.social

Keisuke Kamahori, Wei-Tzu Lee, Atindra Jha, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci: VoxServe: Streaming-Centric Serving System for Speech Language Models https://arxiv.org/abs/2602.00269 https://arxiv.org/pdf/2602.00269 https://arxiv.org/html/2602.00269

February 3, 2026 at 6:33 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Rajalaxmi Rajagopalan, Ritwik Giri, Zhiqiang Tang, Kyu Han: Masked Autoencoders as Universal Speech Enhancer https://arxiv.org/abs/2602.02413 https://arxiv.org/pdf/2602.02413 https://arxiv.org/html/2602.02413

February 3, 2026 at 6:35 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Arnab Das, Yassine El Kheir, Enes Erdem Erdogan, Feidi Kallel, Tim Polzehl, Sebastian Moeller: DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild https://arxiv.org/abs/2602.02286 https://arxiv.org/pdf/2602.02286 https://arxiv.org/html/2602.02286

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Jaejun Lee, Yoori Oh, Kyogu Lee: LipSody: Lip-to-Speech Synthesis with Enhanced Prosody Consistency https://arxiv.org/abs/2602.01908 https://arxiv.org/pdf/2602.01908 https://arxiv.org/html/2602.01908

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Jaejun Lee, Yoori Oh, Kyogu Lee: Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only https://arxiv.org/abs/2602.01879 https://arxiv.org/pdf/2602.01879 https://arxiv.org/html/2602.01879

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Fei Liu, Yang Ai: ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-based Neural Speech Codec https://arxiv.org/abs/2602.01793 https://arxiv.org/pdf/2602.01793 https://arxiv.org/html/2602.01793

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Junya Koguchi, Tomoki Koriyama: Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection https://arxiv.org/abs/2602.01727 https://arxiv.org/pdf/2602.01727 https://arxiv.org/html/2602.01727

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Yuxuan Liu, Peihong Zhang, Rui Sang, Zhixin Li, Yizhou Tan, Yiqiang Cai, Shengchen Li: Membership Inference Attack Against Music Diffusion Models via Generative Manifold Perturbation https://arxiv.org/abs/2602.01645 https://arxiv.org/pdf/2602.01645 https://arxiv.org/html/2602.01645

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Yang, Zhao, Kang, Li, He, Liu, Zhang, Qu, Peng, Wang: Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition https://arxiv.org/abs/2602.01547 https://arxiv.org/pdf/2602.01547 https://arxiv.org/html/2602.01547

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Mariëtte Olijslager, Seyed Sahand Mohammadi Ziabari, Ali Mohammed Mansoor Alsahag: Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings https://arxiv.org/abs/2602.01363 https://arxiv.org/pdf/2602.01363 https://arxiv.org/html/2602.01363

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Chengyuan Ma, Peng Jia, Hongyue Guo, Wenming Yang: TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection https://arxiv.org/abs/2602.01060 https://arxiv.org/pdf/2602.01060 https://arxiv.org/html/2602.01060

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Zhili Nicholas Liang, Soyeon Caren Han, Qizhou Wang, Christopher Leckie: HierCon: Hierarchical Contrastive Attention for Audio Deepfake Detection https://arxiv.org/abs/2602.01032 https://arxiv.org/pdf/2602.01032 https://arxiv.org/html/2602.01032

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Junmin Gong, Yulin Song, Wenxiao Zhao, Sen Wang, Shengyuan Xu, Jing Guo: ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation https://arxiv.org/abs/2602.00744 https://arxiv.org/pdf/2602.00744 https://arxiv.org/html/2602.00744

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Moummad, Miron, Rauch, Robinson, Joly, Pietquin, Chemla, Geist: Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation https://arxiv.org/abs/2602.00681 https://arxiv.org/pdf/2602.00681 https://arxiv.org/html/2602.00681

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Ayuto Tsutsumi, Kohei Tanaka, Sayaka Shiota: The TMU System for the XACLE Challenge: Training Large Audio Language Models with CLAP Pseudo-Labels https://arxiv.org/abs/2602.00604 https://arxiv.org/pdf/2602.00604 https://arxiv.org/html/2602.00604

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Ke Xue, Rongfei Fan, Kai Li, Shanping Yu, Puning Zhao, Jianping An: Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy https://arxiv.org/abs/2602.00568 https://arxiv.org/pdf/2602.00568 https://arxiv.org/html/2602.00568

February 3, 2026 at 6:34 AM

arXiv cs.SD Sound

@cssd-bot.bsky.social

Yong Ren, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Tao Wang: Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards https://arxiv.org/abs/2602.00560 https://arxiv.org/pdf/2602.00560 https://arxiv.org/html/2602.00560

February 3, 2026 at 6:34 AM
