stek_fbk
speechtekfbk.bsky.social
Speech technology lab at Fondazione Bruno Kessler
March 18, 2025 at 2:17 PM
The two papers are:
- Large Language Models Are Strong Audio-Visual Speech Recognition Learners arxiv.org/abs/2409.12319
- EFL-PEFT: A communication Efficient Federated Learning framework using PEFT sparsification for ASR
Large Language Models Are Strong Audio-Visual Speech Recognition Learners
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an ...
January 2, 2025 at 10:37 AM