Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection
https://arxiv.org/abs/2501.16981
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning
https://arxiv.org/abs/2402.17680