#MLLM
Zhenchen Tang, Songlin Yang, Bo Peng, Zichuan Wang, Jing Dong
Revisiting MLLM Based Image Quality Assessment: Errors and Remedy
https://arxiv.org/abs/2511.07812
November 12, 2025 at 10:36 AM
Xu, Yang, Zhao, Zhang, Chen, Ma, Hou, Wu, Li, Hu, Guan, Li, Po: From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training https://arxiv.org/abs/2511.07738 https://arxiv.org/pdf/2511.07738 https://arxiv.org/html/2511.07738
November 12, 2025 at 6:33 AM
Zhenchen Tang, Songlin Yang, Bo Peng, Zichuan Wang, Jing Dong: Revisiting MLLM Based Image Quality Assessment: Errors and Remedy https://arxiv.org/abs/2511.07812 https://arxiv.org/pdf/2511.07812 https://arxiv.org/html/2511.07812
November 12, 2025 at 6:30 AM
[2025-11-11] 📚 Updates in #MLLM

(1) NOVO: Bridging LLaVA and SAM with Visual-only Prompts for Reasoning Segmentation https://researchtrend.ai/papers/2511.06651

🔍 More at researchtrend.ai/communities/MLLM
November 11, 2025 at 4:02 AM
Why AI Sucks At Telling Time... and why this should concern us for autonomous vehicles and more.

#News #TechNews #AI #MLLM #AIlimitations #SelfDriving #MedTech
Why AI Sucks At Telling Time...
YouTube video by Nick Espinosa
youtu.be
November 10, 2025 at 10:10 PM
Daily podcast: Why AI Sucks At Telling Time... and why this should concern us for autonomous vehicles and more.

#News #TechNews #AI #MLLM #AIlimitations #SelfDriving #MedTech #podcast

soundcloud.com/nickaesp/acr
November 10, 2025 at 10:09 PM
November 10, 2025 at 6:48 AM
[2025-11-07] 📚 Updates in #MLLM

(1) DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination https://researchtrend.ai/papers/2410.04514

🔍 More at researchtrend.ai/communities/MLLM
November 7, 2025 at 3:08 AM
Liu, Çoban, Schevchenko, Tang, Zhu, Mandel, Devaney: An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM https://arxiv.org/abs/2511.02234 https://arxiv.org/pdf/2511.02234 https://arxiv.org/html/2511.02234
November 5, 2025 at 6:33 AM
[2025-11-05] 📚 Updates in #AuLLM

(1) An Evaluation of Interleaved Instruction Tuning on Semantic Reasoning Performance in an Audio MLLM https://researchtrend.ai/papers/2511.02234

🔍 More at researchtrend.ai/communities/AuLLM
November 5, 2025 at 3:10 AM
[2025-11-05] 📚 Updates in #MLLM

(1) UniChange: Unifying Change Detection with Multimodal Large Language Model https://researchtrend.ai/papers/2511.02607

🔍 More at researchtrend.ai/communities/MLLM
November 5, 2025 at 3:10 AM
Zhicheng Wang, Chen Ju, Xu Chen, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Ying Chen, Zhiguo Cao: Explore More, Learn Better: Parallel MLLM Embeddings under Mutual Information Minimization https://arxiv.org/abs/2511.01588 https://arxiv.org/pdf/2511.01588 https://arxiv.org/html/2511.01588
November 4, 2025 at 6:35 AM
[2025-11-04] 📚 Updates in #MLLM

(1) LongCat-Flash-Omni Technical Report https://researchtrend.ai/papers/2511.00279
(2) UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

🔍 More at researchtrend.ai/communities/MLLM
November 4, 2025 at 4:01 AM
Zhan, Ha, Yang, Xu, Chen, Gui, Wang, Zhang, Ji, Kang: Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning https://arxiv.org/abs/2510.27623 https://arxiv.org/pdf/2510.27623 https://arxiv.org/html/2510.27623
November 3, 2025 at 6:29 AM
[2025-11-03] 📚 Updates in #MLLM

(1) Generating Accurate and Detailed Captions for High-Resolution Images https://researchtrend.ai/papers/2510.27164

🔍 More at researchtrend.ai/communities/MLLM
November 3, 2025 at 3:08 AM
[2025-10-31] 📚 Updates in #MLLM

(1) PairUni: Pairwise Training for Unified Multimodal Language Models
(2) Emu3.5: Native Multimodal Models are World Learners https://researchtrend.ai/papers/2510.26583

🔍 More at researchtrend.ai/communities/MLLM
October 31, 2025 at 3:10 AM
LLM? MLM?? MLLM??? WLW??????? some of you need to be Loving the Lord More, More Lord More, More Loving the Lord Mas, and Worshiping the Lord in Wonder
October 30, 2025 at 3:10 PM
Ruiyang Zhang, Jiahao Luo, Xiaoru Feng, Qiufan Pang, Yaodong Yang, Juntao Dai
SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing
https://arxiv.org/abs/2510.24820
October 30, 2025 at 8:14 AM
Ruiyang Zhang, Jiahao Luo, Xiaoru Feng, Qiufan Pang, Yaodong Yang, Juntao Dai: SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing https://arxiv.org/abs/2510.24820 https://arxiv.org/pdf/2510.24820 https://arxiv.org/html/2510.24820
October 30, 2025 at 6:30 AM
[2025-10-29] 📚 Updates in #MLLM

(1) VisCoder2: Building Multi-Language Visualization Coding Agents https://researchtrend.ai/papers/2510.23642
(2) Compositional Image Synthesis with Inference-Time Scaling

🔍 More at researchtrend.ai/communities/MLLM
October 29, 2025 at 3:09 AM
Zahraa Al Sahili, Maryam Fetanat, Maimuna Nowaz, Ioannis Patras, Matthew Purver
FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment
https://arxiv.org/abs/2510.22827
October 28, 2025 at 10:58 AM
Zahraa Al Sahili, Maryam Fetanat, Maimuna Nowaz, Ioannis Patras, Matthew Purver: FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment https://arxiv.org/abs/2510.22827 https://arxiv.org/pdf/2510.22827 https://arxiv.org/html/2510.22827
October 28, 2025 at 6:32 AM
[2025-10-28] 📚 Updates in #MLLM

(1) Token-Level Inference-Time Alignment for Vision-Language Models https://researchtrend.ai/papers/2510.21794
(2) Top-Down Semantic Refinement for Image Captioning

🔍 More at researchtrend.ai/communities/MLLM
October 28, 2025 at 3:13 AM
Xingwei Zhong, Kar Wai Fok, Vrizlynn L. L. Thing: Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses https://arxiv.org/abs/2510.21214 https://arxiv.org/pdf/2510.21214 https://arxiv.org/html/2510.21214
October 27, 2025 at 6:30 AM