Sumit
@reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Pinned
I published Vol. 129 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/agentic-re...
Agentic Retrieval for Corpus-Level Reasoning, Compact, High-Performance Caching for RAG Agents, and More!
Vol.129 for Nov 03 - Nov 09, 2025
DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding

@dwzhu128 et al. at Google present a multi-agent framework that enhances evidence localization in long documents.

📝 arxiv.org/abs/2511.11552
👨🏽‍💻 dwzhu-pku.github.io/DocLens/
Comprehending long visual documents, where information is distributed across extensive pages of text and visual elements, is a critical but challenging task for modern Vision-Language Models (VLMs). E...
November 18, 2025 at 6:05 AM
RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems

Introduces an open-source RAG workload dataset from a university Q&A system.

📝 arxiv.org/abs/2511.12979
👨🏽‍💻 github.com/flashserve/R...
Retrieval-Augmented Generation (RAG) is a critical paradigm for building reliable, knowledge-intensive Large Language Model (LLM) applications. However, the multi-stage pipeline (retrieve, generate) a...
November 18, 2025 at 6:04 AM
GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

Ant Group introduces a reranking framework that combines the flexibility of pointwise methods with the performance of listwise approaches.

📝 arxiv.org/abs/2511.11653
👨🏽‍💻 github.com/AQ-MedAI/Diver
Large Language Models have shown strong potential as rerankers to enhance the overall performance of RAG systems. However, existing reranking paradigms are constrained by a core theoretical and practi...
November 18, 2025 at 6:03 AM
From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Alibaba embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation.

📝 arxiv.org/abs/2511.12081
Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen in large la...
November 18, 2025 at 5:59 AM
DualGR: Generative Retrieval with Long and Short-Term Interests Modeling

Kuaishou explicitly models dual horizons of user interests with selective activation, using a Dual-Branch Router to cover both stable preferences and transient intents.

📝 arxiv.org/abs/2511.12518
In large-scale industrial recommendation systems, retrieval must produce high-quality candidates from massive corpora under strict latency. Recently, Generative Retrieval (GR) has emerged as a viable ...
November 18, 2025 at 5:58 AM
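The dual-horizon idea can be sketched as a gate that blends a stable long-term interest vector with a transient short-term one (the sigmoid gate, the weights, and all names here are illustrative assumptions; DualGR's actual Dual-Branch Router and selective activation are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def dual_branch_route(long_term, short_term, gate_w):
    """Blend long-term and short-term interest vectors via a sigmoid gate
    computed over their concatenation (toy sketch, not DualGR's design)."""
    z = gate_w @ np.concatenate([long_term, short_term])
    g = 1.0 / (1.0 + np.exp(-z))          # scalar gate in (0, 1)
    return g * long_term + (1.0 - g) * short_term

d = 8
long_interest = rng.normal(size=d)        # e.g. encoded lifetime history
short_interest = rng.normal(size=d)       # e.g. encoded recent session
gate_w = rng.normal(size=2 * d)           # hypothetical learned gate weights
query_vec = dual_branch_route(long_interest, short_interest, gate_w)
print(query_vec.shape)  # (8,)
```

Because the gate is a scalar in (0, 1), the routed vector is a componentwise convex combination of the two interest vectors.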
MindRec: Mind-inspired Coarse-to-fine Decoding for Generative Recommendation

Generates key tokens reflecting user preferences, then expands them into complete item recommendations using hierarchical category trees.

📝 arxiv.org/abs/2511.12597
👨🏽‍💻 github.com/Mr-Peach0301...
Recent advancements in large language model-based recommendation systems often represent items as text or semantic IDs and generate recommendations in an auto-regressive manner. However, due to the le...
November 18, 2025 at 5:57 AM
Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation

Uses an MoE architecture with codebooks to generate semantic tokens across multiple domains.

📝 arxiv.org/abs/2511.12922
👨🏽‍💻 github.com/jackfrost168...
Large language model (LLM)-based recommender systems have achieved high-quality performance by bridging the discrepancy between the item space and the language space through item tokenization. However...
November 18, 2025 at 5:56 AM
AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking

Alibaba decouples user-side and item-side computations in pre-ranking models, executing them asynchronously to reduce latency and computational costs.

📝 arxiv.org/abs/2511.12934
In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are trigge...
November 18, 2025 at 5:54 AM
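The decoupling can be illustrated with a toy two-tower pre-ranker whose user-side and item-side forward passes run concurrently instead of sequentially (the towers, weights, and thread-pool setup are hypothetical stand-ins, not the paper's system):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)

# Stand-ins for the user-side and item-side sub-networks (illustrative).
W_user = rng.normal(size=(16, 8))
W_item = rng.normal(size=(16, 8))

def user_tower(user_feats: np.ndarray) -> np.ndarray:
    return np.tanh(user_feats @ W_user)

def item_tower(item_feats: np.ndarray) -> np.ndarray:
    return np.tanh(item_feats @ W_item)

def score_async(user_feats: np.ndarray, item_feats: np.ndarray) -> np.ndarray:
    # Launch both sides concurrently; in production the item side would
    # typically be precomputed and cached, hiding its latency entirely.
    with ThreadPoolExecutor(max_workers=2) as pool:
        u_fut = pool.submit(user_tower, user_feats)
        i_fut = pool.submit(item_tower, item_feats)
        u, items = u_fut.result(), i_fut.result()
    return items @ u  # one relevance score per candidate item

user = rng.normal(size=16)
candidates = rng.normal(size=(1000, 16))
scores = score_async(user, candidates)
print(scores.shape)  # (1000,)
```

The end-to-end latency of this sketch is bounded by the slower tower rather than the sum of both, which is the intuition behind asynchronous pre-ranking.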
Attention Grounded Enhancement for Visual Document Retrieval

Alibaba presents a training framework that uses multimodal LLM attention maps to guide visual document retrievers toward capturing both explicit and implicit matches.

📝 arxiv.org/abs/2511.13415
👨🏽‍💻 anonymous.4open.science/r/AGREE-2025
Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy information needs. Recent advances use screenshot-based document encoding with fine-grained late inter...
November 18, 2025 at 5:54 AM
Disentangled Interest Network for Out-of-Distribution CTR Prediction

Disentangles multiple user interests using sparse attention and weak supervision to improve robustness in out-of-distribution click-through rate prediction.

📝 dl.acm.org/doi/10.1145/...
👨🏽‍💻 github.com/DavyMorgan/D...
Disentangled Interest Network for Out-of-Distribution CTR Prediction | ACM Transactions on Information Systems
Click-through rate (CTR) prediction, which estimates the probability of a user clicking on a given item, is a critical task for online information services. Existing approaches often make strong assum...
November 17, 2025 at 5:16 AM
PISA: Combining Transformers and ACT-R for Repeat-Aware Sequential Listening Session Recommendation

Introduces a Transformer-based framework to model both repetitive and dynamic listening patterns for sequential music session recommendation.

📝 dl.acm.org/doi/10.1145/...
👨🏽‍💻 github.com/deezer/recsy...
PISA: Combining Transformers and ACT-R for Repeat-Aware Sequential Listening Session Recommendation | ACM Transactions on Recommender Systems
Repetitive listening behaviors are often overlooked or inadequately addressed by music streaming platforms which often rely on sequential recommender systems to suggest music based on users’ listening...
November 17, 2025 at 5:15 AM
Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB

Proposes a plug-and-play training-free method that removes mean bias from text embeddings by subtracting or projecting out the mean vector.

📝 arxiv.org/abs/2511.11041
We find that current text embedding models produce outputs with a consistent bias, i.e., each embedding vector $e$ can be decomposed as $\tilde{e} + μ$, where $μ$ is almost identical across all senten...
November 17, 2025 at 5:14 AM
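The core operation is easy to sketch: estimate the shared mean vector, subtract it, and renormalize (a minimal illustration of the idea; the paper's refined renormalization differs in detail):

```python
import numpy as np

def debias(embeddings: np.ndarray) -> np.ndarray:
    """Remove the shared mean component from a batch of text embeddings
    and renormalize rows to unit length (sketch of the idea only)."""
    mu = embeddings.mean(axis=0, keepdims=True)   # estimate of the bias mu
    centered = embeddings - mu                    # e = e_tilde + mu -> e_tilde
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
# Synthetic embeddings with an injected common offset playing the role of mu.
mu_true = rng.normal(size=(1, 64))
E = rng.normal(size=(100, 64)) + 5.0 * mu_true
E_hat = debias(E)
print(np.allclose(np.linalg.norm(E_hat, axis=1), 1.0))  # True
```

Being training-free, this kind of correction can be applied on top of any frozen embedding model at indexing or query time.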
LEMUR: Large scale End-to-end MUltimodal Recommendation

ByteDance develops a large-scale end-to-end multimodal recommendation system with a novel memory bank mechanism, deployed on Douyin Search.

📝 arxiv.org/abs/2511.10962
Traditional ID-based recommender systems often struggle with cold-start and generalization challenges. Multimodal recommendation systems, which leverage textual and visual data, offer a promising solu...
November 17, 2025 at 5:12 AM
Align3GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Kuaishou presents a unified framework combining token-level, behavior-level, and preference-level alignment for LLM-based recommendation systems.

📝 arxiv.org/abs/2511.11255
Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming ...
November 17, 2025 at 5:12 AM
MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising

Alibaba introduces a 3-stage training paradigm for multimodal representation learning in e-commerce, achieving +20% online CTR improvement on Taobao search advertising.

📝 arxiv.org/abs/2511.11305
We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning for e-commerce applications. MOON has already been fully deployed across all stages o...
November 17, 2025 at 5:11 AM
I published Vol. 130 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/a-critical...
A Critical Evaluation of RAG in Medicine, Decomposing the Value of Modern Recommendation Algorithms, and More!
Vol.130 for Nov 10 - Nov 16, 2025
November 16, 2025 at 5:35 PM
REAP: Enhancing RAG with Recursive Evaluation and Adaptive Planning for Multi-Hop Question Answering

Introduces a dual-module framework that maintains structured sub-tasks and facts through recursive evaluation.

📝 arxiv.org/abs/2511.09966
👨🏽‍💻 github.com/Deus-Glen/REAP
Retrieval-augmented generation (RAG) has been extensively employed to mitigate hallucinations in large language models (LLMs). However, existing methods for multi-hop reasoning tasks often lack global...
November 14, 2025 at 3:27 AM
Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

Introduces a training-free method that determines optimal retrieval timing by modeling token-level uncertainty dynamics using entropy trends.

📝 arxiv.org/abs/2511.09980
👨🏽‍💻 github.com/pkuserc/ETC
Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in this setti...
November 14, 2025 at 3:26 AM
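The retrieval-timing idea can be sketched as monitoring token-level entropy during decoding and firing retrieval when uncertainty trends upward (the linear-slope heuristic, window, and threshold below are illustrative assumptions, not ETC's actual trend model):

```python
import math

def token_entropy(probs):
    """Shannon entropy of one token's predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_retrieve(entropies, window=3, slope_thresh=0.05):
    """Trigger retrieval when the entropy of recently decoded tokens is
    rising, i.e. the least-squares slope over the last `window` values
    exceeds a threshold (toy heuristic for illustration)."""
    if len(entropies) < window:
        return False
    recent = entropies[-window:]
    xs = range(window)
    x_mean = sum(xs) / window
    y_mean = sum(recent) / window
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, recent))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den > slope_thresh

# Rising uncertainty over the last few decoded tokens -> retrieve now.
history = [0.2, 0.3, 0.9, 1.4, 2.1]
print(should_retrieve(history))  # True
```

A falling entropy trace (the model growing more confident) would leave the trigger off, avoiding unnecessary retrieval calls.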
GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation

Tencent presents a unified generative framework that replaces traditional multi-stage advertising recommendation with an end-to-end approach.

📝 arxiv.org/abs/2511.10138
As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. How...
November 14, 2025 at 3:25 AM
Local Hybrid Retrieval-Augmented Document QA

Presents a fully local RAG system combining semantic and keyword retrieval.

📝 arxiv.org/abs/2511.10297
👨🏽‍💻 github.com/PaoloAstrino...
Organizations handling sensitive documents face a critical dilemma: adopt cloud-based AI systems that offer powerful question-answering capabilities but compromise data privacy, or maintain local proc...
November 14, 2025 at 3:24 AM
Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding

Meta integrates human priors into generative recommenders through lightweight adapter heads.

📝 arxiv.org/abs/2511.10492
👨🏽‍💻 github.com/zhykoties/Mu...
Optimizing recommender systems for objectives beyond accuracy, such as diversity, novelty, and personalization, is crucial for long-term user satisfaction. To this end, industrial practitioners have a...
November 14, 2025 at 3:23 AM
URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding

Unifies retrieval and generation within a single MLLM using lightweight cross-modal adapters.

📝 arxiv.org/abs/2511.10552
👨🏽‍💻 github.com/shi-yx/URaG
Recent multimodal large language models (MLLMs) still struggle with long document understanding due to two fundamental challenges: information interference from abundant irrelevant content, and the qu...
November 14, 2025 at 3:22 AM
Practical RAG Evaluation: A Rarity-Aware Set-Based Metric and Cost-Latency-Quality Trade-offs

Proposes a metric for RAG evaluation that measures evidence presence at fixed prompt budgets with operational cost-latency-quality trade-offs.

📝 arxiv.org/abs/2511.09545
👨🏽‍💻 github.com/etidal2/rag-gs
This paper addresses the guessing game in building production RAG. Classical rank-centric IR metrics (nDCG/MAP/MRR) are a poor fit for RAG, where LLMs consume a set of passages rather than a browsed l...
November 13, 2025 at 3:16 AM
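A rarity-aware set-based score can be sketched as the rarity-weighted fraction of gold evidence that lands within a fixed passage budget (the weighting scheme, names, and toy data below are illustrative, not the paper's exact metric):

```python
def rarity_weighted_support(retrieved, gold_evidence, rarity, budget):
    """Set-based evidence score at a fixed prompt budget: the weighted
    fraction of gold evidence present among the first `budget` retrieved
    passages, where rarer evidence carries a higher weight."""
    in_prompt = set(retrieved[:budget])
    total = sum(rarity[e] for e in gold_evidence)
    if total == 0:
        return 0.0
    hit = sum(rarity[e] for e in gold_evidence if e in in_prompt)
    return hit / total

retrieved = ["d3", "d7", "d1", "d9", "d2"]      # ranked passage ids
gold = ["d1", "d9", "d5"]                        # required evidence
rarity = {"d1": 1.0, "d9": 4.0, "d5": 5.0}       # d5 is rare but missed
print(rarity_weighted_support(retrieved, gold, rarity, budget=4))  # 0.5
```

Unlike nDCG or MRR, this treats the prompt as an unordered set: only whether the evidence fits inside the budget matters, which matches how an LLM actually consumes retrieved passages.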
Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

Evaluates retrieval-augmented reasoning steps bidirectionally using information distance to optimize both answer-seeking and question-grounding.

📝 arxiv.org/abs/2511.09109
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios....
November 13, 2025 at 3:14 AM
Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

Introduces a hierarchical thinking model that decomposes complex problems into solvable sub-problems.

📝 arxiv.org/abs/2511.07943
👨🏽‍💻 github.com/OpenSPG/KAG-...
Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving comp...
November 12, 2025 at 4:55 AM