Sumit
@reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

Alibaba introduces an adaptive regularization method that addresses the one-epoch overfitting problem in CTR/CVR models.

📝 arxiv.org/abs/2511.06374
👨🏽‍💻 anonymous.4open.science/r/AdaptiveRe...
The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models which rely heavily on lar...
November 11, 2025 at 8:09 AM
The Value of Personalized Recommendations: Evidence from Netflix

Netflix examines how personalized recommendations drive user engagement by building a discrete choice model on 2 million users.

📝 arxiv.org/abs/2511.07280
Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discre...
November 11, 2025 at 8:08 AM
A Representation Sharpening Framework for Zero Shot Dense Retrieval

Amazon proposes a training-free framework that augments document representations with contrastive queries to improve zero-shot dense retrieval without retraining the model.

📝 arxiv.org/abs/2511.05684
Zero-shot dense retrieval is a challenging setting where a document corpus is provided without relevant queries, necessitating a reliance on pretrained dense retrievers (DRs). However, since these DRs...
November 11, 2025 at 8:07 AM
A Remarkably Efficient Paradigm to Multimodal Large Language Models for Sequential Recommendation

Introduces a method to compress multimodal item representations into single tokens and enhance sequential position awareness.

📝 arxiv.org/abs/2511.05885
In this paper, we proposed Speeder, a remarkably efficient paradigm to multimodal large language models for sequential recommendation. Speeder introduces 3 key components: (1) Multimodal Representatio...
November 11, 2025 at 8:06 AM
Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin

ByteDance scales long-sequence modeling to 10k-length histories through efficient cross-attention, user-centric batching, and length-extrapolative training.

📝 arxiv.org/abs/2511.06077
Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets. We present an end-to-end system that scales long-sequence modeling to 10k-l...
November 11, 2025 at 8:05 AM
Evaluation of retrieval-based QA on QUEST-LOFT

Google DeepMind provides an in-depth analysis of RAG on QUEST-LOFT, demonstrating that RAG combined with structured outputs and self-verification significantly outperforms long-context approaches.

📝 arxiv.org/abs/2511.06125
Despite the popularity of retrieval-augmented generation (RAG) as a solution for grounded QA in both academia and industry, current RAG methods struggle with questions where the necessary information ...
November 11, 2025 at 8:04 AM
LLaDA-Rec: Discrete Diffusion for Parallel Semantic ID Generation in Generative Recommendation

Proposes a discrete diffusion framework that reformulates recommendation as parallel semantic ID generation.

📝 arxiv.org/abs/2511.06254
👨🏽‍💻 github.com/TengShi-RUC/...
Generative recommendation represents each item as a semantic ID, i.e., a sequence of discrete tokens, and generates the next item through autoregressive decoding. While effective, existing autoregress...
November 11, 2025 at 8:03 AM
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights

Presents a comprehensive evaluation of RAG in medicine, revealing that standard RAG often degrades performance.

📝 arxiv.org/abs/2511.06738
👨🏽‍💻 github.com/Yale-BIDS-Ch...
Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, evidence-g...
November 11, 2025 at 8:01 AM
Have We Really Understood Collaborative Information? An Empirical Investigation

Introduces a quantitative definition of collaborative information in recommender systems, analyzing its manifestation and impact on various recommendation algorithms.

📝 arxiv.org/abs/2511.06905
Collaborative information serves as the cornerstone of recommender systems which typically focus on capturing it from user-item interactions to deliver personalized services. However, current understa...
November 11, 2025 at 7:57 AM
Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

NVIDIA introduces a text embedding model trained on 16M query-document pairs, achieving top performance on multilingual benchmarks.

📝 arxiv.org/abs/2511.07025
👨🏽‍💻 huggingface.co/nvidia/llama...
We introduce llama-embed-nemotron-8b, an open-weights text embedding model that achieves state-of-the-art performance on the Multilingual Massive Text Embedding Benchmark (MMTEB) leaderboard as of Oct...
November 11, 2025 at 7:56 AM
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Proposes Q-RAG, a resource-efficient multi-step retrieval approach that fine-tunes embedder models using RL, achieving state-of-the-art results on long-context benchmarks.

📝 arxiv.org/abs/2511.07328
Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods ...
November 11, 2025 at 7:55 AM
DMA: Online RAG Alignment with Human Feedback

Uses multi-granularity human feedback to continuously align retrieval and ranking in RAG systems through online learning.

📝 arxiv.org/abs/2511.04880
Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online learning f...
November 10, 2025 at 3:25 AM
EncouRAGe: Evaluating RAG Local, Fast, and Reliable

Introduces a Python framework for developing and evaluating Retrieval-Augmented Generation systems with modular components and diverse metrics.

📝 arxiv.org/abs/2511.04696
👨🏽‍💻 anonymous.4open.science/r/encourage-...
We introduce EncouRAGe, a comprehensive Python framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using Large Language Models (LLMs) and Em...
November 10, 2025 at 3:24 AM
Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation

Proposes a framework that systematically filters noisy documents through query-aware clustering and multi-agent iterative refinement.

📝 arxiv.org/abs/2511.04700
👨🏽‍💻 github.com/SongW-SW/Win...
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources to address their limitations in accessing up-to-date or specialized information. A ...
November 10, 2025 at 3:23 AM
Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG

Proposes a dual-layer architecture separating fine-grained search chunks from coarse-grained retrieval contexts, improving composability and context fidelity.

📝 arxiv.org/abs/2511.04939
Retrieval systems are essential to contemporary AI pipelines, although most confuse two separate processes: finding relevant information and giving enough context for reasoning. We introduce the Searc...
November 10, 2025 at 3:20 AM
QUESTER: Query Specification for Generative Retrieval

Introduces a method where small LLMs generate keyword queries for BM25 retrieval, trained via reinforcement learning to balance effectiveness and efficiency in information retrieval.

📝 arxiv.org/abs/2511.05301
Generative Retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance in model parameters and directly generating document identifiers. However, GR often struggles t...
November 10, 2025 at 3:19 AM
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework

Compresses retrieval content using knowledge graphs and reduces reasoning steps through iterative process-aware preference optimization.

📝 arxiv.org/abs/2511.05385
👨🏽‍💻 github.com/Applied-Mach...
Retrieval-Augmented Generation (RAG) utilizes external knowledge to augment Large Language Models' (LLMs) reliability. For flexibility, agentic RAG employs autonomous, multi-round retrieval and reason...
November 10, 2025 at 3:19 AM
I published Vol. 129 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/agentic-re...
Agentic Retrieval for Corpus-Level Reasoning, Compact, High-Performance Caching for RAG Agents, and More!
Vol.129 for Nov 03 - Nov 09, 2025
November 9, 2025 at 5:14 PM
NVIDIA Nemotron Nano V2 VL

NVIDIA introduces a 12B vision-language model achieving leading OCR performance with strong capabilities in document understanding, long video comprehension, and reasoning tasks.

📝 arxiv.org/abs/2511.03929
👨🏽‍💻 huggingface.co/nvidia/NVIDI...
We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron...
November 7, 2025 at 3:39 AM
RAGalyst: Automated Human-Aligned Agentic Evaluation for Domain-Specific RAG

Presents an automated framework for evaluating domain-specific RAG systems with human-aligned LLM-as-a-Judge metrics and agentic synthetic QA generation.

📝 arxiv.org/abs/2511.04502
Retrieval-Augmented Generation (RAG) is a critical technique for grounding Large Language Models (LLMs) in factual evidence, yet evaluating RAG systems in specialized, safety-critical domains remains ...
November 7, 2025 at 3:37 AM
KGFR: A Foundation Retriever for Generalized Knowledge Graph Question Answering

Enables zero-shot generalization to unseen KGs through LLM-generated relation descriptions and question-conditioned initialization.

📝 arxiv.org/abs/2511.04093
👨🏽‍💻 github.com/yncui-nju/KGFR
Large language models (LLMs) excel at reasoning but struggle with knowledge-intensive questions due to limited context and parametric knowledge. However, existing methods that rely on finetuned LLMs o...
November 7, 2025 at 3:36 AM
On the Brittleness of CLIP Text Encoders

Analyzes how CLIP text encoders react to minor input variations in multimedia retrieval, finding that syntactic and semantic perturbations cause the largest instabilities while brittleness concentrates in trivial surface edits.

📝 arxiv.org/abs/2511.04247
Multimodal co-embedding models, especially CLIP, have advanced the state of the art in zero-shot classification and multimedia information retrieval in recent years by aligning images and text in a sh...
November 7, 2025 at 3:35 AM
Cache Mechanism for Agent RAG Systems

Introduces an annotation-free caching framework that reduces RAG storage to 0.015% of the original corpus while achieving a 79.8% has-answer rate and an 80% latency reduction.

📝 arxiv.org/abs/2511.02919
Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's s...
November 6, 2025 at 3:35 AM
No-Human in the Loop: Agentic Evaluation at Scale for Recommendation

Walmart introduces a multi-agent framework that benchmarks 36 LLMs as judges for complementary-item recommendation without human annotation.

📝 arxiv.org/abs/2511.03051
Evaluating large language models (LLMs) as judges is increasingly critical for building scalable and trustworthy evaluation pipelines. We present ScalingEval, a large-scale benchmarking study that sys...
November 6, 2025 at 3:35 AM
Generative Sequential Recommendation via Hierarchical Behavior Modeling

Presents a generative framework for multi-behavior sequential recommendation with cross-level interaction and sequential augmentation.

📝 arxiv.org/abs/2511.03155
👨🏽‍💻 github.com/wzf2000/GAMER
Recommender systems in multi-behavior domains, such as advertising and e-commerce, aim to guide users toward high-value but inherently sparse conversions. Leveraging auxiliary behaviors (e.g., clicks,...
November 6, 2025 at 3:34 AM