Sumit (@reachsumit.com) · reachsumit.com
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain!

Blog: https://blog.reachsumit.com/
Newsletter: https://recsys.substack.com/
Reversing the Retrieval Engine: Query Performance Prediction as an Inverse Learning Task

Introduces an inverse learning framework that reconstructs latent retrieval features from performance scores.

📝 dl.acm.org/doi/10.1145/...
👨🏽‍💻 github.com/recherche198...
Published in ACM Transactions on Information Systems.
Query performance prediction (QPP) aims to estimate the effectiveness of search queries in the absence of explicit relevance judgments. Existing approaches typically rely on hand-crafted features or l...
February 16, 2026 at 4:24 AM
Hi-SAM: A Hierarchical Structure-Aware Multi-modal Framework for Large-Scale Recommendation

Introduces a disentangled semantic tokenizer and hierarchical memory-anchor transformer for multi-modal recommendations.

📝 arxiv.org/abs/2602.11799
Multi-modal recommendation has gained traction as items possess rich attributes like text and images. Semantic ID-based approaches effectively discretize this information into compact tokens. However,...
February 16, 2026 at 4:23 AM
An Industrial-Scale Sequential Recommender for LinkedIn Feed Ranking

LinkedIn presents Feed-SR, a transformer-based sequential ranking model that achieves a +2.10% improvement in time spent through RoPE embeddings, incremental training, and GPU optimizations.

📝 arxiv.org/abs/2602.12354
LinkedIn Feed enables professionals worldwide to discover relevant content, build connections, and share knowledge at scale. We present Feed Sequential Recommender (Feed-SR), a transformer-based seque...
February 16, 2026 at 4:22 AM
Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search

Achieves up to 4x throughput improvement in visual document retrieval by reducing the number of vectors per page through model-aware spatial pooling.

📝 arxiv.org/abs/2602.12510
👨🏽‍💻 github.com/Ara-Yeroyan/...
Multi-vector visual retrievers (e.g., ColPali-style late interaction models) deliver strong accuracy, but scale poorly because each page yields thousands of vectors, making indexing and search increas...
February 16, 2026 at 4:22 AM
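The pooling idea is simple enough to sketch. Here is a minimal, training-free version, assuming a square patch grid and a 2x2 mean-pooling window; function name and shapes are illustrative, not taken from the paper's toolkit:

```python
import numpy as np

def spatial_pool(page_vectors: np.ndarray, grid: int, k: int = 2) -> np.ndarray:
    """Mean-pool a (grid*grid, d) array of patch embeddings with a k x k window,
    reducing the vector count per page by a factor of k*k."""
    d = page_vectors.shape[1]
    patches = page_vectors.reshape(grid, grid, d)
    pooled = patches.reshape(grid // k, k, grid // k, k, d).mean(axis=(1, 3))
    return pooled.reshape(-1, d)

# A 32x32 patch grid (1024 vectors) pooled 2x2 -> 256 vectors, a 4x reduction.
page = np.random.randn(1024, 128)
pooled = spatial_pool(page, grid=32, k=2)
print(pooled.shape)  # (256, 128)
```

Since pooling happens after encoding, the retriever itself stays frozen; only the index gets smaller.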
DiffuRank: Effective Document Reranking with Diffusion Language Models

Introduces diffusion language models for document reranking with three approaches (pointwise, logits-based listwise, and permutation-based listwise).

📝 arxiv.org/abs/2602.12528
👨🏽‍💻 github.com/liuqi6777/Di...
Recent advances in large language models (LLMs) have inspired new paradigms for document reranking. While this paradigm better exploits the reasoning and contextual understanding capabilities of LLMs,...
February 16, 2026 at 4:18 AM
Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation

Tencent presents a framework that optimizes LLM reasoning for recommendation through reinforcement learning.

📝 arxiv.org/abs/2602.12530
Recommender systems are tasked to infer users' evolving preferences and rank items aligned with their intents, which calls for in-depth reasoning beyond pattern-based scoring. Recent efforts start to ...
February 16, 2026 at 4:17 AM
CAPTS: Channel-Aware, Preference-Aligned Trigger Selection for Multi-Channel Item-to-Item Retrieval

Kuaishou presents a framework that improves multi-channel retrieval by aligning trigger selection with downstream engagement rather than direct feedback.

📝 arxiv.org/abs/2602.12564
Large-scale industrial recommender systems commonly adopt multi-channel retrieval for candidate generation, combining direct user-to-item (U2I) retrieval with two-hop user-to-item-to-item (U2I2I) pipe...
February 16, 2026 at 4:16 AM
RQ-GMM: Residual Quantized Gaussian Mixture Model for Multimodal Semantic Discretization in CTR Prediction

Introduces probabilistic modeling via Gaussian Mixture Models combined with residual quantization for multimodal embedding discretization.

📝 arxiv.org/abs/2602.12593
Multimodal content is crucial for click-through rate (CTR) prediction. However, directly incorporating continuous embeddings from pre-trained models into CTR models yields suboptimal results due to mi...
February 16, 2026 at 4:15 AM
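For context, the residual-quantization backbone that RQ-GMM builds on can be sketched as follows. This shows plain nearest-code assignment with random illustrative codebooks; the paper's contribution replaces this hard assignment with Gaussian-mixture modeling, which is not shown here:

```python
import numpy as np

def residual_quantize(x, codebooks):
    """Quantize x level by level: at each level pick the nearest code for the
    current residual, then subtract that code to form the next residual."""
    residual = x.copy()
    codes = []
    for cb in codebooks:  # cb: (num_codes, d)
        idx = np.argmin(((residual[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
        codes.append(idx)
        residual = residual - cb[idx]
    return np.stack(codes, axis=1), residual

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                             # 8 embeddings, dim 16
codebooks = [rng.normal(size=(32, 16)) for _ in range(3)]  # 3 levels, 32 codes each
codes, residual = residual_quantize(x, codebooks)
print(codes.shape)  # (8, 3): one code index per level per embedding
```

Each embedding ends up as a short tuple of discrete code IDs, which is what makes the representation usable as tokens in a CTR model.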
Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback

Evolves recommendation systems through LLM-driven code optimization, integrating user simulator feedback with model diagnosis.

📝 arxiv.org/abs/2602.12612
👨🏽‍💻 github.com/Sein-Kim/sel...
Traditional methods for automating recommender system design, such as Neural Architecture Search (NAS), are often constrained by a fixed search space defined by human priors, limiting innovation to pr...
February 16, 2026 at 4:14 AM
ReFilter: Improving Robustness of Retrieval-Augmented Generation via Gated Filter

Introduces a token-level filtering framework that addresses RAG's scalability bottleneck by suppressing irrelevant content through gated fusion.

📝 arxiv.org/abs/2602.12709
Retrieval-augmented generation (RAG) has become a dominant paradigm for grounding large language models (LLMs) with external evidence in knowledge-intensive question answering. A core design choice is...
February 16, 2026 at 4:13 AM
Training Dense Retrievers with Multiple Positive Passages

Introduces a systematic study of multi-positive optimization objectives for retriever training.

📝 arxiv.org/abs/2602.12727
Modern knowledge-intensive systems, such as retrieval-augmented generation (RAG), rely on effective retrievers to establish the performance ceiling for downstream modules. However, retriever training ...
February 16, 2026 at 4:12 AM
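One natural objective in such a study is an InfoNCE variant whose numerator sums over all positive passages instead of a single one. A minimal sketch (my own illustration of the idea, not necessarily the paper's exact formulation):

```python
import numpy as np

def multi_positive_infonce(q, passages, pos_mask, tau=0.05):
    """InfoNCE generalized to several positives per query: the numerator sums
    exp-similarities over all positive passages rather than one.
    q: (d,), passages: (n, d), pos_mask: (n,) bool marking positives."""
    sims = passages @ q / tau
    sims = sims - sims.max()          # shift for numerical stability
    exps = np.exp(sims)
    return -np.log(exps[pos_mask].sum() / exps.sum())

rng = np.random.default_rng(1)
q = rng.normal(size=32); q /= np.linalg.norm(q)
passages = rng.normal(size=(10, 32))
passages /= np.linalg.norm(passages, axis=1, keepdims=True)
pos_mask = np.zeros(10, dtype=bool); pos_mask[:3] = True  # 3 positives, 7 negatives
loss = multi_positive_infonce(q, passages, pos_mask)
print(float(loss))
```

Other aggregation choices (e.g., averaging per-positive losses instead of summing in the numerator) give different gradients, which is exactly the kind of design axis a systematic study would compare.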
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph

Alibaba presents a framework for multimodal RAG that efficiently handles token-heavy visual data in iterative reasoning.

📝 arxiv.org/abs/2602.12735
👨🏽‍💻 github.com/Alibaba-NLP/...
Effectively retrieving, reasoning, and understanding multimodal information remains a critical challenge for agentic systems. Traditional Retrieval-augmented Generation (RAG) methods rely on linear in...
February 16, 2026 at 4:11 AM
Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Apple introduces an asynchronous LLM-judged caching policy that expands static coverage without changing serving decisions, increasing curated static answer usage by up to 3.9x.

📝 arxiv.org/abs/2602.13165
Large language models (LLMs) now sit in the critical path of search, assistance, and agentic workflows, making semantic caching essential for reducing inference cost and latency. Production deployment...
February 16, 2026 at 4:10 AM
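The lookup underneath any semantic cache is embedding similarity against curated entries. A bare-bones sketch of that layer only — the paper's actual contribution, asynchronous LLM-judged verification of near-hits, is not modeled here, and the class name and threshold are illustrative:

```python
import numpy as np

class SemanticCache:
    """Minimal semantic cache: serve a curated static answer when a new query's
    embedding is close enough (cosine) to a cached query's embedding."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.keys, self.answers = [], []

    def put(self, emb, answer):
        self.keys.append(emb / np.linalg.norm(emb))
        self.answers.append(answer)

    def get(self, emb):
        if not self.keys:
            return None
        emb = emb / np.linalg.norm(emb)
        sims = np.stack(self.keys) @ emb
        i = int(np.argmax(sims))
        return self.answers[i] if sims[i] >= self.threshold else None

cache = SemanticCache(threshold=0.9)
cache.put(np.array([1.0, 0.0, 0.0]), "static answer")
print(cache.get(np.array([0.99, 0.01, 0.0])))  # near-duplicate query: hit
print(cache.get(np.array([0.0, 1.0, 0.0])))    # unrelated query: None
```

Raising coverage then means safely lowering the effective threshold, which is where an offline LLM judge verifying borderline pairs comes in.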
I published Vol. 143 of "Top Information Retrieval Papers of the Week" on Substack.
🔗 recsys.substack.com/p/semantic-s...
February 15, 2026 at 4:48 PM
AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

Proposes a training-free retrieval model using LLM attention layers and entity-based retrieval for context-aware long document retrieval in RAG.

📝 arxiv.org/abs/2602.12278
Retrieval augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) to process tasks involving long documents. However, existing retrieval models are not designed for lon...
February 13, 2026 at 5:19 AM
Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

Defines token overflow in soft compression for RAG and proposes lightweight probing classifiers to detect it without LLM inference.

📝 arxiv.org/abs/2602.12235
👨🏽‍💻 github.com/s-nlp/overfl...
Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to ...
February 13, 2026 at 5:17 AM
Query-focused and Memory-aware Reranker for Long Context Processing

Tencent presents a lightweight listwise reranker using attention scores from retrieval heads.

📝 arxiv.org/abs/2602.12192
🤗 huggingface.co/MindscapeRAG...
Built upon the existing analysis of retrieval heads in large language models, we propose an alternative reranking framework that trains models to estimate passage-query relevance using the attention s...
February 13, 2026 at 5:17 AM
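The core scoring idea, ranking passages by the attention mass that query tokens place on their tokens, can be sketched given an attention matrix. A toy example; the span boundaries and mean aggregation are illustrative, not the paper's exact method:

```python
import numpy as np

def rerank_by_attention(attn, spans):
    """Score each passage by the average attention its tokens receive from the
    query tokens, then rank passages by that score (descending).
    attn: (num_query_tokens, num_context_tokens); spans: list of (start, end)."""
    scores = [attn[:, s:e].mean() for s, e in spans]
    return np.argsort(scores)[::-1], scores

rng = np.random.default_rng(2)
attn = rng.random((4, 30))      # 4 query tokens attending over 30 context tokens
attn[:, 10:20] += 1.0           # passage 1's tokens receive extra attention
order, scores = rerank_by_attention(attn, [(0, 10), (10, 20), (20, 30)])
print(order[0])  # 1
```

In practice the attention would come from specific retrieval heads of the LLM rather than a single generic matrix, which is what the paper's analysis identifies.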
MTFM: A Scalable and Alignment-free Foundation Model for Industrial Recommendation in Meituan

Meituan presents a transformer-based foundation model using heterogeneous tokenization for multi-scenario recommendation without input alignment.

📝 arxiv.org/abs/2602.11235
Industrial recommendation systems typically involve multiple scenarios, yet existing cross-domain (CDR) and multi-scenario (MSR) methods often require prohibitive resources and strict input alignment,...
February 13, 2026 at 5:15 AM
KuaiSearch: A Large-Scale E-Commerce Search Dataset for Recall, Ranking, and Relevance

Kuaishou releases the largest e-commerce search dataset with real user queries, covering recall, ranking, and relevance tasks.

📝 arxiv.org/abs/2602.11518
👨🏽‍💻 github.com/benchen4395/...
E-commerce search serves as a central interface, connecting user demands with massive product inventories and plays a vital role in our daily lives. However, in real-world applications, it faces chall...
February 13, 2026 at 5:14 AM
Recurrent Preference Memory for Efficient Long-Sequence Generative Recommendation

Tencent introduces a framework that compresses long user interaction histories into compact Preference Memory tokens for efficient generative recommendation.

📝 arxiv.org/abs/2602.11605
Generative recommendation (GenRec) models typically model user behavior via full attention, but scaling to lifelong sequences is hindered by prohibitive computational costs and noise accumulation from...
February 13, 2026 at 5:14 AM
Compress, Cross and Scale: Multi-Level Compression Cross Networks for Efficient Scaling in Recommender Systems

Bilibili introduces a framework for efficient feature interaction in CTR prediction that uses 26x fewer parameters.

📝 arxiv.org/abs/2602.12041
👨🏽‍💻 github.com/shishishu/MLCC
Modeling high-order feature interactions efficiently is a central challenge in click-through rate and conversion rate prediction. Modern industrial recommender systems are predominantly built upon dee...
February 13, 2026 at 5:12 AM
Improving Neural Retrieval with Attribution-Guided Query Rewriting

Uses token-level gradient attributions from retrievers to guide LLM-based query rewriting, improving retrieval effectiveness without retraining the model.

📝 arxiv.org/abs/2602.11841
👨🏽‍💻 github.com/anonym-submi...
Neural retrievers are effective but brittle: underspecified or ambiguous queries can misdirect ranking even when relevant documents exist. Existing approaches address this brittleness only partially: ...
February 13, 2026 at 5:10 AM
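Under a mean-pooled dual encoder, gradient-times-input attribution reduces to each query token's exact contribution to the retrieval score, which is the kind of per-token signal such rewriting can feed to an LLM. A sketch under that mean-pooling assumption; the setup and names are mine, not the paper's:

```python
import numpy as np

def token_attributions(token_embs, doc_emb):
    """With a mean-pooled query the score decomposes over tokens:
    score = mean_i(e_i) . d = (1/n) * sum_i(e_i . d). Each term is that token's
    exact contribution (equal to e_i times the gradient of the score wrt e_i)."""
    n = token_embs.shape[0]
    return (token_embs @ doc_emb) / n

rng = np.random.default_rng(3)
doc = rng.normal(size=16)          # document embedding
toks = rng.normal(size=(5, 16))    # 5 query-token embeddings
attr = token_attributions(toks, doc)
score = toks.mean(axis=0) @ doc
print(np.isclose(attr.sum(), score))  # True: attributions sum to the score
```

Tokens with low or negative attribution are candidates for the rewriter to disambiguate or drop, without retraining the retriever.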
IntTravel: A Real-World Dataset and Generative Framework for Integrated Multi-Task Travel Recommendation

Alibaba releases a 4.1B interaction dataset and decoder-only generative framework for multi-task travel recommendation.

📝 arxiv.org/abs/2602.11664
👨🏽‍💻 github.com/AMAP-ML/IntT...
Next Point of Interest (POI) recommendation is essential for modern mobility and location-based services. To provide a smooth user experience, models must understand several components of a journey ho...
February 13, 2026 at 5:10 AM
Self-Evolving Recommendation System: End-To-End Autonomous Model Optimization With LLM Agents

Google presents a system where LLM agents autonomously evolve recommender models by generating hypotheses, writing code & validating changes through A/B testing.

📝 arxiv.org/abs/2602.10226
Optimizing large-scale machine learning systems, such as recommendation models for global video platforms, requires navigating a massive hyperparameter search space and, more critically, designing sop...
February 12, 2026 at 5:38 AM
Spend Search Where It Pays: Value-Guided Structured Sampling and Optimization for Generative Recommendation

Tencent uses value-guided decoding and sibling-relative advantage learning to improve generative recommendation.

📝 arxiv.org/abs/2602.10699
Generative recommendation via autoregressive models has unified retrieval and ranking into a single conditional generation framework. However, fine-tuning these models with Reinforcement Learning (RL)...
February 12, 2026 at 5:37 AM