Glasgow IR Group
irglasgow.bsky.social
Glasgow Information Retrieval Group at the University of Glasgow
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 25/25: To wrap up our advent series, we'd like to thank the contributors shown below, and the many others who support the PyTerrier ecosystem! #WorldChangersTogether
December 25, 2025 at 7:56 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 24/25: Removing low-quality docs can boost search quality and cut indexing costs. Our SIGIR’24 paper QT5 trains a T5 model to filter passages at indexing time—easy to integrate, and works with dense, PISA, or SPLADE indexes too.
December 24, 2025 at 9:23 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 23/25: You’ve done retrieval, but the results seem too homogeneous. Use a diversification reranker. Shown below is the implicit MMR diversification approach, instantiated over BM25 or dense retrieval, but even an explicit approach like xQuAD (cf. Rodrygo Santos) is easy to write.
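A minimal, self-contained sketch of implicit MMR (not the PyTerrier API — the function names and toy vectors here are made up): greedily pick the document that best trades off relevance against similarity to what's already been selected.

```python
# Implicit MMR: score(d) = lam * rel(d) - (1 - lam) * max sim(d, s) over selected s.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def mmr(results, vectors, lam=0.5, k=3):
    """results: list of (docno, relevance score); vectors: docno -> embedding."""
    remaining = dict(results)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(d):
            max_sim = max((cosine(vectors[d], vectors[s]) for s in selected), default=0.0)
            return lam * remaining[d] - (1 - lam) * max_sim
        best = max(remaining, key=mmr_score)   # greedy pick, then remove from pool
        selected.append(best)
        del remaining[best]
    return selected
```

With a near-duplicate of the top doc in the pool, MMR skips it in favour of a less relevant but novel document.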
December 23, 2025 at 11:54 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 22/25: A more complex pipeline—knowledge-graph–enhanced RAG from our EMNLP 2024 paper TRACE. We build a KG over retrieved docs, then use a transformer to reason over triples for better QA. This pipeline instantiation uses a cache (see 20th advent) on LLM-based KG extraction.
December 22, 2025 at 12:25 PM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 21/25: Bounded recall blues got you down? You can use Adaptive Retrieval techniques, like GAR, LADR, and LAFF, to efficiently surface missing relevant documents.
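A toy sketch of the adaptive-retrieval idea behind GAR (hypothetical code, not the plugin's API): alternate between scoring docs from the initial ranking and docs pulled from a corpus graph's neighbours of already-scored docs, so relevant documents the first stage missed can still surface.

```python
# GAR-style adaptive reranking under a fixed scoring budget.
def gar(initial, graph, score_fn, budget=4, batch=1):
    """initial: first-stage docnos (best first); graph: docno -> neighbours;
    score_fn: docno -> reranker score (a stub here)."""
    scored = {}
    queues = {"rank": list(initial), "graph": []}
    source = "rank"
    while len(scored) < budget and (queues["rank"] or queues["graph"]):
        other = "graph" if source == "rank" else "rank"
        q = queues[source] or queues[other]    # fall back if one queue is empty
        took = 0
        while q and took < batch:
            d = q.pop(0)
            if d in scored:
                continue
            scored[d] = score_fn(d)
            # newly scored docs contribute their unseen neighbours
            queues["graph"].extend(n for n in graph.get(d, []) if n not in scored)
            took += 1
        source = other                         # alternate sources each step
    return sorted(scored, key=scored.get, reverse=True)
```

Note how "c" below is absent from the initial ranking but reached via the graph and ends up ranked first.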
December 21, 2025 at 11:25 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 20/25: You can think of every PyTerrier transformer as a function - mapping from one dataframe type to another. This makes them easily cacheable, courtesy of pyterrier_caching. We have cache objects for retrievers, rerankers, or even indexing-time transformations (e.g. Doc2Query).
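The principle in miniature (not pyterrier_caching itself): if a transformer is a pure function of its input, its output can be memoised on that input. Here a toy "retriever" is wrapped so repeated queries skip recomputation.

```python
import functools

calls = {"n": 0}   # count real invocations, to show the cache working

@functools.lru_cache(maxsize=None)
def retrieve(query):
    """Stand-in for an expensive retriever; pure function of the query."""
    calls["n"] += 1
    return tuple(f"doc-for-{query}-{i}" for i in range(2))
```

The second identical call returns the cached result without re-running the body.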
December 20, 2025 at 10:54 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 19/25: PyTerrier-RAG brings agentic RAG to your workflows with support for SOTA methods like Search-R1 and R1-Searcher, to combine retrievers and reasoning. You could even swap BM25 out for a dense or LSR retriever.
December 19, 2025 at 10:24 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 18/25: In RAG, the reader runs the LLM—but your pipeline shouldn’t depend on the LLM stack.

PyTerrier-RAG separates Reader from Backend, letting you swap vLLM ↔ HF with one line while keeping the same pipeline (and even share a Backend with other stages).
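A minimal sketch of that separation (hypothetical classes, not the PyTerrier-RAG API): the Reader owns prompting, the Backend owns generation, and the two meet at a one-method interface — so swapping backends touches one line.

```python
class EchoBackend:
    """Stand-in for a real LLM backend (vLLM, HF, ...)."""
    def generate(self, prompt):
        return f"[echo] {prompt}"

class Reader:
    """Builds the prompt from retrieved docs; delegates generation."""
    def __init__(self, backend):
        self.backend = backend        # any object with .generate(prompt)
    def answer(self, question, docs):
        context = " ".join(docs)
        prompt = f"Context: {context}\nQuestion: {question}"
        return self.backend.generate(prompt)

reader = Reader(EchoBackend())        # swap the backend here; nothing else changes
```

Because the Backend is a plain object, the same instance can be shared by other pipeline stages (e.g. query rewriting).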
December 18, 2025 at 2:23 PM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 17/25: FlexIndex simplifies dense retrieval by supporting FAISS, Voyager, FlatNav & more. It auto-builds data structures, reuses vector stores to cut storage costs, and offers one familiar API for many retrievers.
December 17, 2025 at 2:30 PM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 16/25: Speaking of Learned Sparse Retrieval, PyTerrier has bindings to two backend search engines that provide blazing-fast retrieval over sparse vectors: PISA and BMP.

You can see that we really work to keep the look-and-feel uniform across implementations.
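What those engines compute, in miniature (a toy sketch, not PISA or BMP code): the score of a doc for a query is the dot product of their sparse term-weight vectors, and an inverted index means only docs sharing a query term are ever touched.

```python
from collections import defaultdict

def build_index(docs):
    """docs: docno -> {term: weight}. Returns term -> [(docno, weight), ...]."""
    index = defaultdict(list)
    for docno, vec in docs.items():
        for term, w in vec.items():
            index[term].append((docno, w))
    return index

def search(index, query_vec, k=10):
    """Term-at-a-time scoring: accumulate qw * dw over shared terms."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for docno, dw in index.get(term, []):
            scores[docno] += qw * dw
    return sorted(scores.items(), key=lambda x: -x[1])[:k]
```

Real engines add compression and dynamic pruning on top, but the scoring model is exactly this.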
December 16, 2025 at 9:32 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 15/25: A very well-known learned sparse method is SPLADE. Our pyt_splade plugin makes it easy to use SPLADE by formulating Terrier indexing & retrieving pipelines that are composed with a SPLADE encoder, adding extra columns (e.g. query_toks).

Try it 👉 github.com/cmacdonald/p...
December 15, 2025 at 10:51 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 14/25: So we’ve seen sparse and dense retrieval in PyTerrier. Some folk recommend hybrid retrieval – e.g. reciprocal rank fusion (RRF) of sparse and dense results. We have an easy pipeline component that combines two sets of results by RRF (w/ a pretty schematic by Sean MacAvaney)
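RRF itself fits in a few lines (a self-contained sketch of the formula, not the PyTerrier component): each ranking contributes 1 / (k + rank) per doc, so docs ranked well by either the sparse or the dense run rise to the top.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """rankings: list of ranked docno lists (best first). k=60 is the usual default."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, docno in enumerate(ranking, start=1):
            scores[docno] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that RRF only uses ranks, never raw scores, so sparse and dense runs need no score normalisation before fusing.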
December 14, 2025 at 6:31 PM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 13/25: Doc2query expands docs with generated queries, but can hallucinate. Our ECIR’23 paper Doc2query-- (aka "minus minus") filters generated queries using a cross-encoder before indexing.
PyTerrier pipeline: generate→score→filter→index.
📄 https://arxiv.org/pdf/2301.03266
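The filtering step, sketched (hypothetical helper, not the plugin API — the word-overlap scorer below stands in for the cross-encoder): score each generated query against its source passage and only index expansions that clear a threshold.

```python
def expand_filtered(passage, generated, score_fn, threshold=0.5):
    """Doc2query--: generate -> score -> filter -> index.
    score_fn(passage, query) stands in for a cross-encoder relevance score."""
    kept = [q for q in generated if score_fn(passage, q) >= threshold]
    return passage + " " + " ".join(kept) if kept else passage
```

A hallucinated query ("cats" below) scores low against the passage and is dropped before indexing.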
December 13, 2025 at 11:19 AM
Reposted by Glasgow IR Group
🎄 PyTerrier Advent 12/25: Beyond dense retrieval, learned sparse methods like Doc2Query expand docs with predicted queries before indexing. Our pyterrier_doc2query plugin makes this easy for any corpus—perfectly intuitive as PyTerrier’s pipelines can be applied at indexing time too!
December 12, 2025 at 10:18 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 11/25: Want to use an external search service with PyTerrier? No problemo! It has integrations with APIs for Semantic Scholar, ChatNoir (thanks to Jan Heinrich Merker!), Pinecone, and others!
December 11, 2025 at 9:53 AM
🎄 PyTerrier Advent 10/25: Dense retrieval often improves with pseudo-relevance feedback (Rocchio-style).

In PyTerrier_DR it’s easy: encode query, retrieve docs, a transformer to mix doc vectors w/ the query vector, and then re-retrieve.
pyterrier.readthedocs.io/en/latest/ex...
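The vector-mixing step is classic Rocchio (a self-contained sketch of the formula, not the PyTerrier_DR transformer): move the query embedding towards the centroid of the top-k retrieved doc embeddings, then retrieve again with the new vector.

```python
# q' = alpha * q + beta * mean(top-k doc vectors)
def rocchio(query_vec, doc_vecs, alpha=1.0, beta=0.75):
    dim = len(query_vec)
    centroid = [sum(v[i] for v in doc_vecs) / len(doc_vecs) for i in range(dim)]
    return [alpha * query_vec[i] + beta * centroid[i] for i in range(dim)]
```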
December 10, 2025 at 10:17 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 9/25: Yesterday—dense retrieval with E5 via PyTerrier_DR. Today—RAG! PyTerrier_RAG readers generate answers from retrieved docs. Example: a FiD reader over E5 results, w/ & w/o monoT5 reranking 👇 Check the full notebook to see impact on answer quality.
github.com/terrierteam/...
December 9, 2025 at 10:53 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 8/25: Beyond sparse! PyTerrier_dr adds dense indexing & retrieval. Instantiate an encoder model, compose with FlexIndex. Retrieval is identical. Supported models include: ANCE, TCT-ColBERT, BGE, E5, or any SentenceTransformer model.
👉 github.com/terrierteam/...
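Dense retrieval in a nutshell (a toy sketch, not PyTerrier_dr — in practice the vectors come from the encoder model): docs and queries become vectors, and a flat (exhaustive) inner-product search is the simplest index; FAISS, Voyager etc. approximate the same search at scale.

```python
def flat_search(query_vec, index, k=2):
    """index: list of (docno, vector). Exhaustive inner-product scoring."""
    scored = [(docno, sum(q * x for q, x in zip(query_vec, vec)))
              for docno, vec in index]
    return sorted(scored, key=lambda t: -t[1])[:k]
```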
December 8, 2025 at 8:40 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 6/25: Let's see some more neural rerankers: MonoT5 & DuoT5 (DuoT5 is a more costly pairwise reranker).

We use PyTerrier's succinct rank cutoff operator (%) to cut the number of retrieved docs reranked by each of the MonoT5 & DuoT5 stages.

Try it out:
github.com/terrierteam/...
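The semantics of % and >>, shown on a toy `Stage` class (hypothetical code, not PyTerrier's transformer implementation): `stage % k` truncates its output to the top k, and `a >> b` pipes a's output into b — so only a few docs reach the costly stages.

```python
class Stage:
    """Toy stand-in for a PyTerrier transformer: a function on ranked lists."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, results):
        return self.fn(results)
    def __mod__(self, k):
        # stage % k: keep only the top k of this stage's output
        return Stage(lambda r: self.fn(r)[:k])
    def __rshift__(self, other):
        # a >> b: composition - feed a's output into b
        return Stage(lambda r: other(self.fn(r)))

first_stage = Stage(lambda r: sorted(r, key=lambda x: -x[1]))  # cheap ranker
reranker = Stage(lambda r: [(d, s + 1) for d, s in r])         # stand-in "MonoT5"
pipeline = first_stage % 2 >> reranker   # rerank only the top 2 docs
```

Since % binds tighter than >>, `first_stage % 2 >> reranker` truncates before reranking, just as in PyTerrier.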
December 6, 2025 at 11:36 AM
Reposted by Glasgow IR Group
🎄PyTerrier Advent 5/25: Comparing pipelines (e.g., BM25 vs RM3)? Use pt.Experiment(baseline=…) to see MAP gains/losses per query and paired t-test p-values. Multiple-test corrections are supported too.
📄 https://pyterrier.readthedocs.io/en/latest/experiments.html#significance-testing
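The statistic behind that p-value (a self-contained sketch of the textbook formula, not pt.Experiment's implementation): on per-query metric differences d_i, t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.

```python
import math

def paired_t(baseline, system):
    """baseline, system: per-query metric values (e.g. AP), aligned by query."""
    diffs = [s - b for b, s in zip(baseline, system)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)
```

The t value is then compared against the Student-t distribution with n - 1 degrees of freedom to get the p-value pt.Experiment reports.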
December 5, 2025 at 9:47 AM
Reposted by Glasgow IR Group
Paper alert 🚨"Investors Are (Not) Always Right: A Comparison of Transaction-Based and Profitability-Based Metrics for Financial Asset Recommendations", with @richardmcc.bsky.social, Nikos Droukas, @craigmacdonald.bsky.social & @iadhounis.bsky.social has been accepted at ACM TOIS! 🧵(1/5)
December 3, 2025 at 1:29 PM
Reposted by Glasgow IR Group
PyTerrier, a software platform developed at
@uofgcompsci.bsky.social which helps facilitate the development of AI-powered search engines, has won a national award from @wearebcs.bsky.social!

Read more here: www.gla.ac.uk/news/headlin...
December 2, 2025 at 10:47 AM
🎄We want to try something new and fun this year – an “Advent Calendar” of PyTerrier pipelines 🤓

We’ll kick it off with *the* baseline: BM25 on MSMARCO. One line to download a pre-built index, one line to make a BM25 retriever, one line to search.
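For intuition on what that retriever computes, here is a toy, self-contained BM25 scorer over tokenised docs (one common idf variant; the real pipeline of course uses Terrier's implementation over the pre-built MSMARCO index, not this):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """query, doc: token lists; docs: the whole (toy) corpus of token lists."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)       # document frequency
        if df == 0 or term not in tf:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        # term-frequency saturation, normalised by doc length vs average
        score += idf * tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```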
December 1, 2025 at 10:16 PM
We are happy to sponsor the Keith van Rijsbergen (KvR) Award at #ECIR2026

📢 Nominations are now open!

Details and Submission Form at @ecir2026.eu website: lnkd.in/emecdsXm

🗓 Deadline: 15 January 2026

www.linkedin.com/posts/glasgo...
November 28, 2025 at 10:34 AM