Graphcore Research
@gcresearchteam.bsky.social
The 🦋 account of the Graphcore Research team.

Our mission is to contribute to the advancement of AI research and understand the computational requirements of intelligence.
Finally, Metacognitive Reuse: Turning Recurring LLM Reasoning into Concise Behaviors enables LLMs to extract recurring reasoning patterns as concise “behaviors” and reuse them, improving efficiency and reducing repeated computation.

Summary: graphcore-research.github.io/papers-of-th...
September Papers: The L in ML Stands for LLMs
For September, the research team reviewed a whopping 22 papers! Needless to say, competition was fierce, and only four made the final cut for this month’s edition, which is LLM-themed:
graphcore-research.github.io
October 9, 2025 at 8:49 AM
Set Block Decoding accelerates LLM inference by generating multiple tokens in parallel using non-causal attention and iterative entropy-based sampling.

Summary: graphcore-research.github.io/papers-of-th...
October 9, 2025 at 8:49 AM
Soft Tokens, Hard Truths proposes using continuous “soft” tokens with injected noise to enable reinforcement learning fine-tuning of LLM reasoning.

Summary: graphcore-research.github.io/papers-of-th...
October 9, 2025 at 8:49 AM
First up, FlowRL uses GFlowNets to train LLMs on full reward distributions, promoting diverse reasoning paths instead of just reward maximisation.

Summary: graphcore-research.github.io/papers-of-th...
October 9, 2025 at 8:48 AM
Finally, Graph-R1 is another addition to the stack of agentic RAG approaches, but this time, using knowledge hypergraphs!

Summary: graphcore-research.github.io/papers-of-th...
August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG
August, even with its heat waves and holidays, left no shortage of exciting research. Our top papers for this month are the following: ADMIRE-BayesOpt that investigates how to weight different data...
graphcore-research.github.io
September 10, 2025 at 3:25 PM
Next, Guiding Diffusion Models with RL for Stable Molecule Generation introduces reinforcement learning with physical feedback to accomplish exactly what its name suggests!

Summary: graphcore-research.github.io/papers-of-th...
September 10, 2025 at 3:25 PM
First up, ADMIRE-BayesOpt addresses the question of finding the optimal mixture of multiple datasets. And the answer: sequential iterative search using Multi-Fidelity Bayesian Optimization!

Summary: graphcore-research.github.io/papers-of-th...
September 10, 2025 at 3:25 PM
Finally, DataRater addresses dataset quality: a ‘rater’ is meta-learned to curate training data without manual filtering.

Summary: graphcore-research.github.io/papers-of-th...
July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation
As July brought tennis at Wimbledon, so too did the ML world serve up a volley of research. This month, we took an eagle-eyed approach—or, perhaps, Hawk Eyed approach—to three papers.
graphcore-research.github.io
August 6, 2025 at 10:41 AM
Mixture of Recursions brings a twist to token-level computation: the model learns to recurse adaptively, allocating compute per token dynamically.

Summary: graphcore-research.github.io/papers-of-th...
August 6, 2025 at 10:41 AM
First up, Subliminal Learning explores a question in model distillation: “Can we control distillation so that a student learns desirable traits while avoiding undesirable ones?”

Summary: graphcore-research.github.io/papers-of-th...
August 6, 2025 at 10:41 AM
It turns out your quantisation centroids should be distributed according to the cube root of the data pdf (a result from the ’50s). Surprising, if you ask me. To find out more, read our post graphcore-research.github.io/posts/cube-r...
Optimal Formats and the Cube Root of the PDF
Your boss emails you a point in 128-billion-dimensional space. “Llama 3.1 8B,” the message reads. “A not-so-large language model in bfloat16. But it’s too big. Trim the fat (ASAP).” You open up your t...
graphcore-research.github.io
June 12, 2025 at 11:20 AM
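The cube-root rule from the post above can be sketched numerically: for fixed-rate quantisation, the Panter–Dite result puts the optimal centroid density proportional to p(x)^(1/3), so centroids can be placed by inverting the CDF of that density. A minimal sketch for a standard Gaussian (the function name, grid bounds, and resolution are illustrative choices, not from the post):

```python
# Sketch of the Panter-Dite cube-root rule for a standard Gaussian.
# Optimal fixed-rate quantiser centroids have density proportional to
# p(x)**(1/3); we place one centroid per equal-mass cell of that density.
import numpy as np

def cube_root_centroids(n_levels, lo=-6.0, hi=6.0, grid=100_000):
    x = np.linspace(lo, hi, grid)
    p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # Gaussian pdf
    density = p ** (1 / 3)                      # cube-root rule
    cdf = np.cumsum(density)
    cdf /= cdf[-1]
    # centroid at the midpoint of each of n_levels equal-mass cells
    targets = (np.arange(n_levels) + 0.5) / n_levels
    return np.interp(targets, cdf, x)

centroids = cube_root_centroids(8)
```

The resulting centroids are symmetric about zero and packed more tightly near the mode, where the pdf (and hence its cube root) is largest.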
Finally, Spurious Rewards finds that even rewarding random answers can improve reasoning abilities, potentially forcing us to reconsider how post-training techniques improve the use of test-time compute.

Summary: graphcore-research.github.io/papers-of-th...
May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning
Hurtling past the NeurIPS submission deadline into the summer months, we switch from huddling around server rooms to keep warm to babysitting experiments whilst basking in the sun. We’ve had a bumper ...
graphcore-research.github.io
June 4, 2025 at 1:22 PM
Soft Thinking introduces a novel way of utilising continuous concept tokens during the reasoning phase of test-time compute models, without requiring any further training.

Summary: graphcore-research.github.io/papers-of-th...
June 4, 2025 at 1:22 PM
Next up, AlphaEvolve is an evolutionary algorithm from Google DeepMind that generates and refines prompts for Gemini, advancing the state of the art in algorithm design.

Summary: graphcore-research.github.io/papers-of-th...
June 4, 2025 at 1:22 PM
First, Parallel Scaling Laws for Language Models proposes a novel method of scaling language-model compute, inspired by classifier-free guidance: a model is fine-tuned to run multiple forward passes with different learned vector prefixes.

Summary: graphcore-research.github.io/papers-of-th...
June 4, 2025 at 1:22 PM
References:

Shannon (1948). "A mathematical theory of communication"
Panter and Dite (1951). "Quantization distortion in pulse-count modulation with nonuniform spacing of levels"
Zador (1982). "Asymptotic Quantization Error of Continuous Signals and the Quantization Dimension"
May 22, 2025 at 12:27 PM
That's all for the quick tour, thanks for joining us! To find out more, check out the paper: arxiv.org/abs/2505.12988

(References ⬇️)

6/6
Optimal Formats for Weight Quantisation
Weight quantisation is an essential technique for enabling efficient training and deployment of modern deep learning models. However, the recipe book of quantisation formats is large and the formats a...
arxiv.org
May 22, 2025 at 12:27 PM
The weighted objective means that some tensors are more sensitive to quantisation than others; we can measure how sensitive by computing the average diagonal Fisher information.

For a given budget, we can get better performance by allocating more bits to more sensitive tensors.

5/6
May 22, 2025 at 12:27 PM
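The bit-allocation idea above can be illustrated with a toy sketch (not the paper's exact procedure; the `allocate_bits` helper and the Fisher values are hypothetical): under a fixed budget, classical rate allocation gives each tensor extra bits in proportion to the log of its sensitivity.

```python
# Toy sensitivity-aware bit allocation: b_i = mean + 0.5*log2(F_i / geomean(F)),
# the classical rate-allocation rule for quadratic distortion weighted by
# per-tensor sensitivity F_i (here, average diagonal Fisher information).
import numpy as np

def allocate_bits(fisher, mean_bits):
    log_f = np.log2(np.asarray(fisher, dtype=float))
    # deviation from the geometric mean sets the per-tensor bonus/penalty
    bits = mean_bits + 0.5 * (log_f - log_f.mean())
    return bits

fisher = [1e-4, 1e-2, 1.0]  # hypothetical per-tensor sensitivities
bits = allocate_bits(fisher, mean_bits=4.0)
```

The average stays at the 4-bit budget, while the most sensitive tensor receives the most bits.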
It seems that block-absmax, sparse outliers and lossless compression all exploit variable-length coding to some extent.

This is an illustrative image of how that might work.

4/6
May 22, 2025 at 12:26 PM
We can use classical quantisation theory (Shannon 1948, Panter and Dite 1951, Zador 1982) to find optimal quantisers.

The best ones use variable-length encoding (e.g. a Huffman code). Interestingly, block-absmax formats can also outperform fixed-length codes.

3/6
May 22, 2025 at 12:26 PM
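The variable-length-coding advantage mentioned above can be demonstrated with a small sketch (the step size, sample count, and seed are arbitrary illustrative choices): quantise Gaussian weights on a uniform grid, Huffman-code the indices, and compare the average code length against a fixed-length code.

```python
# Sketch: uniform quantiser + Huffman code over the resulting index
# distribution. Because a Gaussian makes indices near zero far more
# probable than tail indices, the variable-length code uses fewer
# bits on average than a fixed-length code over the same alphabet.
import heapq
import itertools
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)
idx = np.round(w / 0.5).astype(int)  # uniform quantiser, step 0.5
vals, counts = np.unique(idx, return_counts=True)
probs = counts / counts.sum()

# Huffman code lengths via a min-heap; the counter breaks probability ties
counter = itertools.count()
heap = [(p, next(counter), {v: 0}) for v, p in zip(vals, probs)]
heapq.heapify(heap)
while len(heap) > 1:
    p1, _, d1 = heapq.heappop(heap)
    p2, _, d2 = heapq.heappop(heap)
    merged = {v: l + 1 for v, l in {**d1, **d2}.items()}
    heapq.heappush(heap, (p1 + p2, next(counter), merged))
lengths = heap[0][2]

avg_bits = sum(p * lengths[v] for v, p in zip(vals, probs))
fixed_bits = np.ceil(np.log2(len(vals)))
```

Here `avg_bits` lands near the index entropy (about 3 bits for this step size), comfortably below the fixed-length cost of the same alphabet.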