Shangshang Wang
@shangshang-wang.bsky.social
https://shangshang-wang.github.io/

PhD student in CS + AI @usc.edu. CS undergrad and master's at ShanghaiTech. LLM reasoning, RL, AI4Science.
Curious about the details behind these efficiency claims?

We open-source everything for full reproducibility:

Paper: arxiv.org/abs/2506.09967
Blog: shangshangwang.notion.site/resa
Code: github.com/shangshang-w...
Model: huggingface.co/Resa-Yi
Training Logs: wandb.ai/upup-ashton-...
June 12, 2025 at 5:02 PM
SAE-Tuning trains models that match RL-trained counterparts’ performance while reducing costs by >2000x and time by >450x.

The trained model is transparent, revealing where reasoning abilities hide. It is also generalizable and modular, enabling transfer across datasets and models.
June 12, 2025 at 5:02 PM
Such efficiency stems from our novel SAE-Tuning method, which expands the use of sparse autoencoders (SAEs) beyond test-time steering.

In SAE-Tuning, the SAE first “extracts” latent reasoning features and then guides a standard supervised fine-tuning process to “elicit” reasoning abilities.
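To make the two stages concrete, here is a minimal, hypothetical sketch (not the exact Resa recipe): a sparse autoencoder trained on hidden states "extracts" features in stage 1, and its frozen reconstruction is used as an auxiliary guidance term during otherwise standard SFT in stage 2. The module names, loss weighting, and guidance term below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage 1 ("extract"): a tiny sparse autoencoder over hidden states.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, h: torch.Tensor):
        z = F.relu(self.encoder(h))          # sparse latent features
        h_hat = self.decoder(z)              # reconstruction of the hidden state
        return z, h_hat

def sae_loss(h, z, h_hat, l1_coef=1e-3):
    # Standard SAE objective: reconstruction + L1 sparsity on the latents.
    return F.mse_loss(h_hat, h) + l1_coef * z.abs().mean()

# Stage 2 ("elicit"): standard SFT plus a hypothetical guidance term that keeps
# the student's hidden states close to the frozen SAE's reconstruction.
def sae_guided_sft_step(student_logits, labels, student_hidden, sae, aux_coef=0.1):
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1), ignore_index=-100)
    with torch.no_grad():
        _, h_hat = sae(student_hidden)       # SAE frozen after stage 1
    aux = F.mse_loss(student_hidden, h_hat)  # illustrative guidance term
    return ce + aux_coef * aux
```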
June 12, 2025 at 5:02 PM
Check out more about Tina via the links below.

Paper: arxiv.org/abs/2504.15777
Notion Blog: shangshangwang.notion.site/tina
Code: github.com/shangshang-w...
Model: huggingface.co/Tina-Yi
Training Logs: wandb.ai/upup-ashton-...

Tina's avatar was generated by GPT-4o, based on KYNE's girls.
April 23, 2025 at 5:10 PM
We also want to express our gratitude to the broader open-source community. This research was made possible by leveraging numerous publicly available resources from DeepScaleR, STILL, OpenThoughts @bespokelabs.bsky.social , OpenR1 @hf.co , LIMR, and OpenRS.
April 23, 2025 at 5:10 PM
This is an amazing collaboration with Julian, Omer, Enes, and Oliver @oliu-io.bsky.social in the course taught by Willie @willieneis.bsky.social (both teacher and advisor). Thanks everyone!
April 23, 2025 at 5:10 PM
[9/9] 🚀 We thus hypothesize that LoRA’s effectiveness and efficiency stem from rapidly adapting the reasoning format under RL while preserving base model knowledge, a likely more compute-efficient process than the deep knowledge integration of full-parameter training.
April 23, 2025 at 5:10 PM
[8/9] 💡 Observation 2) We consistently observe a training phase transition in format-related metrics (format reward, completion length) but NOT in accuracy-related metrics across most Tina models. The best-performing checkpoint is always found around this transition point.
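As a purely hypothetical illustration of how one might spot that transition in logged metrics (e.g., exported training curves), the sketch below picks the step with the sharpest change in completion length; the numbers are made up.

```python
import numpy as np

# Hypothetical logged metrics: training step -> mean completion length.
steps = np.array([100, 200, 300, 400, 500, 600, 700, 800])
completion_len = np.array([512, 505, 498, 470, 300, 280, 275, 273])

# The "phase transition" shows up as the largest step-to-step change in a
# format-related metric; accuracy-related metrics stay comparatively smooth.
deltas = np.abs(np.diff(completion_len))
transition_idx = int(np.argmax(deltas)) + 1
print(f"candidate transition at step {steps[transition_idx]}")
# The best checkpoint would then be searched for in a window around that step.
```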
April 23, 2025 at 5:10 PM
[7/9] 💡 Observation 1) We observe that in Tina models, increased training compute is inversely related to performance, in contrast to full-parameter models. This observation highlights a “less compute can yield more performance” phenomenon.
April 23, 2025 at 5:10 PM
🤔 But why? Where does this effectiveness and efficiency come from?

💡 We further provide insights based on our observations during post-training Tina models.
April 23, 2025 at 5:10 PM
[6/9] 😋 And it costs only $9 to reproduce the best Tina checkpoint and $526 to reproduce all our experiments from scratch!
April 23, 2025 at 5:10 PM
[5/9] 🤩 We validate this across multiple open-source reasoning datasets and various ablation settings with a single, fixed set of hyperparameters, confirming the effectiveness and efficiency of LoRA-based RL.
April 23, 2025 at 5:10 PM
[4/9] 😍 With minimal post-training compute, the best Tina checkpoint achieves >20% performance increase over the base model and 43% Pass@1 on AIME24.
April 23, 2025 at 5:10 PM
[3/9] 🚀 Our Tina models compete with, and sometimes surpass, SOTA models built on the same base model with surprisingly high cost efficiency.
April 23, 2025 at 5:10 PM
[2/9] 👩 We release the Tina family of models, created by post-training the DeepSeek-R1-Distill-Qwen-1.5B base model with low-rank adaptation (LoRA) during reinforcement learning (RL) on open-source reasoning datasets.
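For a rough sense of what LoRA-based RL post-training looks like in code, here is a minimal sketch assuming TRL's GRPOTrainer and PEFT's LoraConfig; the hyperparameters, reward function, and dataset are placeholders, not the exact Tina recipe (see the repo above for that).

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: 1 if the completion contains a boxed final answer.
def format_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

# Illustrative LoRA settings, not the paper's exact values.
peft_config = LoraConfig(
    r=32, lora_alpha=128, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = GRPOConfig(
    output_dir="tina-lora-grpo",
    num_generations=8,
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)

# Placeholder dataset; GRPO expects a column named "prompt".
train_dataset = load_dataset("your-org/your-reasoning-prompts", split="train")

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=format_reward,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,   # only the LoRA adapters are trained
)
trainer.train()
```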
April 23, 2025 at 5:10 PM
Click on the link/card below to see the full set of spreadsheets:
shangshangwang.notion.site/llm-reasoning
LLM Reasoning: Curated Insights
Reasoning ability is gained via post-training and is scaled via test-time compute.
shangshangwang.notion.site
February 19, 2025 at 6:01 PM
6/6 This is an ongoing collection on LLM reasoning. Please feel free to send new materials or any feedback here or via the Google Form.
docs.google.com/forms/d/e/1F...
Feedback on "LLM Reasoning: Curated Insights"
docs.google.com
February 19, 2025 at 6:01 PM
5/6 Other Artifacts

We also collect survey, evaluation, benchmark, and application papers, along with online resources like blogs, posts, videos, code, and data.
February 19, 2025 at 6:01 PM
4/6 Verification: The Key to Reasoning

Verifiers serve as a key component in both post-training (e.g., as reward models) and test-time compute (e.g., as signals to guide search). Our fourth section collects thoughts on various verification methods.
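As a small, hypothetical sketch of the test-time role, best-of-N selection with a verifier can look like this; the generator and verifier are stand-ins for an LLM sampler and a reward model or rule-based answer checker.

```python
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str, int], List[str]],
              verify: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate solutions and return the one the verifier scores highest.

    `generate` and `verify` are placeholders: an LLM sampler and a verifier
    (e.g., a reward model or a rule-based checker of the final answer).
    """
    candidates = generate(prompt, n)
    return max(candidates, key=lambda c: verify(prompt, c))

# The same verifier signal can also be reused as a reward during RL post-training.
```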
February 19, 2025 at 6:01 PM
3/6 Test-Time Compute: Scaling Reasoning Ability

Test-time compute is an emerging field where folks are trying different methods (e.g., search) and using extra components (e.g., verifiers). Our third section classifies these approaches based on their optimization targets for LLMs.
February 19, 2025 at 6:01 PM
2/6 Post-Training: Gaining Reasoning Ability

Our second section collects thoughts on post-training methods for LLM reasoning, including the popular RL-based methods as well as SFT-based ones.
February 19, 2025 at 6:01 PM
1/6 OpenAI, DeepSeek, and More

The discussion of reasoning ability went viral with the release of the OpenAI o-series and DeepSeek R1 models. Our first section collects thoughts on these and other SOTA reasoning models.
February 19, 2025 at 6:01 PM
Click on the link/card below to see the full set of spreadsheets, and check out the thread below for an overview of each section:
shangshangwang.notion.site/llm-reasoning
LLM Reasoning: Curated Insights
Reasoning ability is gained via post-training and is scaled via test-time compute.
shangshangwang.notion.site
February 19, 2025 at 6:01 PM