Phd student in CS + AI @usc.edu. CS undergrad, master at ShanghaiTech. LLM reasoning, RL, AI4Science.
We open-source everything for full reproducibility:
Paper: arxiv.org/abs/2506.09967
Blog: shangshangwang.notion.site/resa
Code: github.com/shangshang-w...
Model: huggingface.co/Resa-Yi
Training Logs: wandb.ai/upup-ashton-...
We open-source everything for full reproducibility:
Paper: arxiv.org/abs/2506.09967
Blog: shangshangwang.notion.site/resa
Code: github.com/shangshang-w...
Model: huggingface.co/Resa-Yi
Training Logs: wandb.ai/upup-ashton-...
The trained model is transparent, revealing where reasoning abilities hide, also generalizable and modular, enabling transfer across datasets and models.
The trained model is transparent, revealing where reasoning abilities hide, also generalizable and modular, enabling transfer across datasets and models.
In SAE-Tuning, the SAE first “extracts” latent reasoning features and then guides a standard supervised fine-tuning process to “elicit” reasoning abilities.
In SAE-Tuning, the SAE first “extracts” latent reasoning features and then guides a standard supervised fine-tuning process to “elicit” reasoning abilities.
Paper: arxiv.org/abs/2504.15777
Notion Blog: shangshangwang.notion.site/tina
Code: github.com/shangshang-w...
Model: huggingface.co/Tina-Yi
Training Logs: wandb.ai/upup-ashton-...
Tina's avatar is generated by GPT-4o based on KYNE's girls.
Paper: arxiv.org/abs/2504.15777
Notion Blog: shangshangwang.notion.site/tina
Code: github.com/shangshang-w...
Model: huggingface.co/Tina-Yi
Training Logs: wandb.ai/upup-ashton-...
Tina's avatar is generated by GPT-4o based on KYNE's girls.
💡 We further provide insights based on our observations during post-training Tina models.
💡 We further provide insights based on our observations during post-training Tina models.
shangshangwang.notion.site/llm-reasoning
shangshangwang.notion.site/llm-reasoning
docs.google.com/forms/d/e/1F...
docs.google.com/forms/d/e/1F...
We then collect survey, evaluation, benchmark and application papers and also online resources like blogs, posts, videos, code, and data.
We then collect survey, evaluation, benchmark and application papers and also online resources like blogs, posts, videos, code, and data.
Verifiers serve as a key component in both post-training (e.g., as reward models) and test-time compute (e.g., as signals to guide search). Our fourth section collects thoughts on various verification.
Verifiers serve as a key component in both post-training (e.g., as reward models) and test-time compute (e.g., as signals to guide search). Our fourth section collects thoughts on various verification.
Test-time compute is an emerging field where folks are trying different methods (e.g., search) and using extra components (e.g., verifiers). Our third section classifies them based on the optimization targets for LLMs.
Test-time compute is an emerging field where folks are trying different methods (e.g., search) and using extra components (e.g., verifiers). Our third section classifies them based on the optimization targets for LLMs.
Our second section collects thoughts on post-training methods for LLM reasoning including the hot RL-based and also SFT-based methods.
Our second section collects thoughts on post-training methods for LLM reasoning including the hot RL-based and also SFT-based methods.
The discussion on reasoning ability started to go viral with the release of OpenAI o-series and DeepSeek R1 models. Our first section collects thoughts on OpenAI o-series and DeepSeek R1 models and other SOTA reasoning models.
The discussion on reasoning ability started to go viral with the release of OpenAI o-series and DeepSeek R1 models. Our first section collects thoughts on OpenAI o-series and DeepSeek R1 models and other SOTA reasoning models.
shangshangwang.notion.site/llm-reasoning
shangshangwang.notion.site/llm-reasoning