PhD student in Machine Learning @ University of Cambridge | Previously MSc & BSc @ ETH Zürich
fanconic.github.io
Paper: arxiv.org/abs/2506.11887
#LLM #MultiAgent #ICML2025
Paper: arxiv.org/abs/2412.13998
#LLM #AIAlignment #ICML2025
Learn to:
• Enable reasoning in Gemma 3 (1B)
• Prepare/understand reward functions (see the sketch below)
• Make GRPO work for tiny LLMs
Notebook: colab.research.google.com/github/unslo...
Details: huggingface.co/reasoning-co...
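Not verbatim from the notebook, just a minimal sketch of the kind of format-based reward function these GRPO recipes rely on; the <reasoning>/<answer> tags, the weights, and the format_reward name are illustrative, and the exact callable signature depends on your trainer (e.g. TRL's GRPOTrainer passes completions plus extra kwargs):

```python
import re

# Illustrative template: reward completions that wrap their chain of thought
# in <reasoning>...</reasoning> followed by <answer>...</answer>.
TEMPLATE = re.compile(r"<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>", re.DOTALL)

def format_reward(completions, **kwargs):
    """Return one scalar reward per completion (higher = better formatted)."""
    rewards = []
    for text in completions:
        match = TEMPLATE.search(text)
        if match is None:
            rewards.append(0.0)   # no reasoning/answer structure at all
            continue
        score = 1.0               # followed the template
        if match.group(2).strip():
            score += 0.5          # small bonus for a non-empty final answer
        rewards.append(score)
    return rewards

# Quick sanity check
print(format_reward([
    "<reasoning>2 + 2 = 4</reasoning><answer>4</answer>",
    "The answer is 4.",
]))  # -> [1.5, 0.0]
```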
youtube.com/watch?v=1kwb...
- An 800k-sample dataset, similar in composition to the data used to train the DeepSeek-R1 Distill models (loading sketch below):
- 300k from DeepSeek-R1
- 300k from Gemini 2.0 Flash Thinking
- 200k from Dolphin chat
huggingface.co/datasets/cog...
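A rough sketch of how one might peek at the source mix with the datasets library; the dataset id and the "source" column name below are placeholders, not taken from the (truncated) link above:

```python
from collections import Counter
from datasets import load_dataset

# Placeholder id -- substitute the dataset from the link above.
DATASET_ID = "your-org/your-reasoning-mix"

# Stream so the 800k samples aren't downloaded up front.
ds = load_dataset(DATASET_ID, split="train", streaming=True)

# Count samples per teacher model over a small prefix, assuming a "source"
# column exists (the real column name may differ).
counts = Counter(row.get("source", "unknown") for row in ds.take(10_000))
print(counts)
```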
Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've already gotten used to agents proactively seeking confirmations or asking for my deeper thinking. (🧵 with video)
https://buff.ly/4gQV9wt
docs.google.com/presentation...
Let me know if you want to chat about alignment, LLMs, and AI applications in medicine!
arxiv.org/abs/2406.08414
After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish OpenAI Zurich office. Proud of our past work and looking forward to the future.
Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢
🧵⬇️
- A new layer for the Transformer that selects decoding params automatically *per token* (sketch below)
- Learnt via a new method, Latent Preference Optimization
- Outperforms any fixed-temperature decoding by choosing creativity or factuality as needed
arxiv.org/abs/2411.09661
🧵1/4
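Not the paper's architecture, just a minimal PyTorch sketch of the core idea: a small head that predicts a sampling temperature per token and scales the logits before sampling (module name, sizes, and temperature range are made up):

```python
import torch
import torch.nn as nn

class PerTokenTemperatureHead(nn.Module):
    """Maps each token's hidden state to a sampling temperature in [min, max]."""
    def __init__(self, hidden_size, min_temp=0.1, max_temp=2.0):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)
        self.min_temp, self.max_temp = min_temp, max_temp

    def forward(self, hidden_states):
        # hidden_states: (batch, seq, hidden) -> temperatures: (batch, seq, 1)
        gate = torch.sigmoid(self.proj(hidden_states))
        return self.min_temp + (self.max_temp - self.min_temp) * gate

def sample_per_token(logits, temperatures):
    # logits: (batch, seq, vocab); temperatures: (batch, seq, 1)
    probs = torch.softmax(logits / temperatures, dim=-1)
    flat = probs.view(-1, probs.size(-1))
    return torch.multinomial(flat, num_samples=1).view(probs.size(0), probs.size(1))

# Toy usage
head = PerTokenTemperatureHead(hidden_size=64)
hidden = torch.randn(2, 5, 64)
logits = torch.randn(2, 5, 100)
temps = head(hidden)                    # one temperature per token
tokens = sample_per_token(logits, temps)
print(temps.squeeze(-1).shape, tokens.shape)  # torch.Size([2, 5]) torch.Size([2, 5])
```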
RLHF = 30% *more* copying than base!
Awesome work from the awesome Ximing Lu (gloriaximinglu.github.io) et al. 🤩
arxiv.org/pdf/2410.04265