Lightnews — Scholar-powered news

Moritz Laurer

@moritzlaurer.bsky.social

.@microsoft.com's rStar-Math paper claims that 🤏 ~7B models can match the math skills of o1 using clever train- and test-time techniques. You can now download their prompt templates from @hf.co !
🧵

January 15, 2025 at 12:31 PM

Moritz Laurer

@moritzlaurer.bsky.social

FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!
🧵

January 11, 2025 at 11:14 AM

Moritz Laurer

@moritzlaurer.bsky.social

The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning.
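
To make the idea of step-level rewards concrete, here is a minimal toy sketch in plain Python. It is not TRL's API: `score_solution`, `toy_scorer`, and the "min"/"prod" aggregation options are all hypothetical names chosen for illustration, and the scorer is a hard-coded rule standing in for a trained PRM.

```python
from math import prod
from typing import Callable, List


def score_solution(
    steps: List[str],
    step_scorer: Callable[[List[str]], float],
    aggregate: str = "min",
) -> float:
    """Combine per-step scores into one score for a multi-step solution.

    `step_scorer` is a hypothetical stand-in for a trained PRM: it receives
    the steps written so far and returns a score for the latest step.
    """
    if not steps:
        raise ValueError("need at least one reasoning step")
    # Score each step given the steps that precede it.
    step_scores = [step_scorer(steps[: i + 1]) for i in range(len(steps))]
    if aggregate == "min":
        # A solution is only as good as its weakest step.
        return min(step_scores)
    if aggregate == "prod":
        # Treat scores as independent probabilities that each step is correct.
        return prod(step_scores)
    raise ValueError(f"unknown aggregate: {aggregate}")


def toy_scorer(prefix: List[str]) -> float:
    # Toy rule, not a real model: penalize one obviously wrong sum.
    return 0.1 if "2 + 2 = 5" in prefix[-1] else 0.9


good = ["2 + 2 = 4", "4 * 3 = 12"]
bad = ["2 + 2 = 5", "5 * 3 = 15"]

print(score_solution(good, toy_scorer))
print(score_solution(bad, toy_scorer))
```

The point of the sketch is that the flawed first step drags down the score of the second solution even though its later step follows correctly from it, which is something a reward on the final answer alone would not localize.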

January 9, 2025 at 1:05 PM

Moritz Laurer

@moritzlaurer.bsky.social

OpenAI is losing money on the $200/month subscription 🤯. It's crazy how expensive it is to run these largest LLMs:

- ChatGPT Pro costs $200/month ($2,400/year) and is still unprofitable for OpenAI due to higher-than-expected usage.
- OpenAI reportedly expected losses of about $5 billion

January 7, 2025 at 11:12 AM

Moritz Laurer

@moritzlaurer.bsky.social

🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:

- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost.
- 📉 Performance tradeoff:

January 6, 2025 at 4:40 PM

Moritz Laurer

@moritzlaurer.bsky.social

Quite excited by the ModernBERT release! 0.15/0.4B small, 2T modern pre-training data and tokenizer with code, 8k context window, great efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! I can finally stop using DeBERTav3 2021 :D

December 20, 2024 at 2:21 PM

Moritz Laurer

@moritzlaurer.bsky.social

"Open-source AI: year in review 2024": amazing Space with lots of data-driven insights into AI in 2024! Check it out 👇

December 17, 2024 at 3:40 PM

Moritz Laurer

@moritzlaurer.bsky.social

I've been building a small library for working with prompt templates on the @huggingface.bsky.social Hub: `pip install prompt-templates`. Motivation:

The community currently shares prompt templates in a wide variety of formats: in datasets, in model cards, as strings in .py files, as .txt/... 🧵

December 12, 2024 at 3:58 PM

Reposted by Moritz Laurer

narsilou.bsky.social

@narsilou.bsky.social

Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!

December 10, 2024 at 10:08 AM

Reposted by Moritz Laurer

Damián Pumar

@damianpumar.hf.co

🚀 We’re excited to announce Argilla v2.5.0, which includes:
* Argilla webhooks
* A new design for the datasets home page
* Python 3.13 and Pydantic v2 support
📙 Read the full release notes here 👇

github.com/argilla-io/a...