Ivan Cherevko
@ichrvk.bsky.social
Founder at http://Maracas.ai, Chief Privacy Officer @ Yandex | ex: Founder and CEO, Hotelscan, CPO Yandex Direct, CPO&CTO of Rambler&Co projects
So we're just casually dropping models that beat both Llama and state-of-the-art memory models by massive margins now? Looking forward to the independent replications of these impressive LM2 results.
February 16, 2025 at 7:33 PM
Turns out you can teach LLMs to reason with just 17k examples and the content doesn't even need to be correct. The real secret? Just keep the reasoning structure coherent. Makes you wonder about all those carefully curated datasets.
February 16, 2025 at 6:53 PM
Finally, a paper that tells us when distillation beats supervised learning without having to run a million experiments. TL;DR: Use distillation for small compute budgets or when you already have a teacher, otherwise stick to supervised learning.
February 16, 2025 at 3:11 AM
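The rule of thumb, sketched in code (a minimal sketch in my own words; the function name and the FLOPs threshold are placeholders I made up, not the paper's numbers):

```python
def choose_training_method(compute_budget_flops: float,
                           teacher_available: bool,
                           small_budget_flops: float = 1e21) -> str:
    """Illustrative decision rule for distillation vs. supervised learning.

    The 1e21 threshold is a made-up placeholder; the paper frames "small"
    relative to the student's compute-optimal budget, not an absolute number.
    """
    if teacher_available or compute_budget_flops < small_budget_flops:
        return "distillation"  # reuse an existing teacher / stretch a small budget
    return "supervised"        # with a large budget, supervised learning wins out
```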
Huginn: Smaller model that gets smarter by thinking longer in latent space. Finally someone trying something other than just making models bigger or forcing them to write their thoughts down like a middle schooler.
February 13, 2025 at 11:50 PM
Jobs disappeared due to high interest rates, but AI means they're not coming back. This decade will be tough for junior software engineers.
February 13, 2025 at 2:46 AM
Mathematical reasoning in LLMs has always been seen as data-hungry, requiring 100k+ examples. New paper shows just 817 carefully curated samples can achieve 57.1% on AIME. If reproducible, this challenges everything we thought we knew about scaling laws in reasoning tasks.
February 11, 2025 at 10:44 PM
DeepMind's AlphaGeometry2 solves 84% of IMO geometry problems from 2000-2024, surpassing gold medalist performance. The system nearly doubles its predecessor's 54% solve rate through improved language modeling, faster symbolic reasoning, and novel search techniques.
February 8, 2025 at 3:53 AM
Comprehensive study shows that long chain-of-thought reasoning in LLMs emerges from SFT + RL but requires careful reward shaping to control length scaling. Rule-based reward signals with filtering outperform model-based approaches for stabilizing performance.
February 7, 2025 at 3:43 AM
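Roughly the flavor of a rule-based, length-shaped reward (a toy sketch; the coefficients and cutoffs are invented for illustration, not the paper's exact formulation):

```python
def shaped_reward(answer_correct: bool,
                  cot_tokens: int,
                  max_cot_tokens: int = 8192,
                  length_penalty: float = 1e-4) -> float:
    """Toy rule-based reward: correctness plus a soft length penalty,
    with a hard filter on runaway chains of thought.

    All numbers here are made-up placeholders for illustration only.
    """
    if cot_tokens > max_cot_tokens:   # filter out runaway generations
        return -1.0
    reward = 1.0 if answer_correct else 0.0
    return reward - length_penalty * cot_tokens
```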
New research from Stanford and UW shows how to reach o1-preview-level performance via test-time scaling with just 1K examples and simple supervised training. The key? Careful data curation and sensible compute budgeting.
February 5, 2025 at 12:13 AM
Interesting find: OpenAI's Deep Research web browser has JavaScript disabled by default, but the model can enable it when needed. Cool to see the technical decisions behind this project.
February 4, 2025 at 11:39 PM
Teaching is just the latest job being displaced by AI. Six weeks of LLM tutoring now matches two years of traditional education.
January 17, 2025 at 7:49 AM
Why do we keep using these corny stock photos of milk-white robots with exposed brains to symbolize AI? Kubrick gave us the perfect visual metaphor back in 1968.
January 12, 2025 at 8:36 PM
NVIDIA releases Cosmos - an ambitious attempt at a general-purpose physical world simulator. Open source with permissive licenses, but still struggling with basic physics like object permanence. Nice milestone for robotics and autonomous driving, even if we're far from a true digital twin of reality.
January 10, 2025 at 10:33 PM
Small LLMs (1.5B-7B) achieving o1-level math reasoning through iterative self-improvement is impressive, but let's see independent verification and stress testing before getting too excited about the 90% MATH score claims.
January 10, 2025 at 3:45 PM
Project Digits is killer. The Mac Mini-like form factor is a huge plus — I can just keep it sitting on my desk until I need its power.
January 7, 2025 at 4:59 PM
SDPO paper is out - turns out the secret to making LLMs more socially intelligent isn't better data or larger models, but simply optimizing the right chunks of conversation. Sometimes less really is more.
January 7, 2025 at 3:27 PM
OLMo 2 proves open-source LLMs can compete: 7B & 13B models matching/beating Llama 3.1 & Qwen 2.5 while using fewer FLOPs. Best part? EVERYTHING is open - models, data, code, logs, the painful training stability debugging... because science needs more than just weights. @soldaini.net
January 6, 2025 at 4:58 PM
FLOAT: A new method for AI talking heads that's literally 125x faster than current approaches, with preserved quality and realistic expressions. Flow matching shows its strengths over diffusion models yet again.
December 9, 2024 at 12:30 PM
Babe, wake up - @sirbayes.bsky.social just dropped an amazing reinforcement learning tutorial! I honestly thought RL was gonna be a passing fad, but every month I see more important use cases. This is a perfect intro if you want to learn.
December 9, 2024 at 11:56 AM
Interesting paper from Google DeepMind on 'motion prompting' for video gen. Clean technical approach with nice results, though the 12-min generation time makes me wonder about real-world viability beyond research demos. @tobiaspfaff.bsky.social @andrewowens.bsky.social
December 9, 2024 at 10:39 AM
Finally, someone figured out how to do 4D reconstruction without needing a million synchronized cameras. Impressive results on both real and AI-generated videos while keeping the technical approach surprisingly elegant.
@benmpoole.bsky.social @jonbarron.bsky.social @holynski.bsky.social
December 2, 2024 at 4:52 PM