Lightnews — Scholar-powered news

Tim Kellogg

@timkellogg.me

7.2K followers 740 following 12K posts

AI Architect | North Carolina | AI/ML, IoT, science

WARNING: I talk about kids sometimes


Tim Kellogg

@timkellogg.me

on the next page it says all chat bots must answer to the name “big brother”

November 11, 2025 at 9:18 PM

Tim Kellogg

@timkellogg.me

i tried doing a startup doing exactly this with KGs, it did not pan out

November 11, 2025 at 4:07 PM

Tim Kellogg

@timkellogg.me

ya that’s where i’m at too. it feels strange watching someone use a sub-par mode

November 11, 2025 at 4:01 PM

Tim Kellogg

@timkellogg.me

my suspicion is that Google doesn’t do this and that they might be the only ones that don’t

November 11, 2025 at 3:19 PM

Tim Kellogg

@timkellogg.me

the longer i think about it, the more i suspect they’re doing something at an even higher level

like instead of dynamic batch sizing, maybe they do constant and just have very smart load balancers that keep load saturated

probably balancing training & serving utilization

November 11, 2025 at 3:18 PM

Tim Kellogg

@timkellogg.me

it didn’t occur to me until now, but monad is in the range of HRM but generalized

November 11, 2025 at 2:37 PM

Tim Kellogg

@timkellogg.me

honestly surprised he didn’t do that years ago

November 11, 2025 at 1:51 PM

Tim Kellogg

@timkellogg.me

ya that was my thought too

November 11, 2025 at 12:32 PM

Tim Kellogg

@timkellogg.me

maybe, but the even “twice” makes me think it was something that more directly adds to 2x perf

November 11, 2025 at 12:31 PM

Tim Kellogg

@timkellogg.me

so sweet

❯ uv run run.py
Loaded PleIAs/Baguettotron on mps. Type 'quit' to exit.
> bro! let's fucking go!
I'm really into you. It's a relationship that gets ridiculed at the same time we're supposed to appreciate it. It's funny, but it does a lot to you.

I'm glad you were asking the right questions. Is there anything else you're curious about?

November 11, 2025 at 3:27 AM

Tim Kellogg

@timkellogg.me

@dorialexander.bsky.social i wish you blessings in the form of billions of euros in funding

November 11, 2025 at 2:43 AM

Tim Kellogg

@timkellogg.me

i’m surprised! i expected them to train in fp32, but no, they went with a legit bf16

November 11, 2025 at 2:41 AM

Tim Kellogg

@timkellogg.me

while being the most French model yet, they had to rationalize why it wasn’t trained on French

but fr imagine being able to do ablations on THE ENTIRE end-to-end training process. you’d learn so much

Monad and Baguettotron were trained on 16 h100 from Jean Zay using the Nanotron framework from HuggingFace. This setting allowed for fast experimentations and iteration, Monad being trained in less than six hours. While Baguettotron reuses the standard Pleias tokenizer optimized for European languages, Monad uses a custom tokenizer trained on the English segment of SYNTH: this was a critical measure to contain parameters space, bringing back token embeddings from 20M to less than 2M.

November 11, 2025 at 2:36 AM

Tim Kellogg

@timkellogg.me

lol

November 10, 2025 at 11:50 PM

Tim Kellogg

@timkellogg.me

Google will also budget rewrites into their timelines, or so i've heard

November 10, 2025 at 10:31 PM
