Cosmin Stamate
@stamate.bsky.social
AI & ML Scientist | Researcher • Engineer • Lecturer
⚙️ The Core Idea

They call any layer that can read a separate context plus a query a “contextual layer”.

Stack this layer on top of a normal multilayer perceptron and you get a “contextual block”.

For that block, the context acts exactly like a rank 1 additive patch on the

--- ...
July 25, 2025 at 8:13 AM
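For readers unfamiliar with the term in the post above, a "rank 1 additive patch" generally means adding an outer product of two vectors to a weight matrix. The NumPy sketch below only illustrates that general idea. The post is truncated, so which weights the patch applies to is not shown here, and the names `rank1_patch`, `W`, `u`, `v`, and `x` are hypothetical and are not taken from the paper being summarised.

```python
import numpy as np

def rank1_patch(W, u, v):
    """Return W + u v^T, i.e. W with a rank 1 additive update applied."""
    return W + np.outer(u, v)

# Hypothetical shapes and random values, chosen purely for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # some weight matrix
u = rng.normal(size=4)        # output-side direction of the patch
v = rng.normal(size=3)        # input-side direction of the patch
x = rng.normal(size=3)        # an input vector

W_patched = rank1_patch(W, u, v)

# The difference between patched and original weights has rank 1.
delta_rank = np.linalg.matrix_rank(W_patched - W, tol=1e-10)

# Applying the patched weights equals the original output plus a
# shift along u, scaled by how strongly x aligns with v.
lhs = W_patched @ x
rhs = W @ x + u * (v @ x)
```

Under these assumptions, `delta_rank` is 1 and `lhs` matches `rhs` up to floating-point error, which is what makes such a patch cheap to describe: two vectors instead of a full matrix.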
ICML’s Statement about subversive hidden LLM prompts

We live in a weird timeline…
July 23, 2025 at 1:32 PM
🚨 The era of infinite internet data is ending, so we ask:

👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?

TL;DR:

▶️ Compute-constrained? Train Autoregressive models

▶️ Data-constrained? Train Diffusion models

Get ready for 🤿 1/n

--- ...
July 23, 2025 at 12:52 PM
🚨 Finding #1: Diffusion models outperform autoregressive models when trained with sufficient compute (i.e., more epochs & parameters).

Across different unique data scales, we observe:

1️⃣ At low compute, Autoregressive models win.
2️⃣ After a certain amount of compute,

--- ...
July 23, 2025 at 12:09 PM
Everyone get your top 1% quality dataset and train 100 epochs right now

---

Paper: https://arxiv.org/abs/2507.15857
July 23, 2025 at 11:56 AM
opencode making a pong game in vite+react using (4bit)
qwen/qwen3-235b-a22b-2507 locally, served by lmstudio.
It used like 130GB of RAM, 0 issues with tool calls.

This is completely usable locally now. ...
Creating Pong game in Vite React project
opencode.ai
July 23, 2025 at 11:51 AM
Anthropic just released a research paper.

Inverse Scaling in Test-Time Compute

This study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance—revealing a surprising inverse scaling between reasoning length and accuracy. ...
July 23, 2025 at 10:17 AM
Companies are using fake humans and AI to do interviews now...
July 23, 2025 at 10:13 AM
Best model with an OSI-approved license:

🇨🇳: R1, Qwen3

🇪🇺: Mistral Small

🇺🇸: IBM Granite
July 23, 2025 at 10:11 AM
Reposted by Cosmin Stamate
Thanks for sharing your journey, @moji249.bsky.social! As I said on Twitter, it's really important for people to see that grad school has its ups and downs and that there are real times of struggle. Congratulations on the graduation! 🎉
June 28, 2025 at 5:33 AM
🔄 DeepSeek-R1 is now MIT licensed for clear open access
🔓 Open for the community to leverage model weights & outputs
🛠️ API outputs can now be used for fine-tuning & distillation
January 20, 2025 at 3:08 PM
DeepSeek just published their R1 repo, have fun everyone!!
January 20, 2025 at 12:38 PM
Reposted by Cosmin Stamate
Our tutorial on cross-disciplinary insights on alignment is tomorrow

neurips.cc/virtual/2024...
NeurIPS Tutorial: Cross-disciplinary insights into alignment in humans and machines (NeurIPS 2024)
neurips.cc
December 10, 2024 at 5:35 AM
Reposted by Cosmin Stamate
Random student shows up on Friday evening: prof, I didn’t get xyz. Can you explain again?
Me: it’s Friday evening.
Student: *puppy eyes*
Me: okay, okay, let me fetch some coloured markers. 🎨🖌️🖼️
November 23, 2024 at 2:16 AM
So I can ask ML questions here and people that actually understand and care will reply?
November 19, 2024 at 8:28 PM
Bluesky now has over 10 million users, and I was #1,116,727!
September 19, 2024 at 4:51 PM