Cosmin Stamate
stamate.bsky.social
AI & ML Scientist | Researcher • Engineer • Lecturer
⚙️ The Core Idea

They call any layer that can read a separate context plus a query a “contextual layer”.

Stack this layer on top of a normal multilayer perceptron and you get a “contextual block”.

For that block, the context acts exactly like a rank-1 additive patch on the

--- ...
July 25, 2025 at 8:13 AM
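A minimal sketch of what a "rank-1 additive patch" can look like, for readers of the truncated post above. The post is cut off before it says what gets patched, so this assumes the patch applies to the first weight matrix `W` of the MLP. All names here (`rank1_patch`, `a_query`, `a_context`) are hypothetical, and the random vectors merely stand in for the contextual layer's outputs; this is not the authors' code.

```python
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def rank1_patch(W, a_query, a_context):
    """Outer product (W @ delta) a_query^T / ||a_query||^2,
    where delta = a_context - a_query. The result has rank at most 1."""
    delta = [c - q for c, q in zip(a_context, a_query)]
    w_delta = matvec(W, delta)
    norm_sq = sum(q * q for q in a_query)
    return [[wd * q / norm_sq for q in a_query] for wd in w_delta]

random.seed(0)
d_in, d_out = 4, 3
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

# Assumed stand-ins for the contextual layer's output:
a_query = [random.gauss(0, 1) for _ in range(d_in)]    # query alone
a_context = [random.gauss(0, 1) for _ in range(d_in)]  # context plus query

# Original weights applied to the context-aware input ...
with_context = matvec(W, a_context)
# ... versus patched weights applied to the context-free input.
patched = matvec(add(W, rank1_patch(W, a_query, a_context)), a_query)

max_gap = max(abs(p - q) for p, q in zip(with_context, patched))
print(max_gap < 1e-9)
```

Under these assumptions the two outputs agree up to floating-point error: the effect of the context on this linear map is absorbed entirely into a rank-1 change to `W`, with no change to the input.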
ICML’s Statement about subversive hidden LLM prompts

We live in a weird timeline…
July 23, 2025 at 1:32 PM
🚨 The era of infinite internet data is ending, so we ask:

👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?

TL;DR:

▶️Compute-constrained? Train Autoregressive models

▶️Data-constrained? Train Diffusion models

Get ready for 🤿 1/n

--- ...
July 23, 2025 at 12:52 PM
🚨 Finding #1: Diffusion models outperform autoregressive models when trained with sufficient compute (i.e., more epochs & parameters).

Across different unique data scales, we observe:

1️⃣ At low compute, Autoregressive models win.
2️⃣ After a certain amount of compute,

--- ...
July 23, 2025 at 12:09 PM
Everyone get your top 1% quality dataset and train 100 epochs right now

---

paper : https://arxiv.org/abs/2507.15857
July 23, 2025 at 11:56 AM
Anthropic just released a research paper.

Inverse Scaling in Test-Time Compute

This study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance—revealing a surprising inverse scaling between reasoning length and accuracy. ...
July 23, 2025 at 10:17 AM
Companies are using fake humans and AI to do interviews now...
July 23, 2025 at 10:13 AM
🔄 DeepSeek-R1 is now MIT licensed for clear open access
🔓 Open for the community to leverage model weights & outputs
🛠️ API outputs can now be used for fine-tuning & distillation
January 20, 2025 at 3:08 PM
DeepSeek just published their R1 repo, have fun everyone!!
January 20, 2025 at 12:38 PM
Bluesky now has over 10 million users, and I was #1,116,727!
September 19, 2024 at 4:51 PM