They call any layer that can read a separate context plus a query a “contextual layer”.
Stack this layer on top of a normal multilayer perceptron and you get a “contextual block”.
For that block, the context acts exactly like a rank-1 additive patch on the MLP's weight matrix.
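A minimal numpy sketch of that identity, as I read the construction (not the authors' code): `W` stands for the MLP's first weight matrix, and two random vectors stand in for the attention output at the query position with and without the context. A single rank-1 update to `W` reproduces the effect of prepending the context.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

W = rng.normal(size=(d, d))   # first weight matrix of the MLP
a = rng.normal(size=d)        # attention output for the query alone, A(x)
a_ctx = rng.normal(size=d)    # attention output with context prepended, A(C, x)

# Rank-1 patch: dW = (W @ (a_ctx - a)) outer a, scaled by 1 / ||a||^2
dW = np.outer(W @ (a_ctx - a), a) / (a @ a)

# Pre-activations agree, so the MLP with patched weights applied to the
# context-free input matches the original MLP on the in-context input.
assert np.allclose(W @ a_ctx, (W + dW) @ a)
print("rank of patch:", np.linalg.matrix_rank(dW))  # -> 1
```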
---
We live in a weird timeline…
👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?
TL;DR:
▶️Compute-constrained? Train Autoregressive models
▶️Data-constrained? Train Diffusion models
Get ready for 🤿 1/n
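For concreteness, here is a simplified PyTorch sketch of the two objectives being compared. This is my own stripped-down version, not the paper's training code: `model` is assumed to be any callable mapping token ids to logits, and the masked-diffusion loss omits the noise-schedule weighting a full implementation would use.

```python
import torch
import torch.nn.functional as F

def ar_loss(model, tokens):
    # Autoregressive: predict token t+1 from tokens <= t.
    # Every position is supervised in exactly one way per epoch.
    logits = model(tokens[:, :-1])                       # (B, T-1, V)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def masked_diffusion_loss(model, tokens, mask_id):
    # Masked diffusion: corrupt a random fraction of positions, then
    # predict the originals. Each epoch sees a *different* corruption of
    # the same data, one intuition for why diffusion squeezes more out
    # of repeated corpora in the data-constrained regime.
    B, T = tokens.shape
    t = torch.rand(B, 1)                                 # per-sequence mask rate
    mask = torch.rand(B, T) < t
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                            # (B, T, V)
    return F.cross_entropy(logits[mask], tokens[mask])
```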
---
Across different unique data scales, we observe:
1️⃣ At low compute, Autoregressive models win.
2️⃣ After a certain amount of compute, Diffusion models pull ahead.
---
---
paper : https://arxiv.org/abs/2507.15857
---
Inverse Scaling in Test-Time Compute
This study shows that longer reasoning in Large Reasoning Models (LRMs) can hurt performance—revealing a surprising inverse scaling between reasoning length and accuracy. ...
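A sketch of how one might probe this on their own setup: score the same eval set at several reasoning-token budgets and watch whether accuracy falls as the budget grows. The `ask` callable and the budget knob are placeholders for whatever inference API you use, not any specific library; the dummy model at the bottom just lets the harness run end to end.

```python
from typing import Callable, Iterable, Tuple

def sweep_budgets(
    ask: Callable[[str, int], str],      # placeholder: (question, budget) -> answer
    dataset: Iterable[Tuple[str, str]],  # (question, gold answer) pairs
    budgets=(256, 1024, 4096, 16384),
):
    data = list(dataset)
    for budget in budgets:
        correct = sum(ask(q, budget).strip() == a for q, a in data)
        # Inverse scaling shows up as accuracy *dropping* at larger budgets.
        print(f"budget={budget:>6}  accuracy={correct / len(data):.3f}")

if __name__ == "__main__":
    # Dummy stand-in model so the sketch executes; swap in a real client.
    sweep_budgets(lambda q, b: "42", [("6*7?", "42"), ("2+2?", "4")])
```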
---
🔓 Open for the community to leverage model weights & outputs
🛠️ API outputs can now be used for fine-tuning & distillation
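Since API outputs can feed fine-tuning, here is a minimal distillation sketch. Everything specific in it is an assumption: `gpt2` is just a small stand-in student, and the hand-written pair stands in for (prompt, response) data you would actually collect from the teacher API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# SFT-style distillation: fine-tune a small student on (prompt, teacher
# response) pairs. The pair below is a placeholder for collected API outputs.
pairs = [
    ("Explain overfitting in one sentence.",
     "Overfitting is when a model memorizes training noise instead of "
     "learning patterns that generalize."),
]

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in student
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for prompt, response in pairs:
    batch = tok(prompt + "\n" + response + tok.eos_token, return_tensors="pt")
    # Standard causal-LM loss over the whole sequence; masking the prompt
    # tokens out of the loss is a common refinement omitted here.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    print(f"loss: {loss.item():.3f}")
```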