Lightnews — Scholar-powered news

Alexander Doria

@dorialexander.bsky.social

7.5K followers 680 following 1.9K posts

LLM for the commons.

Posts Replies Media Videos

Pinned

Alexander Doria @dorialexander.bsky.social · Nov 10

Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH and two new SOTA reasoning models exclusively trained on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...

Alexander Doria

@dorialexander.bsky.social

As a result, did the long delayed/much needed update of personal web page: vintagedata.org

Alexander Doria @dorialexander.bsky.social · 3d

i guess i’m really an ai researcher now

February 8, 2026 at 3:28 PM

Alexander Doria

@dorialexander.bsky.social

I’m not sure if it’s limited to the EU but sex censorship of the web is getting absolutely ridiculous. 1950s level.

February 6, 2026 at 9:42 PM

Alexander Doria

@dorialexander.bsky.social

i guess i’m really an ai researcher now

February 6, 2026 at 4:42 PM

Alexander Doria

@dorialexander.bsky.social

Wouldn’t have bet +2 years ago on Bender & co getting stuck in a weird cultish mood while EA people counteract very sensibly, but here we are. benthams.substack.com/p/the-ai-con...

February 3, 2026 at 6:27 PM

Alexander Doria

@dorialexander.bsky.social

Meanwhile, further independent evaluation confirms SYNTH performs better than Nanochat/Fineweb across nearly all benchmarks.

(Humaneval to be expected: we did not out any code inside yet)

Tobi @mocutobi.bsky.social · 8d

Outside of HumanEval, the model pretrained on the SYNTH dataset outperforms the standard nanochat on every task. I report results for both the standard chat template (Harmony) as well as the Qwen3 Chat template used in the SYNTH dataset.

February 1, 2026 at 10:50 PM

Alexander Doria

@dorialexander.bsky.social

It took me weeks, but finally it's there: an overlong blogpost on synthetic pretraining. vintagedata.org/blog/posts/s...

February 1, 2026 at 5:53 PM

Alexander Doria

@dorialexander.bsky.social

maybe it's awfully european of me, but i'm not convinced dumping a mass of non-contextualized private documents is ever a good thing

February 1, 2026 at 2:42 PM

Alexander Doria

@dorialexander.bsky.social

seems we’re dangerously close to a point where someone will tweet you can slurp three gas town juices in a moltbot - and kill a few ai bubbles.

February 1, 2026 at 11:56 AM

Alexander Doria

@dorialexander.bsky.social

Common Corpus paper is going to @iclr-conf.bsky.social !

Alexander Doria @dorialexander.bsky.social · Jun 4

Announcing the release of the official Common Corpus paper: a 20 page report detailing how we collected, processed and published 2 trillion tokens of reusable data for LLM pretraining arxiv.org/pdf/2506.01732

January 26, 2026 at 1:08 PM

Alexander Doria

@dorialexander.bsky.social

If I see another headline about Yann le Cun "contrarian" bet, I’m killing a non-verbal model.

January 25, 2026 at 5:26 PM

Alexander Doria

@dorialexander.bsky.social

Now starting.

From what I see generally compared to Joyce and Kafka but for now the obvious comparison point is Robert Musil (did Musil read it?)

January 25, 2026 at 3:25 PM

Alexander Doria

@dorialexander.bsky.social

If you ever wondered if SYNTH could be usable in mid-training for larger models: Step-DeepResearch (from StepFun) is now out as private beta. stepfun.ai/deep-researc...

January 25, 2026 at 9:11 AM

Alexander Doria

@dorialexander.bsky.social

It’s a bit soon for follow Friday, but still recommending to follow this cat.

xlr8harder @xlr8harder.bsky.social · 19d

I'm going to crosspost between X and Bsky a bit to try to find my people.

Apologies to those seeing repeats.

January 22, 2026 at 1:10 PM

Alexander Doria

@dorialexander.bsky.social

Sans surprise. C'est probablement le secteur le plus exposé à l'IA agentique : plein d'entreprises sont en train de découvrir qu'elles peuvent ré-internaliser la génération de code et que la dette technique n'est pas pire que du SaaS mal contrôlé.

Pixels | Le Monde @pixelsfr.bsky.social · 21d

Capgemini envisage de supprimer jusqu’à 2 400 postes en France

Le numéro 1 français du conseil en informatique affirme vouloir « se transformer » face aux mutations technologiques, notamment celles liées à l’intelligence artificielle.

www.lemonde.fr

January 20, 2026 at 11:25 AM

Alexander Doria

@dorialexander.bsky.social

Baguettotron is now taught in the classroom.

January 19, 2026 at 11:00 PM

Alexander Doria

@dorialexander.bsky.social

maybe model collapse is back after all: claude starting to cite grokipedia (obviously flat wrong…)

January 18, 2026 at 10:11 PM

Alexander Doria

@dorialexander.bsky.social

Meanwhile… I’m afraid Greenland will be the least of our concerns.

January 17, 2026 at 6:39 PM

Alexander Doria

@dorialexander.bsky.social

sometimes you look at the data and regret it.

January 17, 2026 at 4:22 PM

Alexander Doria

@dorialexander.bsky.social

Really like this retrospective of how the original Qwen series came to be: "could you also make a 3-4b for us".

January 16, 2026 at 8:46 PM

Alexander Doria

@dorialexander.bsky.social

Following on our partnership with Wikimedia Enterprise, very happy to see Pleias featured in Wikipedia 25th anniversary post. wikimediafoundation.org/news/2026/01...

January 16, 2026 at 6:59 PM

Alexander Doria

@dorialexander.bsky.social

and so, back to train something new

January 15, 2026 at 1:12 PM

Alexander Doria

@dorialexander.bsky.social

The real shift I'm seeing with Claude Code & co: "programming" is now mostly about knowledge infrastructure management.

January 13, 2026 at 5:19 PM

Alexander Doria

@dorialexander.bsky.social

I feel now structuralism is now getting comforted a bit of everywhere: obviously linguistic/culture but also even archeology/comparative mythology. Except almost no one is left to claim it.

January 13, 2026 at 12:06 PM

Alexander Doria

@dorialexander.bsky.social

not your usual pull request rejection.

January 4, 2026 at 8:38 PM

Alexander Doria

@dorialexander.bsky.social

I'm afraid many left/liberals remind me of that classic Lubitsch dialogue (from Cluny Brown)

January 4, 2026 at 3:16 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news