Al_th
@alth.fr
Applied Math Ph.D, R&D engineer (Image processing, numerical modeling, Machine Learning) in the healthcare sector. #MLsky

Also cooking (Pâté en croûte maker) and slowly learning guitar.

Alth.fr
@althcuisine on Instagram
@AlthCuisine on YouTube

FR/EN
Thanks for the feedback! Do you have a referral link?
November 19, 2025 at 9:25 PM
Penja white pepper is one of my favorites, if you ever get the chance.
November 15, 2025 at 11:43 AM
How good is the vanilla?

I saw the same ads on Instagram and, given the price, I admit I was a bit put off… I thought it was a scam.
November 15, 2025 at 11:42 AM
Living… living… that's a big word.

Some days it's just survival 🤣
March 3, 2025 at 6:36 AM
Some time ago, I DM'd @dorialexander.bsky.social about a similar (yet somewhat different) idea:

While there is a point in fixing the generated tokens, we do squash an enormous amount of information by actually looking at whether the cat is dead or alive.

AFAIK, the issue with diff is fixed context size tho
February 19, 2025 at 8:45 AM
The new policy logprob computation seems a bit clunky for now.

It's currently generic enough to handle any generation length in the GRPO output generation step, but I guess it would be much more efficient to generate only a context-size chunk and use the fact that you have the full logits available...
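
A minimal sketch of that idea, in plain Python rather than the repository's actual code: when the whole sequence fits in the context window, a single forward pass yields logits at every position, so the log-prob of every generated token can be read off at once. The names `log_softmax`, `sequence_logprobs`, `all_logits` and `tokens` are hypothetical, and the logits are assumed to be given as nested lists.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_logprobs(all_logits, tokens):
    """Per-token log-probs of `tokens`, given logits from ONE forward pass.

    Assumption: all_logits[t] was produced after reading tokens[: t + 1],
    so it scores the next token, tokens[t + 1].
    """
    return [
        log_softmax(all_logits[t])[tokens[t + 1]]
        for t in range(len(tokens) - 1)
    ]
```

The GRPO probability ratio for a token would then be `math.exp(new_lp - old_lp)`, computed from two such lists.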
February 6, 2025 at 10:14 AM
github.com/Al-th/grpo_e...

I hope it's a reasonable implementation...

Tokenizer and Transformer models are very naive, based on Karpathy's transformer from scratch video. Data is also based on Karpathy's video.
GitHub - Al-th/grpo_experiment: Experiment on reimplementation of GRPO RL
Experiment on reimplementation of GRPO RL. Contribute to Al-th/grpo_experiment development by creating an account on GitHub.
github.com
February 6, 2025 at 10:00 AM
Probably can share that yeah

Needs a bit of cleanup first but I’ll ping you.
February 5, 2025 at 7:53 PM
To be fair, the GRPO-optimized model doesn't shout; the RL cheated by having more people speak (as names are capitalized in the dataset I'm using)

(Left is base transformer, right is post GRPO)
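
A rough sketch of how this kind of reward hacking can happen, assuming the reward is simply the share of uppercase letters (the actual reward is not stated here, so `uppercase_reward` is hypothetical). The group-relative advantage follows the usual GRPO recipe of standardizing rewards within a group of samples; using the population standard deviation is also an assumption.

```python
import statistics


def uppercase_reward(text):
    """Hypothetical reward: fraction of alphabetic characters that are uppercase."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each reward standardized within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up samples: more speaker names means more capitals, without any "shouting".
samples = ["ROMEO: hi", "he said hi to her"]
advantages = group_advantages([uppercase_reward(s) for s in samples])
```

Under this reward, the sample that introduces a speaker name gets the positive advantage, so the policy is pushed toward more speaker turns.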
February 5, 2025 at 5:22 PM
Given the reported radioactivity levels (2.7 to 26.4 Bq/kg, with a median of 14.4 Bq/kg), honestly I'm no expert, but I think we can say "noted, and who cares"…

I also liked this passage: "These values are 10^8 times lower than levels authorized by EU (55) (3.10−3 mSv day−1)"
February 3, 2025 at 9:14 PM
Reposted by Al_th
2/2

This conclusion comes from several types of combined analyses (geochemistry, grain-size analysis, clay mineralogy, radionuclide activities and their isotopic signature, air-mass back-trajectories...)

Source @cnrs.bsky.social INSU: www.insu.cnrs.fr/fr/cnrsinfo/...
Saharan dust: the radioactivity does not come from the nuclear tests carried out by France
Desert dust is the largest source of atmospheric aerosols worldwide by mass.
www.insu.cnrs.fr
February 3, 2025 at 8:50 PM
At my job, so internally, a decision has been waiting to be made for two years. I'm losing it.

And meanwhile, obviously, the context changes, competitors move ahead, etc…
February 2, 2025 at 7:32 AM
Is it really challenging conventional AI wisdom though?

It has been known for quite some time that training data quality is one of the most important factors when working with supervised algorithms, even though real-world data might be noisy.

Isn't it the same thing, but in the RL setting?
January 30, 2025 at 7:01 AM
Tbh I don't think any of it is (in case this was what you implied) a shift in cultural behavior.

In my view, it's more a manifestation of economic incentives: if you are first, you don't disclose, to keep your advantage. If you are not, then open-sourcing can hurt the top player.
January 28, 2025 at 7:23 AM