Flaviu Cipcigan
@flaviucipcigan.bsky.social
Building AIs for scientific discovery. Discovered antibiotics and materials for carbon capture. Tango dancer. See more at flaviucipcigan.com. Opinions my own.
Pinned
One of my big motivations is accelerating science with AI.

Every discovery project had a beautiful aha moment, such as the structure of antibiotics emerging in the latent space of a model or a GFlowNet proposing new carbon capture materials.

Here are some of the threads I've written on this topic.
Super interesting application of program search

Goals are mapped to programs which are embedded in a latent space.

A fitness metric is assigned to the programs, and program search is then used to synthesise new human-like goals.
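
To make the loop concrete, here's a toy sketch in that spirit (my own illustration, not the paper's code; the embedding, fitness, and mutation functions are placeholders):

import random

# Programs are short strings, the "latent space" is a fake character-count
# embedding, and search is a simple mutate-and-select loop.
def embed(program: str) -> list[float]:
    # placeholder embedding
    return [program.count(c) / max(len(program), 1) for c in "abcdefghij"]

def fitness(program: str) -> float:
    # placeholder fitness: closeness to some target behaviour in embedding space
    target = embed("collect the keys then open the door")
    vec = embed(program)
    return -sum((a - b) ** 2 for a, b in zip(vec, target))

def mutate(program: str) -> str:
    words = program.split()
    words[random.randrange(len(words))] = random.choice(
        ["keys", "door", "explore", "collect", "avoid", "guard"]
    )
    return " ".join(words)

# Evolutionary program search: mutate candidates, keep the fittest.
population = ["explore the map", "collect keys", "avoid the guard"]
for _ in range(50):
    population += [mutate(p) for p in population]
    population = sorted(set(population), key=fitness, reverse=True)[:5]

print(population[0])  # a synthesised goal-like program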
February 22, 2025 at 11:53 AM
One of my big motivations is accelerating science with AI.

Every discovery project had a beautiful aha moment, such as the structure of antibiotics emerging in the latent space of a model or a GFlowNet proposing new carbon capture materials.

Here are some of the threads I've written on this topic.
February 20, 2025 at 9:09 PM
Wanna try to guess which of those gets parsed as a string and which as a number? Answer in alt text.

YAML parsing in Python is weird.
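
For an illustrative pair in the same spirit (not necessarily the one in the screenshot): with PyYAML, a bare two-part version is read as a number, while adding a patch component keeps it a string.

import yaml  # PyYAML

doc = """
python: 1.20
package: 1.20.1
"""

data = yaml.safe_load(doc)
print(type(data["python"]), data["python"])    # <class 'float'> 1.2
print(type(data["package"]), data["package"])  # <class 'str'> 1.20.1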
February 17, 2025 at 4:49 PM
Interesting idea to generate responses using diffusion rather than left-to-right auto-regressive models
February 17, 2025 at 12:31 PM
What is large for a language model? Is it 400B, 70B or maybe 1T?

I think focusing on the raw number of parameters is a less useful frame than thinking about inference speed, cost, and where inference runs (on-device vs cloud).
February 15, 2025 at 12:56 PM
More open reasoning datasets and distilled models.

It's great to see the energy that was unleashed in the community once open models that generate chains of thought became available!
We are releasing OpenThinker-32B, the best 32B reasoning model with open data. We match or outperform Deepseek-R1-32B (a closed data model) in reasoning benchmarks. Congrats to Negin and the whole Open Thoughts team.

github.com/open-thought...
February 13, 2025 at 3:58 PM
ColabFit Exchange is another great dataset curation effort that I'd like to boost.

Great work by @stemartiniani.bsky.social and team to curate the most diverse materials database in the world!
Join us for the #AI4Mat workshop at #NeurIPS2024 today and check out our spotlight on how we built the most diverse database for AI for materials in the world openreview.net/forum?id=b8q...
February 13, 2025 at 1:53 PM
Neat idea! Fine-tuning using majority voting and length filtering generalises a model's capabilities.

Models generalise to slightly harder versions of a problem, and the correct answers are used to bootstrap the next model, then the next, and so on.
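
A rough sketch of that filtering step (my own illustration; the sample format and thresholds are made up):

from collections import Counter

def majority_vote_filter(samples, max_len=4096):
    """Keep chains whose final answer agrees with the majority answer
    and that are not excessively long (a crude length filter)."""
    majority, _ = Counter(s["answer"] for s in samples).most_common(1)[0]
    return [
        s for s in samples
        if s["answer"] == majority and len(s["chain"]) <= max_len
    ]

# Hypothetical usage: sample several chains per problem from the current
# model, keep the majority-consistent ones, and fine-tune the next model on them.
samples = [
    {"chain": "... reasoning ...", "answer": "42"},
    {"chain": "... reasoning ...", "answer": "42"},
    {"chain": "... reasoning ...", "answer": "17"},
]
print(len(majority_vote_filter(samples)))  # 2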
February 13, 2025 at 1:17 PM
Join us in creating open datasets, benchmarks and leaderboards for materials discovery.
🚀 LeMaterial community meetings are live!
Join us every second Thursday of the month to explore AI-driven materials discovery.
📅 Feb 13 | 6PM Paris | 9AM LA
📍 Join https://meet.google.com/mwy-uydd-kvf
Dive into LeMaterial & shape the future!
👉 Check the comments to join the LeMaterial Slack
February 13, 2025 at 10:17 AM
Superb work!
Announcing the release of Common Corpus 2. The largest fully open corpus for pretraining comes back better than ever: 2 trillion tokens with document-level licensing, provenance and language information. huggingface.co/datasets/Ple...
PleIAs/common_corpus · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
February 12, 2025 at 11:50 AM
The most durable motivation for research is curiosity, the desire to answer a question or understand something.

Curiosity then leads you down a maze of existing answers and new questions.

Eventually, you get to one that has no answer and then you start pushing at the frontier.
February 12, 2025 at 10:48 AM
Interesting - 57.1% AIME24 and 94.8% MATH performance achieved using only 817 reasoning chains and SFT.

Adds more weight to the hypothesis that correct reasoning chains and SFT can lead to strong reasoning performance.
GitHub - GAIR-NLP/LIMO: LIMO: Less is More for Reasoning
LIMO: Less is More for Reasoning. Contribute to GAIR-NLP/LIMO development by creating an account on GitHub.
github.com
February 9, 2025 at 7:15 PM
I've been reflecting today about OpenAI's five levels to measure progress in AI.

GPT-4 was at Level 1, conversational AI: a model competent at 0.1-1s tasks, like holding a conversation.

o1 / R1 reached Level 2, reasoners: a model solving 1-10min tasks such as basic coding and math.
February 9, 2025 at 11:52 AM
Agreed, we have the minimum viable scene.

We now just need to amplify each other and keep going.
I have been in your place several times since 2022 waiting patiently.

What I can say is this recent wave (from Sept 2024 to now) has been absolutely the most successful.

I think we have the nucleus. Now, we persist. Don't give up and keep contributing.

We have to play the long game.
February 6, 2025 at 2:33 PM
What if inference scaling is as simple as

response.replace("</think>", "Wait")
s1: Simple inference-time scaling

This is a simple small-scale replication of inference-time scaling

It was cheap: 16xH100 for 26 minutes (so what, ~$6?)

It replicates inference-time scaling using SFT only (no RL)

Extremely data frugal: 1000 samples

arxiv.org/abs/2501.19393
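
As I understand it, the trick (the paper calls it budget forcing) looks roughly like this; generate is a placeholder for any model call that returns the full text including a <think>...</think> block, not the s1 code:

def extend_thinking(generate, prompt, extra_rounds=2):
    """Naive budget forcing: whenever the model tries to close its
    reasoning block, splice in 'Wait' and let it keep thinking."""
    text = prompt
    for _ in range(extra_rounds):
        out = generate(text)
        if "</think>" not in out:
            return out  # model never tried to stop thinking
        # drop the answer, keep the reasoning, and nudge it to continue
        text = out.split("</think>")[0] + "\nWait"
    return generate(text)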
February 5, 2025 at 5:24 PM
SWE arena is going to be an interesting leaderboard to watch.

It lets people compare code generated by LMs by running it inside a sandbox.
SWE Arena: Compare & Test Best AI Chatbots for Code
swe-arena.com
February 5, 2025 at 4:54 PM
every time I try uv, I'm more impressed.

seems now like a tool that Just Works, reducing the complexity of the python ecosystem

installed cuda+torch+git packages and it all felt basically instant
February 4, 2025 at 11:52 AM
DeepSeek-R1 has turned into such a Rorschach test for the collective psyche
January 28, 2025 at 6:06 PM
Indeed, not outsourcing reasoning is an important value to ... well... reason about.

How would we achieve this?

It may require many individuals and groups to do RL on their own models, using their own verifiers.

This may look like grading exams - not of students, but of ML models.
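
For instance, one such "exam grader" might be a verifier that turns a model's answer into a reward for RL (my own sketch, nothing more):

import re

def grade_math_answer(model_output: str, reference: str) -> float:
    """Verifier as reward: extract the last number in the model's output
    and compare it to the reference answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference else 0.0

# Hypothetical usage inside an RL loop: each group grades its own
# model's outputs with its own verifiers.
print(grade_math_answer("... so the result is 42.", "42"))  # 1.0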
I definitely start from default pessimism on this.

But just to look at the other pan of the scales: we could plausibly justify outsourcing CMS and email. But if we fully outsource reasoning ... that's it, game over, everyone can go home.

So it *should* be easier to get faculty to care about this.
January 26, 2025 at 8:38 PM
Seeing A Film for the Future in 360 was a special experience.

One of the most powerful parts was We Pray.

The video and music match so well, hit hard, and resonate strongly with the times.
Coldplay - WE PRAY (A Film For The Future)
YouTube video by Coldplay
youtu.be
January 26, 2025 at 6:18 PM
Turning the temperature up using R1

Starting to think

gibberish gibberish gibberish

Focus again. Calm up.

🤣
January 25, 2025 at 6:44 PM
Hm, using reasoning models really feels qualitatively different (using @openrouter.bsky.social for inference).

It's fun to see these aha moments and it'd be interesting to understand whether their presence helps.
January 25, 2025 at 12:26 PM
Huh, interesting, Claude 3.5 Sonnet seems to do hidden CoT in the app.

Could not reproduce with the API tho.
January 22, 2025 at 4:03 PM
Deepseek-R1 thread to gather thoughts and reactions

Nice to see the technical details and MIT license for something that looks to be at o1 level 🥳
GitHub - deepseek-ai/DeepSeek-R1
Contribute to deepseek-ai/DeepSeek-R1 development by creating an account on GitHub.
github.com
January 20, 2025 at 2:11 PM
Interesting result re evolutionary algos for inference time search
January 20, 2025 at 11:45 AM