Caglar Gulcehre
@caglarai.bsky.social
AI Researcher
Prof @ EPFL, Lead @ CLAIRE lab, ELLIS Scholar
Ex: Staff Research Scientist @ DeepMind, MSR, IBM Research
Pinned
We introduce an efficient and performant method that offers the best of both worlds in a new architecture. We show that our approach scales better than SOTA transformers with self-attention.

Incredible execution and attention to detail by @xiuyingwei.bsky.social!
⚡️🧠 Excited to share our recent work on long-context efficiency! We propose a new layer called RAT—fast and lightweight like RNNs, yet powerful like Attention. 🐭✨ This is a joint effort with Anunay Yadav, @razvan-pascanu.bsky.social, and @caglarai.bsky.social!
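The post doesn't spell out RAT's mechanism, so what follows is only an illustrative sketch of the general recipe it gestures at - a cheap gated recurrence inside each chunk, and softmax attention across per-chunk summaries - not the actual RAT layer (all names are hypothetical):

import torch
import torch.nn.functional as F

def rnn_attention_sketch(x, chunk):
    # Illustrative only, NOT the paper's RAT layer.
    # x: (batch, seq_len, dim); assumes seq_len is a multiple of `chunk`.
    b, t, d = x.shape
    xs = x.view(b, t // chunk, chunk, d)
    # Gated linear recurrence inside each chunk: O(seq_len) total, like an RNN.
    gate = torch.sigmoid(xs)
    state = torch.zeros_like(xs[:, :, 0])            # (b, n_chunks, d)
    outs = []
    for i in range(chunk):
        state = gate[:, :, i] * state + (1 - gate[:, :, i]) * xs[:, :, i]
        outs.append(state)
    # Softmax attention over per-chunk summaries: O((seq_len/chunk)^2) instead
    # of full O(seq_len^2) self-attention. Non-causal here, for brevity.
    summary = state                                  # last state summarizes each chunk
    att = F.softmax(summary @ summary.transpose(-1, -2) / d ** 0.5, dim=-1)
    mixed = att @ summary                            # (b, n_chunks, d)
    # Broadcast the attended global context back to every position.
    y = torch.stack(outs, dim=2) + mixed.unsqueeze(2)
    return y.view(b, t, d)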
Reposted by Caglar Gulcehre
#IPAM (the Institute for Pure and Applied Mathematics) is facing a critical shortfall in operating expenses due to an unexpected suspension of NSF funding: www.ipam.ucla.edu/news/nsf-fun... Donations for emergency continuity-of-operations funding can be made at

giving.ucla.edu/Campaign/Donat
August 8, 2025 at 12:48 AM
Reposted by Caglar Gulcehre
🚀 Big time! We can finally do simple LLM RL fine-tuning with rewards and leverage offline/off-policy data!

❌ You want rewards, but GRPO only works online?
❌ You want offline, but DPO is limited to preferences?
✅ QRPO can do both!

🧵Here's how we do it:
July 15, 2025 at 6:45 PM
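For context on the two ❌ points above (background on the baselines, not the QRPO objective itself, which the thread goes on to derive): GRPO computes advantages by normalizing rewards within a group of fresh online rollouts, so it needs on-policy samples; DPO's loss consumes preference pairs, not scalar rewards. A minimal sketch with hypothetical toy tensors:

import torch
import torch.nn.functional as F

def grpo_advantages(rewards):
    # rewards: (group_size,) scores from fresh *online* rollouts of one prompt;
    # the group-relative normalization is why GRPO needs on-policy samples.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO consumes *preference pairs* (chosen vs. rejected log-probs under the
    # policy and a reference model), never a scalar reward signal.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()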
Reposted by Caglar Gulcehre
Thrilled to announce that our work “Fleet of Agents” has been accepted @icmlconf.bsky.social. On average, FoA boosts quality by ~5% while reducing costs to ~40% of SOTA baselines. Blog post after the NeurIPS deadline ;)

Until then:
Paper: arxiv.org/abs/2405.066...
Code: github.com/au-clan/FoA
May 11, 2025 at 11:41 PM
Reposted by Caglar Gulcehre
Many thanks to all the amazing collaborators who contributed to this project - Amin Mansouri, @lars-quaedvlieg.bsky.social, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, @caglarai.bsky.social

12/12
April 26, 2025 at 4:56 PM
Reposted by Caglar Gulcehre
Excited to share our latest work on EvoTune, a novel method integrating LLM-guided evolutionary search and reinforcement learning to accelerate the discovery of algorithms! 1/12🧵
April 26, 2025 at 4:56 PM
Woohoo🥳 Thrilled to announce this paper 📢. We have shown that it is possible to significantly improve the FunSearch method with RL and achieve impressive algorithmic discoveries on challenging NP-complete combinatorial optimization tasks like TSP and bin-packing.
Excited to share our latest work on EvoTune, a novel method integrating LLM-guided evolutionary search and reinforcement learning to accelerate the discovery of algorithms! 1/12🧵
April 26, 2025 at 5:02 PM
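The posts give the high-level recipe: FunSearch-style LLM-guided evolutionary search, plus an RL step that fine-tunes the LLM on evaluator scores. A sketch of that loop under hypothetical interfaces (propose, evaluate, rl_finetune are placeholders, not the paper's API):

import random

def evotune(propose, evaluate, rl_finetune, seeds, rounds=10, search_steps=100):
    # population: list of (program, score) pairs, seeded with initial programs.
    population = [(p, evaluate(p)) for p in seeds]
    for _ in range(rounds):
        # 1) FunSearch-style evolutionary search: the LLM mutates promising
        #    programs; an evaluator scores them (e.g. bin-packing quality).
        for _ in range(search_steps):
            parent, _ = max(random.sample(population, k=min(3, len(population))),
                            key=lambda ps: ps[1])    # tournament selection
            child = propose(parent)                  # LLM-generated program variant
            population.append((child, evaluate(child)))
        # 2) RL step (the part added on top of FunSearch): fine-tune the LLM on
        #    its own proposals, with evaluator scores as rewards, so later
        #    proposals improve.
        rl_finetune(population)
    return max(population, key=lambda ps: ps[1])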
Reposted by Caglar Gulcehre
🚨🚨 24 more hours to register your abstracts for the @grades-nda.bsky.social workshop @sigmod2025.bsky.social

Papers due March 30th 23:59 AoE 🚀

@sdumbrava.bsky.social @olafhartig.bsky.social @csaudk.bsky.social
March 24, 2025 at 12:54 PM
Reposted by Caglar Gulcehre
I am recruiting 2 PhD students for Fall'25 @csaudk.bsky.social to work on bleeding-edge topics in #NLProc #LLMs #AIAgents (e.g. LLM reasoning, knowledge-seeking agents, and more).

Details: www.cs.au.dk/~clan/openings
Deadline: May 1, 2025

Please boost!

cc: @aicentre.dk @wikiresearch.bsky.social
Open positions and projects
Open semester and Master's projects: If you're an AU student looking for a semester project, a Bachelor project, or an MS thesis project, please refer to this list. Prospective PhD ...
www.cs.au.dk
March 18, 2025 at 9:12 AM
Reposted by Caglar Gulcehre
If it turns out LLMs are only capable of recombinatory innovation (finding novel connections among existing knowledge), that would still be very useful. Most innovation is recombination, and one of the big issues in science is that fields are too vast for scientists to bridge them to find connections.
March 9, 2025 at 6:25 PM
Reposted by Caglar Gulcehre
www.youtube.com/watch?v=9_Pe... An interview with Rich Sutton. His humility is truly inspiring: "There are no authorities in science." I wish people would listen and live by this.
TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science
YouTube video by Amii
www.youtube.com
March 6, 2025 at 8:50 PM
Reposted by Caglar Gulcehre
stay tuned for more proper, detailed, and exciting coverage of this preprint, but whoa, i'm so proud of the team @prescientdesign.bsky.social and our achievements!
February 25, 2025 at 6:10 PM
Reposted by Caglar Gulcehre
And I am an ally. If you are too, let the world know.
February 22, 2025 at 10:14 PM
I have been using a Glove80 keyboard for the last week due to my RSI, and it has improved significantly since then. But I am still baffled by how hard it is to get used to a new keyboard layout. Oddly, although I type perfectly fine on it now, I can't enter my passwords with it because they are stored in my muscle memory.
February 18, 2025 at 8:47 PM
Reposted by Caglar Gulcehre
Do large language models develop "emergent" models of the world? My latest Substack posts explore this claim and more generally the nature of "world models":

LLMs and World Models, Part 1: aiguide.substack.com/p/llms-and-w...

LLMs and World Models, Part 2: aiguide.substack.com/p/llms-and-w...
LLMs and World Models, Part 1
How do Large Language Models Make Sense of Their “Worlds”?
aiguide.substack.com
February 13, 2025 at 10:30 PM
Reposted by Caglar Gulcehre
Trajan's (@starlord37.bsky.social) story & countless others like it in the face of these cuts and wild shifts in government fellowships intended for our brightest & most promising students will have long-term, deeply damaging effects on the U.S.'s competitiveness in science, math, CS and more.
Last night I found out that the NSF math postdoctoral fellowship I applied for is being deleted because it does not comply with Trump’s executive orders on DEI in the federal government. I’m going to answer some FAQs and share some thoughts about this ordeal in this thread 1/n
February 9, 2025 at 4:47 AM
Reposted by Caglar Gulcehre
A great talk on the history and design decisions in Google's TPUs by my longtime colleague Norm Jouppi, winner of the 2024 Seymour Cray Computer Engineering award.

Talk: www.youtube.com/watch?v=a-1x...

Award announcement: www.computer.org/publications...
SC24 IEEE-CS Seymour Cray Computer Engineering Award
YouTube video by SC Conference Series
www.youtube.com
January 22, 2025 at 11:42 PM
Reposted by Caglar Gulcehre
We’ve been thrilled by the positive reception to Gemini 2.0 Flash Thinking we discussed in December.

Today we’re sharing an experimental update w/improved performance on math, science, and multimodal reasoning benchmarks 📈:
• AIME: 73.3%
• GPQA: 74.2%
• MMMU: 75.4%
January 22, 2025 at 12:31 AM
Reposted by Caglar Gulcehre
Google's Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time, as presented by one of the authors - @alibehrouz.bsky.social
January 13, 2025 at 7:53 PM
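The "learns how to memorize at test time" idea can be made concrete: the long-term memory is itself a small network whose weights get gradient updates on an associative-recall loss during inference. A simplified sketch (the paper's actual update adds surprise-based momentum and forgetting; names here are mine):

import torch
import torch.nn as nn

def memorize_at_test_time(memory: nn.Module, key, value, lr=0.01):
    # Associative-recall loss: how badly does the memory map `key` to `value`?
    loss = ((memory(key) - value) ** 2).mean()
    # One gradient step during *inference* writes the association into the
    # memory's weights; no training-time optimizer is involved.
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g in zip(memory.parameters(), grads):
            p -= lr * g
    return loss.item()

# e.g. memory = nn.Sequential(nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 64))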
Reposted by Caglar Gulcehre
Also, check out our ML project template—it’s a game-changer!🚀🚀
@caglarai.bsky.social
🧑‍💻 github.com/CLAIRE-Labo/...
December 10, 2024 at 7:39 PM
Reposted by Caglar Gulcehre
Ever been puzzled by your PPO agent collapsing out of nowhere? 📈🤯📉 Come check out our poster tomorrow!
Wed 11 Dec 11 am - 2 pm PST
West Ballroom A-D #6403
@caglarai.bsky.social @andreamiele.bsky.social @razvan-pascanu.bsky.social
December 10, 2024 at 6:33 PM
I am in Vancouver for NeurIPS 2024 until December 16th; if you want to meet, DM or email me.
We have two accepted papers from my lab:
1. Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers, on Wednesday, East Exhibit Hall A-C #2010 (1/3)
December 9, 2024 at 11:04 PM
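On the first paper's title: "structured feedforward layers" replace the dense weight matrices of the transformer FFN with structured ones to cut parameters and FLOPs. A sketch of one common instance, a low-rank factorization (illustrative only; the paper's exact parameterization may differ):

import torch.nn as nn

class LowRankFFN(nn.Module):
    # Each dense d_model x d_ff matrix is factored through a bottleneck of
    # size `rank`, so parameters scale with rank*(d_model + d_ff) instead of
    # d_model * d_ff. One example of a "structured" FFN, not the paper's exact design.
    def __init__(self, d_model, d_ff, rank):
        super().__init__()
        self.up = nn.Sequential(nn.Linear(d_model, rank, bias=False),
                                nn.Linear(rank, d_ff, bias=False))
        self.act = nn.GELU()
        self.down = nn.Sequential(nn.Linear(d_ff, rank, bias=False),
                                  nn.Linear(rank, d_model, bias=False))

    def forward(self, x):
        return self.down(self.act(self.up(x)))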