PhD in ML Theory. Math BS @MIT.
Posting about AI, leadership, and growth.
Let’s build the future together. 🚀
On-device models give you privacy, offline availability, lower latency, and lower cost-to-serve.
Some tasks are simply better local.
I wanted one place to share my work on enterprise AI, leadership, and research. More coming soon.
Built with @framer - the best website building platform I've used so far
A trajectory = (task, action sequence).
Show a model thousands of these, and it becomes extremely good at acting in that domain.
LLMs don’t learn actions abstractly.
They learn from demonstrations.
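A minimal sketch of what one such demonstration could look like as data. The `Trajectory` class, the `to_training_example` helper, and the prompt/completion layout are illustrative assumptions, not a format the post specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """One demonstration: a task plus the ordered actions that solve it."""
    task: str
    actions: tuple[str, ...]


def to_training_example(traj: Trajectory) -> dict[str, str]:
    # Hypothetical layout: the task becomes the prompt and the action
    # sequence becomes the text the model is trained to produce.
    return {
        "prompt": f"Task: {traj.task}\nActions:",
        "completion": "\n".join(traj.actions),
    }


demo = Trajectory(
    task="Book a meeting room for Friday",
    actions=("open_calendar", "select_day:friday", "reserve_room"),
)
example = to_training_example(demo)
```

Thousands of records shaped like `example` would form the demonstration set.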
simple queries → small local model
complex reasoning → larger model only when necessary
This hybrid architecture is the future.
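A rough sketch of that routing idea. The complexity heuristic, the hint words, the threshold, and the names `estimate_complexity` and `route` are all assumptions for illustration; the post does not say how queries get classified.

```python
# Hypothetical cue phrases that suggest multi-step reasoning.
REASONING_HINTS = ("why", "prove", "compare", "step by step", "trade-off")


def estimate_complexity(query: str) -> float:
    """Crude score in [0, 1] based on query length and reasoning cues."""
    text = query.lower()
    length_score = min(len(text.split()) / 50, 1.0)
    # Plain substring matching: deliberately simple, so it can misfire.
    hint_score = 1.0 if any(hint in text for hint in REASONING_HINTS) else 0.0
    return max(length_score, hint_score)


def route(query: str, threshold: float = 0.5) -> str:
    """Return 'local' for simple queries, 'large' only when the score is high."""
    return "large" if estimate_complexity(query) >= threshold else "local"


simple_target = route("What time is it in Tokyo?")
complex_target = route("Compare these two contracts step by step")
```

A production router would more likely use a small classifier or the local model's own confidence than keyword matching.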
Layered checks like:
‣ trajectory correctness
‣ function-call execution
‣ goal completion
‣ instruction alignment
Anything that fails gets discarded or corrected.
A good synthetic dataset is mostly filtration.
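The layered checks above could be sketched as a chain of predicates. The sample fields (`actions`, `call_results`, `goal_reached`, `violations`) and all function names are hypothetical, and this version only discards failures; it does not attempt the correction step.

```python
from typing import Callable

Sample = dict
Check = Callable[[Sample], bool]


def trajectory_is_well_formed(s: Sample) -> bool:
    actions = s.get("actions", [])
    return bool(actions) and all(isinstance(a, str) and a for a in actions)


def function_calls_executed(s: Sample) -> bool:
    # Every recorded call must have succeeded; no calls counts as a pass.
    return all(r.get("ok") for r in s.get("call_results", []))


def goal_completed(s: Sample) -> bool:
    return s.get("goal_reached") is True


def follows_instructions(s: Sample) -> bool:
    return s.get("violations", 0) == 0


CHECKS: list[Check] = [
    trajectory_is_well_formed,
    function_calls_executed,
    goal_completed,
    follows_instructions,
]


def filter_samples(samples: list[Sample], checks: list[Check] = CHECKS):
    """Split samples into kept ones and (sample, failed check names) pairs."""
    kept, dropped = [], []
    for s in samples:
        failed = [c.__name__ for c in checks if not c(s)]
        if failed:
            dropped.append((s, failed))
        else:
            kept.append(s)
    return kept, dropped


good = {"actions": ["search", "click"], "call_results": [{"ok": True}],
        "goal_reached": True}
bad = {"actions": ["search"], "call_results": [{"ok": False}],
       "goal_reached": False}
kept, dropped = filter_samples([good, bad])
```

Recording which checks failed makes it possible to send a sample to a repair step later instead of dropping it outright.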
2023 → barely usable for anything beyond simple tasks.
2025 → crushing intermediate chat tasks, and even doing surprisingly well in difficult chats.
Local AI is heating up 🔥
‣ actions
‣ reasoning steps
‣ navigation patterns
and suddenly it can match or beat much larger models trained on generic data.
The magic isn’t the model size.
It’s the data pipeline.
But for narrow, well-defined tasks, they’re pretty solid.
• Pick people you want to learn from
• Ask 1–2 specific questions
• Follow up
• Let the relationship evolve
That’s it. No formalities required.
Everything is already inside us.
Experience just brings it into focus.
Track your contributions and the impact they created.
Your future self (and your promo case) will thank you.
For privacy-sensitive use cases, that’s a game-changer.
Use them as reps for public speaking, teaching, and leading.
Start the conversation.
New episode of Welcome to Derry 🎈
Managing tasks is <25% of the job.
People, technical clarity, and alignment are the rest.
But our data? Fragmented, noisy, inconsistent.
Introducing PersonaBench, a synthetic, human-like dataset (bios, interests, conversations, purchases) to test how well today's RAG systems handle real user complexity.
Leadership does too.
Go deeper and it becomes all about clarity, influence, and your impact on people.
But that doesn't mean your agent knows how to use those tools.
Tool access ≠ Tool use ability
That's why we created MCPEval (#EMNLP2025), a new framework for evaluating agent performance on any MCP server.
The rest is where things get tricky:
No questions. No reactions. Just blank stares.
Try this:
• Talk slower
• Pause for questions
• Use skimmable slides
• Tailor to the audience
(and no, you don’t need a PhD)
• Learn the bleeding edge at NeurIPS, ICLR, ICML
• Study how companies deploy AI at places like The AI Conference, Ai4, MLOps
• Implement research papers
• Find or start reading groups
I have to say - they have the best swag 🔥