Dani Solà
dani-sola.com
@dani-sola.com
Interested in people, distributed systems, sustainability, and all things data.
Reposted by Dani Solà
How to Sell Data Modeling
Making the Invisible Visible
practicaldatamodeling.substack.com
November 17, 2025 at 4:27 PM
Reposted by Dani Solà
⏰ Last chance to register for #CDSM2025!

Don't miss your chance to join us Nov 12–13 for two days of talks & debates at the intersection of causality, data science & AI.

💻 Online | 🎟️ Free
👉 causalscience.org
November 8, 2025 at 9:05 AM
Reposted by Dani Solà
If I’m being honest, I’m feeling pretty crap about my small business.

It’s so bloody difficult at the moment with rising costs, US tariffs, Brexit nonsense and the threat of AI.

Please have a look at what I do and repost to spread the word gailmyerscough.co.uk
September 2, 2025 at 5:45 PM
Reposted by Dani Solà
Our didactic review on machine learning for causal inference, now open access:
• identifiability (theory of when the data can answer a causal question)
• machine-learning estimators
• study design (asking well-framed questions + loopholes, e.g. with timewise data)
www.annualreviews.org/content/jour...
August 20, 2025 at 7:12 PM
Reposted by Dani Solà
Deep Agents

this is a great 10 min video that’s absolutely worth your time

Deep Agent = planning tool (TODO lists) + subagents + filesystem + long detailed system prompt

seems like a deconstruction of why Claude Code works so well

www.youtube.com/watch?v=433S...
What are Deep Agents?
YouTube video by LangChain
www.youtube.com
August 3, 2025 at 11:47 AM
Recommended watch. Although the war is no longer in the news as often, help is as important as ever. And we can all have a direct impact on defending democracy.
Today, July 15th — on the Day of Ukrainian Statehood — No Sleep Til Kyiv is available to watch online!

Wondering where? On our website!
www.nosleeptilkyiv.com/watch-the-fi...

Ticket price: $10
50% of the proceeds go directly to support the @69thsniffingbrigade
July 19, 2025 at 9:08 PM
Reposted by Dani Solà
I like this take by @kentbeck.com on how AI-assisted programming changes the balance of which skills are most important

From this interview with @gergely.pragmaticengineer.com newsletter.pragmaticengineer.com/p/tdd-ai-age...
June 22, 2025 at 4:29 PM
Reposted by Dani Solà
No signs of an end to rapid gains in AI ability at ever-decreasing costs, yet

I did my best to update my chart to take into account the price drop in o3 & new models released by Google

GPT-4 was released 2.25 years ago, so it's worth noting the trend when considering the future of AI capabilities & cost
June 18, 2025 at 2:34 AM
Reposted by Dani Solà
How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents?

Coordinated Progress is a 4 part series that explores the common structure behind reliable distributed systems.

jack-vanlightly.com/blog/2025/6/...
Coordinated Progress – Part 1 – Seeing the System: The Graph — Jack Vanlightly
At some point, we’ve all sat in an architecture meeting where someone asks, “Should this be an event? An RPC? A queue?”, or “How do we tie this process together across our microservices? Should it ...
jack-vanlightly.com
June 11, 2025 at 2:29 PM
I tested ChatGPT, Claude, and Mistral on a multimodal problem about a washing machine. ChatGPT emerged the winner. #ai #chatgpt #claude #mistral
May 13, 2025 at 7:50 PM
Reposted by Dani Solà
Today, we’re announcing the preview release of ty, an extremely fast type checker and language server for Python, written in Rust.

In early testing, it's 10x, 50x, even 100x faster than existing type checkers. (We've seen >600x speed-ups over Mypy in some real-world projects.)
May 13, 2025 at 5:00 PM
Reposted by Dani Solà
New blogpost: "Training as we know it might end".
It was originally meant as an overview of new synthetic data generation methods, but the stakes are now much higher and I openly wonder whether model training is soon going to change forever. vintagedata.org/blog/posts/t...
May 8, 2025 at 8:02 PM
Reposted by Dani Solà
Wow, that's an insanely cool website: animejs.com/
Anime.js | JavaScript Animation Engine
A fast and versatile JavaScript animation library
animejs.com
April 19, 2025 at 7:15 AM
Reposted by Dani Solà
A new Python edition of "Forecasting: Principles and Practice" is now available online at otexts.com/fpppy/. Thanks to @azulgarza.bsky.social, Cristian Challu, Max Mergenthaler, Kin Olivares & Nixtla for making this happen. #forecasting #python
Forecasting: Principles and Practice, the Pythonic Way
otexts.com
April 11, 2025 at 12:25 AM
Reposted by Dani Solà
Interesting read on ClickHouse’s query condition cache (not a query result cache) — efficient indices built on the fly to reduce unnecessary full table scans for repeated queries.

clickhouse.com/blog/introdu...
Introducing the query condition cache
Repeated queries are everywhere—in dashboards, alerts, observability, and more. Learn how ClickHouse now skips redundant work by caching filter results per granule.
clickhouse.com
April 4, 2025 at 2:33 AM
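To make the "caching filter results per granule" idea concrete, here is a toy Python sketch. It is not ClickHouse's implementation: the `Table` class, the `cond_key` string used as the cache key, and the tiny granule size are all illustrative assumptions, and the sketch ignores cache invalidation when data changes. The first run of a filter records, for each granule, whether any row matched. Repeated runs of the same filter then skip granules already known to contain no matches.

```python
# Toy sketch of a per-granule filter-result cache (hypothetical, not ClickHouse code).
GRANULE_SIZE = 4  # rows per granule; ClickHouse's default index_granularity is 8192


class Table:
    def __init__(self, rows):
        self.rows = rows
        # condition key -> list of bools, one flag per granule ("has any match?")
        self.condition_cache = {}

    def granules(self):
        # Yield (granule index, rows in that granule)
        for start in range(0, len(self.rows), GRANULE_SIZE):
            yield start // GRANULE_SIZE, self.rows[start:start + GRANULE_SIZE]

    def select(self, cond_key, predicate):
        cached = self.condition_cache.get(cond_key)
        matches, flags, scanned = [], [], 0
        for idx, granule in self.granules():
            if cached is not None and not cached[idx]:
                continue  # known to contain no matching rows: skip the scan
            scanned += 1
            hits = [row for row in granule if predicate(row)]
            matches.extend(hits)
            flags.append(bool(hits))
        if cached is None:
            # First evaluation scanned every granule, so flags covers them all
            self.condition_cache[cond_key] = flags
        return matches, scanned
```

With 20 rows and a filter such as `row >= 16`, the first call scans all 5 granules; a repeated call with the same `cond_key` scans only the one granule that had matches.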
Reposted by Dani Solà
So it looks like there's a third scaling law: you can make models better by (1) training them with more compute, by (2) having them "think" for longer about an answer, or by (now 3) generating large numbers of answers in parallel & picking good ones

Both 2 & 3 seem to have lots of low-hanging fruit
March 18, 2025 at 3:53 AM
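Point (3) is essentially best-of-N sampling. A minimal Python sketch follows, assuming a hypothetical `generate` function standing in for a model call and a hypothetical `score` function standing in for a verifier or reward model. Neither is a real API. The pattern is simple: sample N candidates in parallel, then keep the highest-scoring one.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate(prompt, seed):
    # Hypothetical stand-in for a model call: one sampled candidate answer
    rng = random.Random(seed)
    return rng.randint(0, 100)


def score(prompt, answer):
    # Hypothetical verifier / reward model: here, closeness to 42
    return -abs(answer - 42)


def best_of_n(prompt, n):
    # Sample n candidates in parallel, one seed per candidate
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    # Keep the candidate the scorer likes best
    return max(candidates, key=lambda a: score(prompt, a))
```

Because the seeds are fixed, a larger N only adds candidates, so the best score can never get worse as N grows. In practice, how much this helps depends on how good the scorer is.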
Reposted by Dani Solà
Operationalizing Machine Learning: An Interview Study by @joehellerstein.bsky.social, @adityagp.bsky.social, et al. Particularly love the part on "Retrofitting Explanations".
#MachineLearning #MLOps #Datascience.
arxiv.org/pdf/2209.09125
February 6, 2025 at 7:39 PM
Reposted by Dani Solà
This is a pretty cool resource for applied ML: a list of "case studies" sourced from different companies describing problems they face and the methods they've tried to solve them.

Anyone know of something like this specific to geospatial/remote sensing data problems? #MLsky #CCAI #GISchat
Evidently AI - ML and LLM system design: 500 case studies
How do top companies apply AI? A database of 500 case studies from 100+ companies with practical ML use cases, LLM applications, and learnings from designing ML and LLM systems.
www.evidentlyai.com
March 8, 2025 at 8:00 PM
Reposted by Dani Solà
We are continuing with our series of posts on some non-trivial use cases for XGBoost. In this latest posts we talk about using Shapley *interaction* values for feature engineering.

1/2
February 24, 2025 at 1:53 PM
Just published a post about building smart services at CLARK. A pragmatic approach that worked very well for us, going from heuristics to ML. Thoughts and feedback welcome! #datasky #data #databs medium.com/clark-engine...
A Blueprint for Smart Services
In today’s fast-paced world, creating intelligent services that adapt and improve over time is crucial for business success. This post…
medium.com
February 20, 2025 at 7:38 PM
Reposted by Dani Solà
Despite patriarchy's persistence, growing numbers of men believe they have it worse off than women. And, new research shows this "male victimhood" ideology is most common among men who aren't facing hardship. Which means what they're really feeling is status loss. 1/
www.psypost.org/male-victimh...
Male victimhood ideology driven by perceived status loss, not economic hardship, among Korean men
Research published in Sex Roles suggests that male victimhood ideology among South Korean men is driven more by perceived socioeconomic status decline rather than objective economic hardship.
www.psypost.org
January 19, 2025 at 12:14 PM
Reposted by Dani Solà
DeepSeek-R1!

⚡ Performance on par with OpenAI-o1
📖 Fully open-weight model & technical report
🏆 MIT licensed: Distill & commercialize freely!

🌐 Website & API are live now!
Demo: chat.deepseek.com
Models: huggingface.co/deepseek-ai
January 20, 2025 at 3:12 PM
Reposted by Dani Solà
First post of the year! @andypavlo.bsky.social got me thinking about why Confluent didn't build WarpStream.

My conclusion: legacy infrastructure companies are going to have a tough time against cloud native, AI-enabled, post-ZIRP competitors.
Infrastructure Vendors Are in a Tough Spot
Cloud native, AI-enabled, post-ZIRP companies are the new apex predator.
materializedview.io
January 13, 2025 at 6:45 PM
Reposted by Dani Solà
The MemoryDB paper shows the power of separating responsibilities through clever composition. I think this DB frontend/execution plus a distributed transaction log pattern can be promising for creating serverless variants of many popular databases. E.g., Aurora adopts a similar decoupling approach.
January 5, 2025 at 4:37 PM
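A rough Python sketch of the "execution frontend + transaction log" split described above. This is a simplified illustration, not MemoryDB's design: `TransactionLog` is a hypothetical in-memory stand-in for a durable, distributed log service, and `Node` is a hypothetical compute node. Writes are committed to the log before they are acknowledged, and any node, such as a replica, rebuilds its state by replaying the log.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionLog:
    # Hypothetical stand-in for a durable, distributed log service
    entries: list = field(default_factory=list)

    def append(self, entry):
        self.entries.append(entry)
        return len(self.entries) - 1  # offset of the committed entry

    def read_from(self, offset):
        return self.entries[offset:]  # copy of entries from offset onward


class Node:
    """Hypothetical compute node: in-memory state derived entirely from the log."""

    def __init__(self, log):
        self.log = log
        self.state = {}
        self.applied = 0  # next log offset to apply

    def put(self, key, value):
        # Commit to the log first; only then apply locally and acknowledge
        self.log.append(("put", key, value))
        self.catch_up()
        return "OK"

    def get(self, key):
        return self.state.get(key)

    def catch_up(self):
        # Replay any log entries this node has not applied yet
        for op, key, value in self.log.read_from(self.applied):
            if op == "put":
                self.state[key] = value
            self.applied += 1
```

Since durability lives in the log, compute nodes hold no unique state, which is what makes the serverless-style decoupling in the post plausible.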
Reposted by Dani Solà
OLTP Through the Looking Glass 16 Years Later: Communication is the New Bottleneck
www.cs.cit.tum.de/fi...
January 3, 2025 at 7:53 PM