Diego de las Casas
@dlsq.bsky.social
AI Scientist at Mistral AI.
Past: Google DeepMind.
🇧🇷 in 🇬🇧
Reposted by Diego de las Casas
Gotta wait until he double-crosses Indiana Jones to steal the Holy Grail, I'm afraid.
March 9, 2025 at 2:02 PM
Mistral Small 3 is also available on many partner platforms:
- Ollama: ollama.com/library/mist...
- Kaggle: kaggle.com/models/mistr...
- Fireworks: fireworks.ai/models/firew...
- Together: together.ai/blog/mistral...

And many more soon!
mistral-small
Mistral Small 3 sets a new benchmark in the “small” Large Language Models category below 70B.
ollama.com
January 30, 2025 at 9:17 PM
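A minimal sketch of querying the Ollama listing above from Python, assuming the model has already been pulled locally and the Ollama daemon is running on its default port; the model tag is taken from the ollama.com card and may differ:

```python
# Sketch: non-streaming generation against Ollama's local HTTP API.
# Assumes `ollama pull mistral-small` has been run beforehand.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral-small",  # tag from the ollama.com card above; may differ
        "prompt": "Summarise the trade-offs of sub-70B models in one sentence.",
        "stream": False,           # return one JSON object instead of a token stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```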
Performance of Mistral Small 3 Instruct model
huggingface.co/mistralai/Mi...
January 30, 2025 at 9:17 PM
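A hedged sketch of loading the Instruct checkpoint with Hugging Face transformers; the repo id below is a placeholder for the truncated link above, and the dtype/device choices are only illustrative:

```python
# Sketch: chat-style generation with the Mistral Small 3 Instruct weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/<Mistral-Small-3-Instruct-repo>"  # placeholder for the truncated link

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the checkpoint manageable
    device_map="auto",           # let accelerate place layers on available devices
)

messages = [{"role": "user", "content": "Give one use case for a sub-70B model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```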
Mistral Small 3 Base model
huggingface.co/mistralai/Mi...
January 30, 2025 at 9:17 PM
The Mistral Small 3 architecture is optimised for latency while preserving high quality
January 30, 2025 at 9:17 PM
Reposted by Diego de las Casas
I know, but it's just an application of one of my favorite memes:
January 21, 2025 at 7:07 PM
Reposted by Diego de las Casas
In fact, statistical malpractice is the main driver of progress in machine learning. At some point, we need to come to terms with this.
November 22, 2024 at 2:40 PM
FSDP2 has a different policy for handling streams that is also worth a read.
github.com/pytorch/pyto...
[RFC] Per-Parameter-Sharding FSDP · Issue #114299 · pytorch/pytorch
Per-Parameter-Sharding FSDP Motivation As we looked toward next-generation training, we found limitations in our existing FSDP, mainly from the flat parameter construct. To address these, we propos...
github.com
November 23, 2024 at 10:49 AM
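A small sketch of the per-parameter sharding described in the RFC linked above, assuming a distributed process group is already initialised; note that the `fully_shard` import path varies across PyTorch releases:

```python
# Sketch: applying FSDP2's per-parameter sharding to a toy model.
# Assumes torch.distributed has been initialised (e.g. via torchrun).
import torch
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard  # path may differ by version

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Shard each submodule first, then the root. Unlike FSDP1, which flattened
# everything into one FlatParameter, FSDP2 shards individual parameters as
# DTensors, which is what the RFC's per-parameter design refers to.
for layer in model:
    fully_shard(layer)
fully_shard(model)

# Training proceeds as usual; collectives run on FSDP2's internal CUDA
# streams, whose handling policy the post points to.
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
```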
Pixtral Large:
- 123B decoder, 1B vision encoder, 128K sequence length
- Frontier multimodal model
- Maintains text performance of Mistral Large 2

HF weights: huggingface.co/mistralai/Pi...
Try it: chat.mistral.ai
Blog post: mistral.ai/news/pixtral...
November 18, 2024 at 5:57 PM
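A hedged sketch of sending a text-plus-image request to a Pixtral Large endpoint through Mistral's chat-completions REST API; the model alias and the image payload shape are assumptions, not taken from the post, so check the blog post or API docs for the exact values:

```python
# Sketch: multimodal chat completion against Mistral's REST API.
import os
import requests

payload = {
    "model": "pixtral-large-latest",  # assumed alias, not stated in the post
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the chart in this image."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},
            ],
        }
    ],
    "max_tokens": 256,
}

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```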