Felipe
ffffelipe.bsky.social
Pre-training @ Cohere
Fun Sonnet 4 hallucination on muP

The Yang-Lecun correspondence
May 30, 2025 at 7:59 AM
Very happy to share the Command A tech report! I believe this is the largest published model with muP+fp8 :)

Lots of interesting post-training details as well. And great performance ofc!

arxiv.org/abs/2504.00698
Command A: An Enterprise-Ready Large Language Model
In this report we describe the development of Command A, a powerful large language model purpose-built to excel at real-world enterprise use cases. Command A is an agent-optimised and multilingual-cap...
April 2, 2025 at 7:45 PM
> spend some time porting critical code from python to c++

> c++ code is slower than python

> After a while optimizing it, figure out you forgot to add -O3

> Runs much faster obviously

> At the end the python bindings eat up half of the runtime benefits

🥲🎢
November 25, 2024 at 9:32 PM