Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
Try it out: huggingface.co/spaces/data-...
The agent can load data, execute code, plot results, and follow your guidance and ideas!
A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
Unsurprisingly: data, data, data!
The SmolTalk dataset is open and available here: huggingface.co/datasets/Hug...
Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training using 4D parallelism and ZeRO.
*assuming your home includes an H100 cluster