github.com/drbh
ML engineer at HuggingFace 🤗
The Hopper TMA unit is a good example of this: it adds asynchronous bulk copies between global and shared memory at the hardware level, which is what async GEMM pipelines build on pytorch.org/blog/hopper-...
Along with newer hardware come new, fun kernel algorithms pytorch.org/blog/cutlass...