Hugo Larcher
hlarcher.bsky.social
ML Infra engineer @huggingface. HPC and ML infra.
This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).

We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤩!
January 16, 2025 at 9:39 AM