Xuan Son Nguyen
@ngxson.hf.co
Software Engineer @ Hugging Face 🤗
October 5, 2025 at 9:11 PM
Very nice touch, Gmail 😅
August 29, 2025 at 1:33 PM
Part 2 of my journey building a smart home! 🚀
In this part:
> ESPHome & custom component
> RF433 receiver & transmitter
> Hassio custom addon
August 27, 2025 at 7:09 PM
Just published a new article on my blog 🏃♂️
Building My Smart Home - Part 1: Plan, Idea & Home Assistant
Check it out!
August 14, 2025 at 4:49 PM
Kudos to Google and the llama.cpp team! 🤝
GGUF support for Gemma 270M right from day-0
July 21, 2025 at 3:53 PM
Reachy Mini and SmolLM3 are featured in GitHub's weekly news! 🚀 🚀
June 26, 2025 at 6:46 PM
Gemma 3n has arrived in llama.cpp 👨🍳 🍰
Comes in 2 flavors: E2B and E4B (E means "effective/active parameters")
June 11, 2025 at 9:09 AM
See you this Sunday at the AI Plumbers Conference: 2nd edition!
📍 Where: GLS Event Campus Berlin, Kastanienallee 82 | 10435 Berlin
👉 Register here: lu.ma/vqx423ct
June 3, 2025 at 12:19 PM
✨✨ AIFoundry is bringing you the AI Plumbers Conference: 2nd edition — an open source meetup for low-level AI builders to dive deep into "the plumbing" of modern AI
📍 Where: GLS Event Campus Berlin, Kastanienallee 82 | 10435 Berlin
📅 When: June 15, 2025
👉 Register now: lu.ma/vqx423ct
May 15, 2025 at 2:43 PM
Hugging Face Inference Endpoints now officially support deploying **vision** models via llama.cpp 👀 👀
Try it now: endpoints.huggingface.co/catalog
May 12, 2025 at 5:27 PM
Real-time webcam demo with @huggingface.bsky.social SmolVLM and llama.cpp server.
All running locally on a MacBook M3
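Under the hood, a demo like this can talk to the llama.cpp server through its OpenAI-compatible chat endpoint, sending each webcam frame as an inline base64 image. A minimal sketch of building that request (the endpoint URL and model name here are assumptions, not taken from the demo itself):

```python
import base64
import json

def vision_chat_payload(image_bytes: bytes, prompt: str, model: str = "smolvlm") -> dict:
    """Build an OpenAI-style chat request with an inline base64 image.
    The model name is a placeholder; llama.cpp's server generally serves
    whichever model it was launched with."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }

# POST this dict as JSON to the server's /v1/chat/completions endpoint,
# e.g. http://localhost:8080/v1/chat/completions (default port assumed).
payload = vision_chat_payload(b"\xff\xd8...jpeg bytes...", "What do you see?")
body = json.dumps(payload)
```

For a "real-time" feel, the demo loop would simply re-capture a frame and re-send this request at a fixed interval.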
April 25, 2025 at 1:01 PM
Although we have the A100, H200, M3 Ultra, etc.
Still can't match the power of that Casio FX 😆
April 21, 2025 at 1:46 PM
llama.cpp vision support just got much better! 🚀
Traditionally, models with complicated chat templates like MiniCPM-V or Gemma 3 required a dedicated binary to run.
Now, you can use all supported models via "llama-mtmd-cli" 🔥
(Only Qwen2VL is not yet supported)
April 20, 2025 at 11:27 PM
Finally have time to write a blog post about ggml-easy! 😂
ggml-easy is a header-only wrapper for GGML that simplifies development with a cleaner API, easy debugging utilities, and native safetensors loading ✨ Great for rapid prototyping!
April 20, 2025 at 10:40 PM
Someone at Google definitely had a lot of fun making this 😆
And if you don't know, it's available in the "Starter apps" section on AI Studio. The app is called "Gemini 95"
April 20, 2025 at 11:00 AM
Estimating an LLM's memory requirements WITHOUT a calculator?
Just use your good old human brain 🧠 😎
Check out my 3‑step estimation 🚀
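The 3-step method itself is in the linked post; as a companion, here is a generic back-of-envelope sketch (an approximation under stated assumptions, not necessarily the post's exact steps): weights ≈ parameter count × bytes per parameter for the chosen quantization, plus a margin for KV cache and runtime buffers.

```python
# Rough LLM memory estimate -- a generic sketch, NOT the post's exact method.
# Assumes weights dominate, with a flat ~20% margin for KV cache and buffers.

BYTES_PER_PARAM = {"f16": 2.0, "q8": 1.0, "q4": 0.5}  # rough per-format averages

def estimate_gib(params_billion: float, quant: str = "q4", overhead: float = 1.2) -> float:
    """Estimated memory in GiB: params * bytes-per-param * overhead margin."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[quant] * overhead
    return total_bytes / 2**30

# e.g. a 7B model at Q4: 7e9 * 0.5 * 1.2 bytes ~= 4.2e9 bytes ~= 3.9 GiB
print(round(estimate_gib(7, "q4"), 1))  # → 3.9
```

The mental-math version is the same arithmetic: halve the parameter count for Q4, then add a bit on top for the KV cache.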
April 19, 2025 at 5:00 PM
Google has quite a good sense of humor 😂
Joke aside, a 1B model quantized to Q4 without performance degradation is sweet 🤏
March 31, 2025 at 3:25 PM
Cooking a fun thing today: I can now load safetensors files directly into GGML without having to convert them to GGUF!
Why? Because this allows me to do experiments faster, especially with models outside of llama.cpp 😆
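For context, direct loading is practical because the safetensors format is simple: an 8-byte little-endian header length, a UTF-8 JSON header mapping tensor names to dtype/shape/offsets, then raw tensor data. A minimal standalone parser sketch (not the actual GGML patch):

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Parse a safetensors header: 8-byte little-endian u64 length, then a
    JSON dict of tensor name -> {dtype, shape, data_offsets}."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header  # data_offsets are relative to the end of the header

# Example (hypothetical file name):
# for name, info in read_safetensors_header("model.safetensors").items():
#     print(name, info["dtype"], info["shape"])
```

Knowing each tensor's dtype, shape, and byte range is enough to memory-map the data region and hand slices straight to a GGML graph.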
March 20, 2025 at 1:36 PM
On Monday the 24th, I'm proud to give a talk at sota's webinar.
My main talk will last an hour, diving deep into the current state of on-device LLMs and exploring their advantages, trade-offs, and limitations.
The session will end with a Q&A, where you can ask me anything about the subject.
March 19, 2025 at 2:53 PM
Had a fantastic chat today with Georgi Gerganov, the brilliant mind behind ggml, llama.cpp, and whisper.cpp! We discussed:
🚀 The integration of vision models into llama.cpp
🚀 The challenges of maintaining a smooth UX/DX
🚀 The exciting future of llama.cpp
Big things ahead - stay tuned!
March 13, 2025 at 11:23 AM
OK now you are the best, Gememe 2.0
March 12, 2025 at 10:05 AM
Wanna try Gemma 3 vision with llama.cpp?
There is a playground for that! More in 🧵
March 12, 2025 at 8:31 AM
Day-zero Gemma 3 support in llama.cpp 🤯
👉 4 model sizes: 1B, 4B, 12B, 27B
👉 Vision capability (except for 1B) with bidirectional attention
👉 Context size: 32k (1B) and 128k (4B, 12B, 27B)
👉 140+ languages supported (except for 1B)
👉 Day-zero support on many frameworks 🚀
March 10, 2025 at 11:05 AM
Aya Vision is now the number one trending OCR model on Hugging Face 🚀
👉 Comes in 2 sizes, 8B and 32B
👉 Supports 32 languages
👉 Day-zero support with HF Transformers
March 8, 2025 at 6:00 PM
Did you know? A number of 🤗 Hugging Face's blog posts now feature AI-created podcasts 🎙️
This offers an alternative way to absorb long, in-depth articles 🔍