ZanSara
@zansara.bsky.social
✨ GenAI expert | 🐍 Python coder | 🪐 Sci-fi reader | 🇭🇺 Studying weird languages | zansara.dev
We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. Then why is BM25 still used in so many modern search pipelines?

In this post we'll see what hybrid retrieval is and how to implement it.

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLMs #BM25 #RAG
What's hybrid retrieval good for?
We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. But they still have a role in modern search pipelines.
www.zansara.dev
November 4, 2025 at 4:21 PM
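A minimal sketch of one way to combine keyword and embedding results, assuming both retrievers return a ranked list of document IDs. It uses reciprocal rank fusion (RRF), a common fusion method that may differ from what the linked post implements. The document IDs, the function name and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a BM25 retriever and an embedding retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

RRF only needs ranks, not scores, which sidesteps the problem that BM25 scores and embedding similarities live on different scales.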
KV caching is a necessity for modern #LLMs, but it's not easy to do right. In this post I go through a recent survey that categorizes the most important KV caching techniques. Brace yourself for a deep dive!

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLM #KVcaching #vllm
Making sense of KV Cache optimizations, Ep. 1: An overview
Let's make sense of the zoo of techniques that exist out there.
www.zansara.dev
October 29, 2025 at 12:23 PM
Do you know exactly how prompt caching works in #GPT models? What is cached, and at which stage? Let's take a deep dive into KV caching and how it makes your #LLM inference speed constant regardless of the prompt size.

www.zansara.dev/posts/2025-1...

#AI #GenAI #kvcaching
How does prompt caching work?
Nearly all inference libraries can do it for you. But what's really going on under the hood?
www.zansara.dev
October 23, 2025 at 3:45 PM
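To make the idea of "what is cached" concrete, here is a toy sketch of KV caching for a single attention head in plain Python. The weight matrices, token vectors and function names are made up for illustration, and no real inference library works on Python lists like this. The point is only that the keys and values of earlier tokens can be stored and reused instead of being recomputed at every step.

```python
import math

# Hypothetical 2x2 projection matrices for one attention head.
W_Q = [[1.0, 1.0], [0.0, 1.0]]
W_K = [[1.0, 0.0], [0.5, 1.0]]
W_V = [[0.0, 1.0], [1.0, 0.5]]


def project(vector, matrix):
    """Multiply a vector by a matrix given as a list of rows."""
    return [sum(x * w for x, w in zip(vector, row)) for row in matrix]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return [
        sum(w * value[i] for w, value in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


def decode_without_cache(tokens):
    """Recompute keys and values for the whole prefix at every step."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [project(x, W_K) for x in prefix]
        values = [project(x, W_V) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(project(prefix[-1], W_Q), keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Project each token once and keep its key and value in the cache."""
    outputs, projections = [], 0
    key_cache, value_cache = [], []
    for x in tokens:
        key_cache.append(project(x, W_K))
        value_cache.append(project(x, W_V))
        projections += 2
        outputs.append(attend(project(x, W_Q), key_cache, value_cache))
    return outputs, projections


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
plain_outputs, plain_cost = decode_without_cache(tokens)
cached_outputs, cached_cost = decode_with_cache(tokens)
print(plain_cost, cached_cost)
```

Both functions produce the same attention outputs, but the uncached version performs a number of key/value projections that grows quadratically with the sequence length, while the cached version performs two per token.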
For today's post about common #GenAI questions, let's talk about prompt caching.

Caching sounds like a good idea when you hit speed and cost issues at scale, but you should be careful about what you cache to make it pay off for its added complexity.

www.zansara.dev/posts/2025-1...

#AI #LLMs
What is prompt caching?
Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?
www.zansara.dev
October 17, 2025 at 1:54 PM
I'm starting a series of small blog posts addressing some common doubts about practical details of #GenAI tech like #RAG, agents, #LLM inference or training, etc.

Here is the first one on rerankers: www.zansara.dev/posts/2025-1...

Do you use them in your RAG pipelines?

#AI #LLMs #rerankers
Why using a reranker?
And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.
www.zansara.dev
October 13, 2025 at 3:07 PM
I've seen several approaches to fix the "tools overload" issue that plagues most MCP-heavy apps, but this one is the most interesting so far.

blog.cloudflare.com/code-mode/

#GenAI #AI #MCP
Code Mode: the better way to use MCP
It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the
blog.cloudflare.com
September 30, 2025 at 10:40 AM
Reposted by ZanSara
📦 deepset-ai / haystack
⭐ 22,263 (+30)
🗒 Python

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's be...
GitHub - deepset-ai/haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
github.com
September 14, 2025 at 12:02 PM
How can we trust LLMs to handle users' credentials when they can't be made to hide the identity of their character in a Guess Who game? And if you think that affects only small models, think again: flagship proprietary models have the same issues as small OSS ones.

www.zansara.dev/posts/2025-0...
Trying to play "Guess Who" with an LLM
I expected a different kind of fun.
www.zansara.dev
September 15, 2025 at 3:52 PM
LLMs are fantastic personal assistants... and terrible tabletop game players. ♟️

Do you want to challenge GPT-5 or Claude Opus 4.1 to a round of Guess Who? Give it a try and share your most unexpected gameplays! 🎲

👉 www.zansara.dev/guess-who/

#LLM #GenAI #GPT #GPT5 #AI
Play 'Guess Who' with LLMs!
Play 'Guess Who' against your favorite LLMs
www.zansara.dev
September 6, 2025 at 1:01 AM
Reposted by ZanSara
I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and interesting details from the GPT-5 system card simonwillison.net/2025/Aug/7/g...
GPT-5: Key characteristics, pricing and model card
I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still …
simonwillison.net
August 7, 2025 at 5:44 PM
🗣️ Learning uncommon languages in the age of #AI has become so much more enjoyable! Check out #Speechify: just take a picture of a page, and it will read it out loud like your teacher would 📖

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #TextToSpeech #OCR
June 14, 2025 at 6:55 PM
✋Have you ever tried to interrupt a Voice AI mid-sentence? Probably yes.

💭 But the LLM did not perceive the interruption the same way you did.

👤 Let's see what Claude does when we interrupt while it counts...

#GenAI #Ai #Claude4 #VoiceAI
June 2, 2025 at 5:18 PM
🧠 Reasoning #LLMs may overthink or jump to conclusions when the reasoning effort is set to the wrong value.
✨ AutoThink runs the query through a classifier and decides how much effort the query needs.
❓ Have you tried it?
papers.ssrn.com/sol3/papers...
#GenAI #AI
May 28, 2025 at 9:43 AM
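A rough sketch of the routing idea described in the post above: classify the query first, then choose a reasoning effort. The keyword heuristic here is only a stand-in for a real trained classifier, and the labels, hint list and token budgets are hypothetical, not AutoThink's actual design.

```python
# Hypothetical phrases that suggest a query needs careful reasoning.
REASONING_HINTS = ("prove", "derive", "step by step", "optimize", "debug", "why")

# Hypothetical thinking-token budgets for each effort level.
EFFORT_BUDGET = {"low": 256, "high": 4096}


def classify_complexity(query: str) -> str:
    """Stand-in classifier: returns "high" or "low".

    A real system would use a trained model instead of keyword matching.
    """
    text = query.lower()
    has_hint = any(hint in text for hint in REASONING_HINTS)
    is_long = len(text.split()) > 40
    return "high" if has_hint or is_long else "low"


def plan_reasoning(query: str) -> dict:
    """Pick the reasoning effort and token budget for a query."""
    effort = classify_complexity(query)
    return {"effort": effort, "max_thinking_tokens": EFFORT_BUDGET[effort]}


print(plan_reasoning("What is the capital of France?"))
print(plan_reasoning("Prove that the square root of 2 is irrational."))
```

The returned budget would then be passed to the model call as its reasoning effort, so simple lookups stay cheap and hard problems get room to think.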
Reposted by ZanSara
🚀 Skyrocketing! 🚀 (200+ new stars)

📦 anthropics / claude-code
⭐ 9,088 (+205)
🗒 Shell

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows...
GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
github.com
May 23, 2025 at 6:02 PM
📢 Don't overlook this in the wave of releases! #MistralAI has a new coding LLM: it's #Devstral, an open model perfect for on-prem, private and local deployments 🐈

📰 Have a look at the announcement: mistral.ai/news/devstral

#MistralAI #GenAI #LLMs #SWEBench
May 23, 2025 at 3:01 PM
Vibecoding with Claude 4 🎶 [Original video at this link: www.zansara.dev/posts/2025-0... ] #vibecoding #AI #GenAI #Claude4 #LLMs #Coding #AgenticAI #VSCode #AnthropicAI
May 22, 2025 at 9:50 PM
🧠 Another flagship model released! @anthropic.com just unveiled Claude Opus 4 and Claude Sonnet 4, and they are at the top of the leaderboard for coding 💻

📰 Check out the announcement: www.anthropic.com/news/claude-4

#GenAI #LLMs #Claude #Claude4 #SweBench
May 22, 2025 at 4:48 PM
🐜 Small models are making giant leaps! #Google just released Gemma 3n, a mobile-first #multimodal LLM that can understand text, images, audio and even video input while running on your phone 📱

📰 Read the announcement here: developers.googleblog.com/en/introduc...

#GenAI #LLMs #Gemma #SLM
Google for Developers Blog - News about Web, Mobile, AI and Cloud
developers.googleblog.com
May 22, 2025 at 9:05 AM
Did you know that GenAI can help you finish that side project that has been gathering dust for months, waiting for its time to shine? ✨

In my latest blog post I vibecode a small subtitle generator with o4-mini-high and Claude 3.7 Sonnet 🎬

www.zansara.dev/posts/2025-...

#GenAI #LLMs
A simple vibecoding exercise
Sometimes, after an entire day of coding, the last thing you want to do is to code some more. It would be so great if I could just sit down and enjoy some Youtube videos… Being abroad, most of the videos I watch are in a foreign language, and it helps immensely to have subtitles when I’m not in the mood for hard focus. However, Youtube subtitles are often terrible or missing entirely.
www.zansara.dev
May 21, 2025 at 4:01 PM
⚠️ Attention! If you or your company:

- 🇪🇺 are based in the EU
- 🦙 are thinking of integrating Llama models into your product

📜 Pay close attention to their license: you may be breaking Meta’s terms!

www.zansara.dev/posts/2025-0...

#GenAI #Llama #Multimodal #LLM #AI #AIAct
Using Llama Models in the EU
The Llama 4 family was released over a month ago and I finally found some time to explore it. Or so I wished to do, until I realized one crucial issue with these models: They are banned in the EU...
www.zansara.dev
May 16, 2025 at 3:26 PM
Wanna learn more about reasoning LLMs? Check out this short blog post where we debunk three common misunderstandings about these models, and join me at ODSC East 2025 for a complete webinar on the topic!

www.zansara.dev/posts/2025-0...

#AI #GenAI #LLMs #ODSCEast #webinar
Beyond the hype of reasoning models: debunking three common misunderstandings
With the release of OpenAI’s o1 and similar models such as DeepSeek R1, Gemini 2.0 Flash Thinking, Phi 4 Reasoning and more, a new type of LLM entered the scene: the so-called reasoning models. With ...
www.zansara.dev
May 15, 2025 at 5:17 PM
😵‍💫 Piling up instructions in the system prompt of your #LLM doesn't scale!

📢 Intentional makes #GenAI #chatbots able to handle an endless number of tasks while keeping them under control at all times. Leave it a star on GitHub and try out the demo!

github.com/intentional-...
GitHub - intentional-ai/intentional: Intentional is an open-source framework to build reliable LLM chatbots that actually talk and behave as you expect.
github.com
December 21, 2024 at 4:11 PM