Lightnews — Scholar-powered news

ZanSara

@zansara.bsky.social

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. Then why is BM25 still used in so many modern search pipelines?

In this post we'll see what hybrid retrieval is and how to implement it.

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLMs #BM25 #RAG

What's hybrid retrieval good for?

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. But they still have a role in modern search pipelines.

www.zansara.dev

November 4, 2025 at 4:21 PM
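
As a rough illustration of the hybrid retrieval idea in the post above: one common way to merge a keyword ranking and an embedding ranking is reciprocal rank fusion (RRF). The post preview does not say which fusion method the article uses, so treat RRF, the function name `reciprocal_rank_fusion`, the constant `k=60` and the document IDs below as assumptions made for this sketch only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a BM25 retriever and an embedding retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

RRF only looks at ranks, not raw scores, which is why it is a popular default: BM25 scores and cosine similarities live on different scales and cannot be summed directly.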

ZanSara

@zansara.bsky.social

KV caching is a necessity for modern #LLMs, but it's not easy to do right. In this post I go through a recent survey that categorizes the most important KV caching techniques. Brace yourself for a deep dive!

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLM #KVcaching #vllm

Making sense of KV Cache optimizations, Ep. 1: An overview

Let's make sense of the zoo of techniques that exist out there.

www.zansara.dev

October 29, 2025 at 12:23 PM

ZanSara

@zansara.bsky.social

Do you know how exactly prompt caching works in #GPT models? What is cached, at which stage? Let's have a deep dive into KV caching and how it makes your #LLM inference speed constant regardless of the prompt size.

www.zansara.dev/posts/2025-1...

#AI #GenAI #kvcaching

How does prompt caching work?

Nearly all inference libraries can do it for you. But what's really going on under the hood?

www.zansara.dev

October 23, 2025 at 3:45 PM
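
To make the "what is cached" question in the post above a bit more concrete, here is a toy sketch of KV caching for a single attention head. It is not taken from the linked article: the `KVCache` class, the `attend` helper and the tiny 2-dimensional "tokens" are all hypothetical, and the query/key/value projections are replaced by the identity to keep the example short.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all given keys/values."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [
        sum(w * value[i] for w, value in zip(weights, values)) / total
        for i in range(len(values[0]))
    ]


class KVCache:
    """Stores the keys and values of past tokens so they are computed once."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Only the new token's key/value are added; older ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


# Toy token vectors; with identity projections, q = k = v = token.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
cached_outputs = [cache.step(t, t, t) for t in tokens]

# Without a cache, the last token's output needs every key/value rebuilt.
recomputed = attend(tokens[-1], tokens, tokens)
print(cached_outputs[-1], recomputed)
```

In this sketch the cached path and the recomputed path give the same output for the last token; the difference is that the cached path never rebuilds keys and values for tokens it has already seen.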

ZanSara

@zansara.bsky.social

For today's post about common #GenAI questions, let's talk about prompt caching.

Caching sounds like a good idea when you hit speed and cost issues at scale, but you should be careful about what you cache to make it pay off for its added complexity.

www.zansara.dev/posts/2025-1...

#AI #LLMs

What is prompt caching?

Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?

www.zansara.dev

October 17, 2025 at 1:54 PM

ZanSara

@zansara.bsky.social

I'm starting a series of small blog posts addressing some common doubts about practical details of #GenAI tech like #RAG, agents, #LLM inference or training, etc.

Here is the first one on rerankers: www.zansara.dev/posts/2025-1...

Do you use them in your RAG pipelines?

#AI #LLMs #rerankers

Why using a reranker?

And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.

www.zansara.dev

October 13, 2025 at 3:07 PM

ZanSara

@zansara.bsky.social

I've seen several approaches to fix the "tools overload" issue that plagues most MCP-heavy apps, but this one is the most interesting so far.

blog.cloudflare.com/code-mode/

#GenAI #AI #MCP

Code Mode: the better way to use MCP

It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the

blog.cloudflare.com

September 30, 2025 at 10:40 AM

Reposted by ZanSara

GitHub Trending 🤖

@github-trending.bsky.social

📦 deepset-ai / haystack
⭐ 22,263 (+30)
🗒 Python

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's be...

GitHub - deepset-ai/haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

github.com

September 14, 2025 at 12:02 PM

ZanSara

@zansara.bsky.social

How can we trust LLMs to handle users' credentials when they can't be made to hide the identity of their character in a Guess Who game? And if you think that affects only small models, think again - flagship proprietary models have the same issues as small OSS ones.

www.zansara.dev/posts/2025-0...

Trying to play "Guess Who" with an LLM

I expected a different kind of fun.

www.zansara.dev

September 15, 2025 at 3:52 PM

ZanSara

@zansara.bsky.social

LLMs are fantastic personal assistants... and terrible tabletop game players. ♟️

Do you want to challenge GPT-5 or Claude Opus 4.1 at a round of Guess Who? Give it a try and share your most unexpected gameplays! 🎲

👉 www.zansara.dev/guess-who/

#LLM #GenAI #GPT #GPT5 #AI

Play 'Guess Who' with LLMs!

Play 'Guess Who' against your favorite LLMs

www.zansara.dev

September 6, 2025 at 1:01 AM

Reposted by ZanSara

Simon Willison

@simonwillison.net

I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and interesting details from the GPT-5 system card simonwillison.net/2025/Aug/7/g...

GPT-5: Key characteristics, pricing and model card

I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still …

simonwillison.net

August 7, 2025 at 5:44 PM

ZanSara

@zansara.bsky.social

🗣️ Learning uncommon languages in the age of #AI has become so much more enjoyable! Check out #Speechify: just take a picture of a page, and it will read it out loud like your teacher would 📖

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #TextToSpeech #OCR

June 14, 2025 at 6:55 PM

ZanSara

@zansara.bsky.social

✋Have you ever tried to interrupt a Voice AI mid-sentence? Probably yes.

💭 But the LLM did not perceive the interruption the same way you did.

👤 Let's see what Claude does when we interrupt while it counts...

#GenAI #Ai #Claude4 #VoiceAI

June 2, 2025 at 5:18 PM

ZanSara

@zansara.bsky.social

🧠 Reasoning #LLMs may overthink or jump to conclusions when the reasoning effort is set to the wrong value.
✨ AutoThink runs the query through a classifier and decides how much effort the query needs.
❓ Have you tried it?
papers.ssrn.com/sol3/papers...
#GenAI #AI

May 28, 2025 at 9:43 AM

Reposted by ZanSara

GitHub Trending 🤖

@github-trending.bsky.social

🚀 Skyrocketing！ 🚀 (200+ new stars)

📦 anthropics / claude-code
⭐ 9,088 (+205)
🗒 Shell

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows...

GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

github.com

May 23, 2025 at 6:02 PM

ZanSara

@zansara.bsky.social

📢 Don't overlook this in the wave of releases! #MistralAI has a new coding LLM: it's #Devstral, an open model perfect for on-prem, private and local deployments 🐈

📰 Have a look at the announcement: mistral.ai/news/devstral

#MistralAI #GenAI #LLMs #SWEBench

May 23, 2025 at 3:01 PM

ZanSara

@zansara.bsky.social

Vibecoding with Claude 4 🎶 [Original video at this link: www.zansara.dev/posts/2025-0... ] #vibecoding #AI #GenAI #Claude4 #LLMs #Coding #AgenticAI #VSCode #AnthropicAI

May 22, 2025 at 9:50 PM

ZanSara

@zansara.bsky.social

🧠 Another flagship model released! @anthropic.com just unveiled Claude Opus 4 and Claude Sonnet 4, and they are at the top of the leaderboard for coding 💻

📰 Check out the announcement: www.anthropic.com/news/claude-4

#GenAI #LLMs #Claude #Claude4 #SweBench

May 22, 2025 at 4:48 PM

ZanSara

@zansara.bsky.social

🐜 Small models are making giant leaps! #Google just released Gemma 3n, a mobile-first #multimodal LLM that can understand text, images, audio and even video input while running on your phone 📱

📰 Read the announcement here: developers.googleblog.com/en/introduc...

#GenAI #LLMs #Gemma #SLM

Google for Developers Blog - News about Web, Mobile, AI and Cloud

developers.googleblog.com

May 22, 2025 at 9:05 AM

ZanSara

@zansara.bsky.social

Did you know that GenAI can help you finish that side project that has been gathering dust for months, waiting for its time to shine? ✨

In my latest blog post I vibecode a small subtitle generator with o4-mini-high and Claude 3.7 Sonnet 🎬

www.zansara.dev/posts/2025-...

#GenAI #LLMs

A simple vibecoding exercise

Sometimes, after an entire day of coding, the last thing you want to do is to code some more. It would be so great if I could just sit down and enjoy some YouTube videos… Being abroad, most of the videos I watch are in a foreign language, and it helps immensely to have subtitles when I’m not in the mood for hard focus. However, YouTube subtitles are often terrible or missing entirely.

www.zansara.dev

May 21, 2025 at 4:01 PM

ZanSara

@zansara.bsky.social

⚠️ Attention! If you or your company:

- 🇪🇺 are based in the EU
- 🦙 are thinking of integrating Llama models into your product

📜 Pay close attention to its license: you may be breaking Meta’s terms!

www.zansara.dev/posts/2025-0...

#GenAI #Llama #Multimodal #LLM #AI #AIAct

Using Llama Models in the EU

The Llama 4 family was released over a month ago and I finally found some time to explore it. Or so I wished to do, until I realized one crucial issue with these models: They are banned in the EU...

www.zansara.dev

May 16, 2025 at 3:26 PM

ZanSara

@zansara.bsky.social

Wanna learn more about reasoning LLMs? Check out this short blog post where we debunk three common misunderstandings about these models, and join me at ODSC East 2025 for a complete webinar on the topic!

www.zansara.dev/posts/2025-0...

#AI #GenAI #LLMs #ODSCEast #webinar

Beyond the hype of reasoning models: debunking three common misunderstandings

With the release of OpenAI’s o1 and similar models such as DeepSeek R1, Gemini 2.0 Flash Thinking, Phi 4 Reasoning and more, a new type of LLM entered the scene: the so-called reasoning models. With ...

www.zansara.dev

May 15, 2025 at 5:17 PM

ZanSara

@zansara.bsky.social

😵‍💫 Piling up instructions in the system prompt of your #LLM doesn't scale!

📢 Intentional makes #GenAI #chatbots able to handle an endless number of tasks while keeping them under control at all times. Leave it a star on GitHub and try out the demo!

github.com/intentional-...

GitHub - intentional-ai/intentional: Intentional is an open-source framework to build reliable LLM chatbots that actually talk and behave as you expect.

github.com

December 21, 2024 at 4:11 PM
