mchra0.bsky.social
@mchra0.bsky.social
Reposted
My new field guide to alternatives to standard LLMs:

Gated DeltaNet hybrids (Qwen3-Next, Kimi Linear), text diffusion, code world models, and small reasoning transformers.

🔗 magazine.sebastianraschka.com/p/beyond-sta...
November 4, 2025 at 2:49 PM
Reposted
Just saw the benchmarks of the new open-weight MiniMax-M2 LLM, and the performance is too good to ignore :). So, I just amended my "The Big LLM Architecture Comparison" with entry number 13!

Link to the full article: magazine.sebastianraschka.com/p/the-big-ll...
October 28, 2025 at 4:48 PM
Amazon to lay off 30,000 corporate employees. The announcement comes tomorrow.
This year has been brutal for tech workers:
- September: 19,300 people impacted
- October: 5,100 more
- Tomorrow: 30,000 from Amazon alone.

#amazon #layoffs #jobmarket
October 27, 2025 at 11:05 PM
One small pebble holding up mountains 🏔️

#aws #outage #dns #amazon #down
October 22, 2025 at 6:02 PM
Reposted
🔗 Mixture of Experts (MoE): github.com/rasbt/LLMs-f...
October 20, 2025 at 1:48 PM
ELIZA: a chatbot created in the 1960s by Joseph Weizenbaum. ELIZA was written to play the role of a therapist. Although the original machines are long gone, you can still talk to ELIZA in its web version here: www.masswerk.at/eliza/
E.L.I.Z.A. Talking
E.L.I.Z.A. Talking is a project to explore speech I/O in modern browsers.
www.masswerk.at
October 20, 2025 at 1:53 AM
People aren’t subscribing to an AI.
They’re subscribing to the world’s most advanced 𝗺𝗼𝗱𝗲𝗹 𝗼𝗳 𝗰𝗼𝗻𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗽𝗿𝗼𝗯𝗮𝗯𝗶𝗹𝗶𝘁𝘆 — one that has learned how words, ideas, and meanings relate to each other in ways that 𝗳𝗲𝗲𝗹 𝗵𝘂𝗺𝗮𝗻.
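To make that concrete: given a context, the model outputs a score for every possible next token, and a softmax turns those scores into a conditional distribution. A toy illustration with made-up numbers (no real model involved):

import math

# Made-up scores for the next token after "The vet examined the ..."
logits = {"dog": 2.1, "cat": 1.8, "car": -0.5}
z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}
print(probs)  # P(next token | context): "dog" ~0.55, "cat" ~0.41, "car" ~0.04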

#ai #probability
October 16, 2025 at 5:55 PM
Reposted
Sliding Window Attention
🔗 github.com/rasbt/LLMs-f...
October 13, 2025 at 1:51 PM
Reposted
Multi-Head Latent Attention
🔗 github.com/rasbt/LLMs-f...
October 12, 2025 at 1:57 PM
Built a clone of NotebookLM.
System:
• 𝗙𝗔𝗜𝗦𝗦 for vector search with 𝘀𝗲𝗻𝘁𝗲𝗻𝗰𝗲-𝘁𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿𝘀/𝗮𝗹𝗹-𝗠𝗶𝗻𝗶𝗟𝗠-𝗟𝟲-𝘃𝟮 (see the sketch below)
• 𝗙𝗮𝘀𝘁𝗔𝗣𝗜 backend with async endpoints
• 𝗚𝗿𝗼𝗾 𝗔𝗣𝗜 (𝗟𝗹𝗮𝗺𝗮 𝟯.𝟯 𝟳𝟬𝗕) for ultra-fast inference
• 𝗘𝗹𝗲𝘃𝗲𝗻𝗟𝗮𝗯𝘀 𝗧𝗧𝗦 (with 𝗴𝗧𝗧𝗦 fallback) for realistic audio generation
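
A minimal sketch of the retrieval layer, assuming faiss-cpu and sentence-transformers are installed (PDF ingestion, chunking, and the FastAPI/Groq/TTS wiring are omitted; the chunk strings are placeholders):

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder chunks; the real app extracts these from uploaded PDFs.
chunks = ["NotebookLM-style apps let you chat with your documents.",
          "FAISS does fast similarity search over dense vectors."]

# Normalize embeddings so inner product equals cosine similarity.
emb = model.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])  # 384 dims for all-MiniLM-L6-v2
index.add(emb)

query = model.encode(["How do I search my PDFs?"], normalize_embeddings=True)
scores, ids = index.search(query, 2)
print([(chunks[i], round(float(s), 3)) for i, s in zip(ids[0], scores[0])])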

GitHub: github.com/mcrao/Build-...
GitHub - mcrao/Build-Notebook-LM-Clone
A full-stack AI-powered application that replicates Google's NotebookLM functionality. Upload PDF documents, chat with them using advanced RAG (Retrieval-Augmented Generation), and automaticall...
github.com
October 12, 2025 at 4:53 PM
Reposted
It only took 13 years, but dark mode is finally here
sebastianraschka.com/blog/2021/dl...
October 8, 2025 at 1:50 AM
So when should you go for vector DBs like Pinecone, Chroma, @weaviate.bsky.social, @qdrant.bsky.social?

Here are my thoughts:
In my experiment, I was dealing with fewer than 100,000 embedding vectors.

#pinecone #chroma #weaviate #qdrant #vectordb
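
At that scale an in-process setup is often enough to get started. A minimal Chroma sketch, with placeholder documents and Chroma's default embedding model:

import chromadb

client = chromadb.Client()  # fully in-memory, no server to run
docs = client.create_collection("docs")
docs.add(ids=["1", "2"],
         documents=["Pinecone is a managed vector database.",
                    "Chroma can run fully in-process."])
print(docs.query(query_texts=["which one is managed?"], n_results=1))
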
RAG ≠ 30-min tutorial.
To build production-ready RAG you need:
📄 Ingestion (PDF, OCR)
✂️ Chunking (Fixed, Semantic, Recursive, LLM-based; sketch below)
🔎 Embeddings (Pinecone, Chroma, pgvector)
🧪 Eval (RAGAS)
🌐 Deploy (Supabase + pgvector + Lovable)
Demo 👉 diet-whisper.lovable.app
Code 👉 github.com/mcrao/RAG/tr...
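
The promised chunking sketch: a hypothetical fixed-size chunker with overlap, the simplest of the four strategies above (sizes in words, not tokens):

def chunk_fixed(text, size=200, overlap=40):
    # Overlapping word windows; the overlap preserves context
    # across chunk boundaries.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]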
September 29, 2025 at 9:30 PM
Reposted
My notes on OpenAI's gpt-5-codex model, now available via their API - I upgraded my llm-openai-plugin to handle it and had GPT-5-Codex itself implement tool support for that plugin simonwillison.net/2025/Sep/23/...
GPT-5-Codex
OpenAI half-released this model earlier this month, adding it to their Codex CLI tool but not their API. Today they've fixed that - the new model can now be accessed …
simonwillison.net
September 24, 2025 at 12:05 AM
Reposted
I've been nerdsniped by the idea of Semantic IDs.

Here's the result of my training runs:
• RQ-VAE to compress item embeddings into tokens (sketch below)
• SASRec to predict the next item (i.e., 4 tokens) exactly
• Qwen3-8B that can return recs and natural language!
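
A toy sketch of the residual quantization behind RQ-VAE, with random codebooks purely for illustration (the real model learns them end-to-end):

import numpy as np

rng = np.random.default_rng(0)
dim, levels, codebook_size = 64, 4, 256
codebooks = rng.normal(size=(levels, codebook_size, dim))

def to_semantic_id(item_emb):
    residual, tokens = item_emb, []
    for level in range(levels):
        # Nearest code at this level, then quantize what's left over.
        idx = int(np.linalg.norm(codebooks[level] - residual, axis=1).argmin())
        tokens.append(idx)
        residual = residual - codebooks[level][idx]
    return tokens  # e.g. [17, 203, 5, 88]: a 4-token Semantic ID

print(to_semantic_id(rng.normal(size=dim)))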

eugeneyan.com/writing/sema...
How to Train an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs
An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.
eugeneyan.com
September 17, 2025 at 2:04 AM
Reposted
Made two pelicans with the new Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking models

🐧🦩 Who needs legs?!

simonwillison.net/2025/Sep/12/...
Qwen3-Next-80B-A3B: 🐧🦩 Who needs legs?!
Qwen announced two new models via their Twitter account (nothing on their blog yet): Qwen3-Next-80B-A3B-Instruct and Qwen3-Next-80B-A3B-Thinking. They make some big claims on performance: Qwen3-Next-8...
simonwillison.net
September 12, 2025 at 4:14 AM
Reposted
🐋🐦🐸 We just launched an interactive demo of NatureLM-audio on @hf.co!

👉 Try the demo with your audio or ours, share your feedback, and help us shape the future of decoding animal communication: huggingface.co/blog/EarthSp...
September 4, 2025 at 4:17 PM
Reposted
Wrote an intro to evals for long-context Q&A systems:
• How it differs from basic Q&A
• What dimensions & metrics to eval on
• How to build llm-evaluators (sketch below)
• How to build eval datasets
• Benchmarks: narratives, technical docs, multi-docs
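
A hypothetical llm-evaluator in a few lines; the judge model, rubric, and one-word output format here are my assumptions, not the article's exact setup:

from openai import OpenAI

client = OpenAI()

PROMPT = """You are grading a long-context Q&A system.
Context: {context}
Question: {question}
Answer: {answer}
Reply with exactly one word: FAITHFUL or UNFAITHFUL."""

def judge(context, question, answer, judge_model="gpt-4o-mini"):
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": PROMPT.format(
            context=context, question=question, answer=answer)}])
    return resp.choices[0].message.content.strip()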

eugeneyan.com/writing/qa-e...
Evaluating Long-Context Question & Answer Systems
Evaluation metrics, how to build eval datasets, eval methodology, and a review of several benchmarks.
eugeneyan.com
June 25, 2025 at 1:48 AM
Reposted
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling: www.youtube.com/watch?v=Zar2...
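
A generic sketch of those two sampling tweaks (not the tutorial's exact code):

import torch

def sample_next(logits, temperature=0.8, top_k=50):
    logits = logits / temperature                   # temperature scaling
    topk = torch.topk(logits, top_k)
    masked = torch.full_like(logits, float("-inf"))
    masked[topk.indices] = topk.values              # keep only the top-k logits
    probs = torch.softmax(masked, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # sample one token id

next_id = sample_next(torch.randn(50257))  # logits over a GPT-2-sized vocab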
March 23, 2025 at 1:38 PM
Reposted
Coded the Llama 3.2 model from scratch and shared it on the HF Hub.
Why? Because I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning and research: huggingface.co/rasbt/llama-...
March 31, 2025 at 5:13 PM
Reposted
Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc.; a toy GRPO sketch follows below).
Also, I cover 15 recent articles focused on RL & Reasoning.
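
A toy sketch of GRPO's core trick: rewards for a group of sampled responses are normalized within the group, so no learned value network is needed (rewards here are made up):

import numpy as np

rewards = np.array([1.0, 0.0, 0.5, 1.0])  # e.g. correctness of 4 samples
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
print(advantages)  # positive -> reinforce that response, negative -> suppress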

🔗 magazine.sebastianraschka.com/p/the-state-...
April 19, 2025 at 1:48 PM
Reposted
I just finished writing up my take on reasoning models: magazine.sebastianraschka.com/p/understand...
Here, I
1. Discuss the advantages & disadvantages of reasoning models
2. Of course, describe and discuss DeepSeek R1
3. Describe the 4 main ways to build & improve reasoning models
Understanding Reasoning LLMs
Methods and Strategies for Building and Refining Reasoning Models
magazine.sebastianraschka.com
February 5, 2025 at 1:46 PM