Lightnews — Scholar-powered news

Reposted by Matei Zaharia

Andrew Drozdov

@mrdrozdov.com

We built a thing! The Databricks Reranker is now in Public Preview. It's as easy as changing the arguments to your vector search call, and doesn't require any additional setup.

Read more: www.databricks.com/blog/reranki...

Reranking in Mosaic AI Vector Search for Faster, Smarter Retrieval in RAG Agents

Boost RAG agent quality with reranking—deliver more relevant answers in less time with a single parameter in Mosaic AI Vector Search.

www.databricks.com

August 19, 2025 at 12:03 AM

Matei Zaharia

@matei-zaharia.bsky.social

Excited to launch Agent Bricks, a new way to build auto-optimized agents on your tasks. Agent Bricks uniquely takes a *declarative* approach to agent development: you tell us what you want, and we auto-generate evals and optimize the agent.

www.databricks.com/blog/introdu...

Introducing Agent Bricks: Auto-Optimized Agents Using Your Data

Discover Agent Bricks by Databricks — a new way to build production-ready AI agents using your data. Automatically evaluate, optimize, and scale agents with higher accuracy and lower cost.

www.databricks.com

June 11, 2025 at 5:08 PM

Matei Zaharia

@matei-zaharia.bsky.social

Apache Spark 4.0 is out with some huge improvements across the board. SQL’s much more powerful, Spark Connect makes it easier to run apps, there are new languages, and more. It’s amazing to see the community still growing fast and releasing over 5000 patches in 4.0. www.databricks.com/blog/introdu...

Introducing Apache Spark 4.0

Explore Apache Spark 4.0's key updates: advanced SQL features, improved Python support, enhanced streaming, and productivity boosts for big data analytics.

www.databricks.com

May 29, 2025 at 5:14 PM

Matei Zaharia

@matei-zaharia.bsky.social

#MLSys 2025 is next week! You can still register at mlsys.org.

May 5, 2025 at 4:37 PM

Matei Zaharia

@matei-zaharia.bsky.social

Nice results on never-ending learning for code editing. We believe that a lot of AI applications will be customizable this way (to every company's codebase, users, etc). The combined AI serving, data and MLOps environment on Databricks makes these easy to build.
www.databricks.com/blog/power-f...

The Power of Fine-Tuning on Your Data: Quick Fixing Bugs with LLMs via Never Ending Learning (NEL)

Discover how fine-tuning small open-source LLMs on interaction data enables faster, cheaper, and more accurate code fixes with Databricks Quick Fix.

www.databricks.com

April 9, 2025 at 12:59 AM

Reposted by Matei Zaharia

MLflow

@mlflow.org

🎥 New Video: Get Hands-On with MLflow Tracing!

In this video, @danliden.com walks through how #MLflow Tracing boosts observability in #GenAI apps—great for debugging, experimentation & organizing data workflows.

Watch now ➡️ www.youtube.com/watch?v=iRbB...

#opensource #oss

MLflow Tracing | Introduction & Tutorial

YouTube video by MLflow

www.youtube.com

April 4, 2025 at 1:08 PM

Matei Zaharia

@matei-zaharia.bsky.social

Really cool result from the Databricks research team: You can tune LLMs for a task *without data labels*, using test-time compute and RL, and outperform supervised fine-tuning! Our new TAO method scales with compute to produce fast, high-quality models. www.databricks.com/blog/tao-usi...

TAO: Using test-time compute to train efficient LLMs without labeled data

LIFT fine-tunes LLMs without labels using reinforcement learning, boosting performance on enterprise tasks.

www.databricks.com

March 25, 2025 at 5:47 PM

Matei Zaharia

@matei-zaharia.bsky.social

The #MLSys2025 program is up and registration is open! Check out accepted papers at mlsys.org/virtual/2025... and sign up to attend at mlsys.org/Register.

March 18, 2025 at 5:26 PM

Reposted by Matei Zaharia

MLflow

@mlflow.org

Exciting news—MLflow 2.21.0 is live! 🎉 This release includes significant features, enhancements, and bug fixes to improve documentation, #GenAI prompt management, tracing & more.

🔗 Explore all the new features & improvements: mlflow.org/releases/2.2...

#opensource #oss #mlflow

March 14, 2025 at 5:54 PM

Reposted by Matei Zaharia

Lakshya A Agrawal

@lakshyaaagrawal.bsky.social

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs!

We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.

March 3, 2025 at 6:59 PM

Reposted by Matei Zaharia

Andrew Drozdov

@mrdrozdov.com

We're probably a little too obsessed with zero-shot retrieval. If you have documents (you do), then you can generate synthetic data, and finetune your embedding. Blog post led by @jacobianneuro.bsky.social shows how well this works in practice.

www.databricks.com/blog/improvi...

Improving Retrieval and RAG with Embedding Model Finetuning

Fine-tune embedding models on Databricks to enhance retrieval and RAG accuracy with synthetic data—no manual labeling required.

www.databricks.com

February 26, 2025 at 12:48 AM

Reposted by Matei Zaharia

SAP

@sap.com

We're bringing in a new era of enterprise data management and agentic AI with SAP Business Data Cloud with Databricks.

✅ Unifies your SAP and non-SAP data

✅ Natively embeds Databricks technology

✅ AI agents streamline workflows

Learn more: sap.to/sapbdc

February 13, 2025 at 2:33 PM

Matei Zaharia

@matei-zaharia.bsky.social

Sponsor registration is open for #MLSys 2025. We have the most submissions ever to MLSys so it promises to be a great conference! mlsys.org/Sponsors/spo...

2025 Sponsor / Exhibitor Information

mlsys.org

January 20, 2025 at 2:20 AM

Reposted by Matei Zaharia

TechCrunch

@techcrunch.com

Researchers open source Sky-T1, a ‘reasoning’ AI model that can be trained for less than $450

So-called reasoning AI models are becoming easier — and cheaper — to develop. On Friday, NovaSky, a team of researchers based out of UC Berkeley’s Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that’s competitive with an earlier version…

tcrn.ch

January 11, 2025 at 9:47 PM

Reposted by Matei Zaharia

Sebastian Raschka (rasbt)

@sebastianraschka.com

"Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks."
That was quick! Is this already the Alpaca moment for reasoning models?
Source: novasky-ai.github.io/posts/sky-t1/

January 14, 2025 at 12:34 AM

Matei Zaharia

@matei-zaharia.bsky.social

Congrats to Meta on releasing Llama 3.3, a 70B model that matches the performance of Llama-405B! Open weight models are advancing so rapidly and the cost to get this performance is quickly going down. We're thrilled to let users serve & customize this on Databricks. huggingface.co/meta-llama/L...