Lightnews — Scholar-powered news

Sebastian Sigl

@sesigl.bsky.social

Thanks for sharing. Prompt management is a must-have if you automate more things, from personal workflows to the products we build.

Definitely give Langfuse a try; it is a charm to set up and use.

November 12, 2025 at 6:27 AM

Sebastian Sigl

@sesigl.bsky.social

In case you enjoyed this thread, please give it a like and share it with your followers.

In case you want to benefit from even more content, please subscribe to my newsletter:

www.sebastiansigl.com/subscribe

Subscribe | Sebastian Sigl

Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.

www.sebastiansigl.com

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

These principles are not new. But their application in the messy reality of production search was a powerful lesson.

I share the full story here:

www.sebastiansigl.com/blog/rebuild...

After a Year Rebuilding Search, I Had to Rethink Everything | Sebastian Sigl

A seasoned engineer's lessons from a year rebuilding a search system from the ground up, shifting from engineering-first to product-first thinking.

www.sebastiansigl.com

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

❌ Common Pitfall: Equating technical excellence with product success.

✅ The Principle: A product mindset is the true compass.

The goal is not the most sophisticated system; it's the most effective system for the user.

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

❌ Common Pitfall: Rigid functional roles and hand-offs.

✅ The Principle: Blurring lines creates synergy.

Empower your team. Our progress exploded when data scientists could run A/B tests & engineers could explore data.

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

❌ Common Pitfall: Chasing offline metrics (nDCG, precision).

✅ The Principle: Business impact is the north star.

If an experiment doesn't move a core KPI (engagement, retention), it's not an improvement.

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

❌ Common Pitfall: Engineering for "correctness" from day one.

✅ The Principle: Velocity unlocks correctness.

A fast, end-to-end feedback loop (from user action to A/B test) is the only path to finding what "correct" actually is.

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

❌ Common Pitfall: Treating search as just an algo/infra problem.

✅ The Principle: It's a Data & Product problem first.

An architecture that learns fast from user signals beats one that just serves fast.

October 26, 2025 at 8:00 AM

Sebastian Sigl

@sesigl.bsky.social

Indeed, the prompt matters a lot, especially if you prefer a cost-efficient model to make it feasible to run at scale.

September 23, 2025 at 5:23 AM

Sebastian Sigl

@sesigl.bsky.social

And if you find this useful, subscribe to my newsletter for more deep dives like this every week:

www.sebastiansigl.com/subscribe

Subscribe | Sebastian Sigl

Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.

www.sebastiansigl.com

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

Read the full, in-depth playbook on my blog. No fluff, just actionable advice.

www.sebastiansigl.com/blog/llm-jud...

The 5 Biases That Can Silently Kill Your LLM Evaluations (And How to Fix Them) | Sebastian Sigl

Your LLM-as-a-Judge system might be lying to you. This post uncovers 5 critical biases like positional, verbosity, and moderation bias that silently corrupt your AI evaluations, leading to poor produc...

www.sebastiansigl.com

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

Relying on a biased judge is like flying a plane with a faulty altimeter. You think you're climbing, but you're headed for the ground.

I’ve written a complete guide on how to diagnose and fix these issues, plus build a resilient evaluation system.

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

4 & 5/ Authority & Moderation Bias

The Judge is easily fooled.

It falls for fake citations ("Harvard study...") and rewards "safe" refusals that users hate. This erodes trust and makes your product useless.

Fix: Use reference-guided evaluation and mandatory human review for refusal cases.
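
A rough sketch of the "mandatory human review for refusal cases" part in Python. The marker phrases and function names are hypothetical, and simple string matching is only a stand-in for whatever refusal detection you actually use:

```python
from typing import Iterable, List, Tuple

# Hypothetical marker phrases; a real system would likely need a better detector.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i'm unable to",
    "i am unable to",
)


def needs_human_review(answer: str) -> bool:
    """Return True if the answer looks like a refusal."""
    # Normalize curly apostrophes so "can’t" and "can't" match the same marker.
    lowered = answer.lower().replace("\u2019", "'")
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def route_answers(answers: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split answers into (auto_judged, human_review) queues."""
    auto_judged: List[str] = []
    human_review: List[str] = []
    for answer in answers:
        if needs_human_review(answer):
            human_review.append(answer)
        else:
            auto_judged.append(answer)
    return auto_judged, human_review
```

Only the answers in the first queue would go to the LLM judge; the second queue waits for a person.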

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

3/ Self-Enhancement Bias (aka Nepotism)

The Judge prefers answers from its own model family (e.g., GPT-4 judging GPT-4).

This makes objective cross-model benchmarking impossible.

Fix: Use a neutral, third-party judge model (e.g., use a Google model to judge OpenAI vs. Anthropic).
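
One way to enforce this in an evaluation harness, sketched in Python. The judge names and family labels below are placeholders, not real model identifiers:

```python
from typing import Dict, Optional, Set


def pick_neutral_judge(
    candidate_families: Set[str], judge_pool: Dict[str, str]
) -> Optional[str]:
    """Return the first judge whose model family is not among the candidates.

    judge_pool maps a judge name to its model family. Returns None when
    every available judge shares a family with a model under test.
    """
    for judge_name, family in judge_pool.items():
        if family not in candidate_families:
            return judge_name
    return None


# Placeholder names, for illustration only.
judge_pool = {
    "judge-from-openai": "openai",
    "judge-from-google": "google",
}
neutral = pick_neutral_judge({"openai", "anthropic"}, judge_pool)
```

Here `neutral` ends up as the Google-family judge, since it is the only one outside the families being compared.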

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

2/ Verbosity Bias

The Judge thinks longer = better.

It will reward a 5-paragraph answer over a correct 2-sentence one. This trains your models to be annoying and unhelpful.

Fix: Add "Be concise" and "Penalize verbosity" directly into your judge's rubric.
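
As a small illustration in Python, the rubric can be assembled so the conciseness rules are always appended. The exact wording and the 1 to 5 scale are assumptions, not a tested prompt:

```python
from typing import Sequence

# Rules taken from the fix above; the second sentence is illustrative wording.
CONCISENESS_RULES = (
    "Be concise.",
    "Penalize verbosity: do not score an answer higher just because it is longer.",
)


def build_judge_rubric(base_criteria: Sequence[str]) -> str:
    """Build a numbered rubric that always ends with the conciseness rules."""
    criteria = [*base_criteria, *CONCISENESS_RULES]
    numbered = [f"{i}. {text}" for i, text in enumerate(criteria, start=1)]
    return "Score the answer from 1 to 5 using these criteria:\n" + "\n".join(numbered)


rubric = build_judge_rubric(["The answer is factually correct."])
```

The resulting string would be passed to the judge model as part of its system or instruction prompt.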

September 22, 2025 at 1:43 PM

Sebastian Sigl

@sesigl.bsky.social

1/ Positional Bias

The Judge has a favorite: the first option it sees.

If you A/B test prompts and always put A first, you're not measuring quality—you're measuring position.

Fix: Swap the order and run the test again. If the judgment flips, it's invalid. Simple & powerful.
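
A minimal sketch of the swap test in Python. The `judge` callable, its "first"/"second" return values, and the toy judges are assumptions for illustration; in practice `judge` would wrap a call to your LLM judge:

```python
from typing import Callable, Optional

# A judge receives (question, first_answer, second_answer) and returns
# "first" or "second" depending on which position it prefers.
Judge = Callable[[str, str, str], str]


def position_consistent_verdict(
    judge: Judge, question: str, answer_a: str, answer_b: str
) -> Optional[str]:
    """Return "A" or "B" if the judge agrees in both orders, else None."""
    forward = judge(question, answer_a, answer_b)   # A shown first
    backward = judge(question, answer_b, answer_a)  # B shown first

    winner_forward = "A" if forward == "first" else "B"
    winner_backward = "B" if backward == "first" else "A"

    # A flipped verdict means position decided the outcome, so discard it.
    if winner_forward != winner_backward:
        return None
    return winner_forward


def always_first_judge(question: str, first: str, second: str) -> str:
    """Toy judge with pure positional bias."""
    return "first"


def keyword_judge(question: str, first: str, second: str) -> str:
    """Toy judge that looks at content only."""
    return "first" if "Paris" in first else "second"
```

With the positionally biased toy judge the function returns `None`, while the content-based one yields the same winner in both orders.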

September 22, 2025 at 1:43 PM

Reposted by Sebastian Sigl

Sebastian Sigl

@sesigl.bsky.social

I've written a full guide with code examples and the 4 core principles for writing AI-ready Python tests.
It's the playbook for harnessing AI speed without sacrificing quality.

Read it here: www.sebastiansigl.com/blog/type-sa...

#Python #Testing #AI #SoftwareQuality

Augmented Coding, Amplified Risk: Why Type-Safe Python Tests Matter More Than Ever | Sebastian Sigl

AI coding assistants are accelerating development—but also magnifying quality risks. Here’s how to write Python tests that survive refactors, scale with your codebase, and tame the chaos of augmented ...

www.sebastiansigl.com

August 15, 2025 at 7:37 AM

Sebastian Sigl

@sesigl.bsky.social

If you enjoyed this thread and want more high-quality content, subscribe to my newsletter here:

www.sebastiansigl.com/subscribe

Subscribe | Sebastian Sigl

Subscribe to Sebastian Sigl's newsletter and benefit from big tech insights, actionable advice, and an independent viewpoint.

www.sebastiansigl.com

August 15, 2025 at 7:37 AM
