Lightnews — Scholar-powered news

Bruno.

@bdagnino.com

Building Limai: automated data extraction from unstructured sources | Climate Tech | Ex @PachamaInc | Co-founder @MetricaSports | 🇦🇷🇪🇸

Posts Replies Media Videos

Bruno.

@bdagnino.com

Interesting, will check it out! Thanks for the recommendation.

December 18, 2024 at 5:32 PM

Bruno.

@bdagnino.com

In this post you'll learn how:

1. Build a simple benchmark to evaluate the performance of your models
2. How a single in-context examples allowed 4o-mini to out perform 4o
3. How to simple improve model quality, and latency at the same time.

Check it out!

www.limai.io/blog/example

December 18, 2024 at 11:21 AM

Bruno.

@bdagnino.com

Yes, there are so many things going into the "real eval" that makes it super hard to properly capture.

December 5, 2024 at 2:15 PM

Bruno.

@bdagnino.com

Ohh nice! AlthoughI think that's a bit too much for my skill level 🤣

December 5, 2024 at 2:15 PM

Bruno.

@bdagnino.com

Want to dive into the details?

Check out our full notebook for the code, results, and how we caught hallucinated outputs: github.com/limai-io/de...

Or let’s chat! DM me or email bruno@limai.io to discuss how we can help build robust pipelines for your business. 🚀

demos/vision-extraction-validation.ipynb at main · limai-io/demos

Contribute to limai-io/demos development by creating an account on GitHub.

github.com

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

The Takeaway

Vision-based models are powerful, but validation frameworks are critical for reliable results.

💡 If you’re building data pipelines, combine extraction with validation to ensure accuracy and trust.

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

Key Results

✅ Vision models like Gemini handled layouts flexibly.

✅ Validation caught hallucinations and ensured data accuracy.

✅ Trustworthiness increased for complex documents like utility bills.

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

How It Works

• Extract raw text using a PDF reader.

• Validate each extracted value (e.g., “160.69 €”) by searching for it in the raw text.

• Flag values that don’t match as potential hallucinations.

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

We combined:

1️⃣ Vision-based extraction to handle complex layouts.

2️⃣ Instructor-powered validation to cross-check extracted values against raw text from PDFs.

This ensured data was grounded in reality, not hallucinated.

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

While vision models excel at "reading" layouts, they sometimes invent data.

E.g., instead of extracting "2.983 kW" for contracted power, the model returned "2.0 kW"—a made-up value. 😬

How do we prevent this?

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

Vision-based extraction is becoming the most promising path forward for Document AI.

These models handle complex layouts, tables, and multimodal inputs natively—far beyond what rule-based parsing can achieve. But they also have challenges.

December 5, 2024 at 11:06 AM

Bruno.

@bdagnino.com

That's an interesting question. The dataset I have is not big enough to try that. I suspect that indeed at some point it will start to regress.

December 2, 2024 at 1:26 PM

Bruno.

@bdagnino.com

100%, more so when you have models like Gemini's family in which you can really put A LOT in the context window.

December 2, 2024 at 1:15 PM

Bruno.

@bdagnino.com

If you’re curious about how this approach can work for you, let’s chat!

We’re offering free consulting calls this month to help businesses optimize their AI strategies.

📩 bruno@limai.io or DM me!

December 2, 2024 at 11:47 AM

Bruno.

@bdagnino.com

Check it out here: https://www.limai.io/blog/example

December 2, 2024 at 11:46 AM

Bruno.

@bdagnino.com

In our latests post we break down:
✅ How we built a simple test dataset to evaluate accuracy.
✅ Why adding examples worked so well (and why you should try it).
✅ How this influenced our product's UX/UI strategy.

December 2, 2024 at 11:46 AM

Bruno.

@bdagnino.com

That’s when we tried something so simple it felt obvious in hindsight: we added an example. The results were staggering:
• With a small model plus the example, accuracy leaped from 61% to 97%.
• We achieved this without fine-tuning or complex parsing techniques.

December 2, 2024 at 11:46 AM

Bruno.

@bdagnino.com

Even after a lot of work on prompt engineering and trying out parsing libraries our results were stuck at 61%-80% accuracy—not enough for reliable use.

December 2, 2024 at 11:46 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news