Interesting, will check it out! Thanks for the recommendation.
What's the best way to monitor LLMs that use the Gemini API?
I used to use Langfuse, but it doesn't seem to work as nicely as it does with OpenAI.
In this post you'll learn:
1. How to build a simple benchmark to evaluate the performance of your models
2. How a single in-context example allowed 4o-mini to outperform 4o
3. How to improve model quality and latency at the same time
Check it out!
www.limai.io/blog/example
Using few-shot examples to boost LLM data extraction by over 50%?
If you've spent countless hours fine-tuning prompts, testing different parsing libraries, and trying to craft the perfect solution only to get mediocre results, this is for you.
My 3 mantras to stay sane as an entrepreneur.
Always visible on my desk.
I should probably have a nicer version framed or something, but hey, who has time for that?
Yes, there are so many things going into the "real eval" that makes it super hard to properly capture.
Ohh nice! Although I think that's a bit too much for my skill level 🤣
Want to dive into the details?
Check out our full notebook for the code, results, and how we caught hallucinated outputs: github.com/limai-io/de...
Or let's chat! DM me or email bruno@limai.io to discuss how we can help build robust pipelines for your business.
The Takeaway
Vision-based models are powerful, but validation frameworks are critical for reliable results.
💡 If you're building data pipelines, combine extraction with validation to ensure accuracy and trust.
Key Results
✅ Vision models like Gemini handled layouts flexibly.
✅ Validation caught hallucinations and ensured data accuracy.
✅ Trustworthiness increased for complex documents like utility bills.
How It Works
• Extract raw text using a PDF reader.
• Validate each extracted value (e.g., "160.69 €") by searching for it in the raw text.
• Flag values that don't match as potential hallucinations.
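The steps above fit in a few lines of Python. This is a minimal sketch, not the actual pipeline: the function name, field names, and whitespace normalization are assumptions.

```python
def flag_hallucinations(extracted: dict, raw_text: str) -> list[str]:
    """Return the fields whose values don't appear in the raw PDF text."""
    # Collapse whitespace so values still match across line breaks in the PDF text.
    haystack = " ".join(raw_text.split())
    return [
        field
        for field, value in extracted.items()
        if " ".join(str(value).split()) not in haystack
    ]

raw = "Total: 160.69 €\nContracted power:\n2.983 kW"
flag_hallucinations({"total": "160.69 €", "power": "2.0 kW"}, raw)  # -> ["power"]
```

A real version would also normalize number formats (decimal commas, thousands separators) before comparing, otherwise a correctly extracted value can be flagged just because the model reformatted it.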
We combined:
1️⃣ Vision-based extraction to handle complex layouts.
2️⃣ Instructor-powered validation to cross-check extracted values against raw text from PDFs.
This ensured data was grounded in reality, not hallucinated.
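A rough sketch of what that cross-check can look like with Pydantic (the schema and field names here are illustrative, not our actual model; Instructor runs validators like this when you pass the raw text in via `validation_context`):

```python
from pydantic import BaseModel, ValidationInfo, field_validator

class UtilityBill(BaseModel):
    total_amount: str      # e.g. "160.69 €"
    contracted_power: str  # e.g. "2.983 kW"

    @field_validator("total_amount", "contracted_power")
    @classmethod
    def value_must_appear_in_source(cls, v: str, info: ValidationInfo) -> str:
        # The raw OCR/PDF text is supplied through the validation context.
        raw = (info.context or {}).get("raw_text", "")
        if v not in " ".join(raw.split()):
            raise ValueError(f"{v!r} not found in source text (possible hallucination)")
        return v

# Plain Pydantic usage; with Instructor the same check runs on the LLM's output.
raw = "Contracted power: 2.983 kW\nTotal: 160.69 €"
bill = UtilityBill.model_validate(
    {"total_amount": "160.69 €", "contracted_power": "2.983 kW"},
    context={"raw_text": raw},
)
```

When a validator raises, Instructor can feed the error back to the model for a retry, so hallucinated values get rejected instead of silently landing in your database.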
While vision models excel at "reading" layouts, they sometimes invent data.
E.g., instead of extracting "2.983 kW" for contracted power, the model returned "2.0 kW": a made-up value. 😬
How do we prevent this?
Vision-based extraction is becoming the most promising path forward for Document AI.
These models handle complex layouts, tables, and multimodal inputs natively, far beyond what rule-based parsing can achieve. But they also have challenges.
Preventing Hallucinations in Vision-Based Data Extraction
Vision models are emerging as the best way to handle documents with complex layouts. On the flip side, they are more likely to hallucinate results.
How can we address that? With OCR-based data validation.
It feels like chess engines are so powerful now that they've become a bit useless for chess commentary. Even GMs can't always make sense of the eval bar. Maybe a more "human" eval bar would actually help the audience and commentators.
I love how chess players assign so much meaning, personality, and purpose to chess pieces throughout games. So much passion and emotion over a board game.
Super excited about PydanticAI. Looking forward to taking it out for a spin.
That's an interesting question. The dataset I have is not big enough to try that. I suspect that indeed at some point it will start to regress.
100%, more so when you have models like Gemini's family in which you can really put A LOT in the context window.
If you're curious about how this approach can work for you, let's chat!
We're offering free consulting calls this month to help businesses optimize their AI strategies.
📩 bruno@limai.io or DM me!
Check it out here: https://www.limai.io/blog/example
In our latest post we break down:
✅ How we built a simple test dataset to evaluate accuracy.
✅ Why adding examples worked so well (and why you should try it).
✅ How this influenced our product's UX/UI strategy.
That's when we tried something so simple it felt obvious in hindsight: we added an example. The results were staggering:
• With a small model plus the example, accuracy leaped from 61% to 97%.
• We achieved this without fine-tuning or complex parsing techniques.
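In prompt terms, the change really is as small as it sounds. A hypothetical sketch (the example document, fields, and function name are all made up for illustration):

```python
# One worked input/output pair, shown to the model before the real document.
EXAMPLE_INPUT = "Vyuctovani energie\nCelkem k uhrade: 1 540,00 Kc"
EXAMPLE_OUTPUT = '{"total_amount": "1 540,00 Kc"}'

def build_prompt(document_text: str) -> str:
    """Build a one-shot extraction prompt: example pair first, then the real input."""
    return (
        "Extract the billing fields as JSON.\n\n"
        f"Example input:\n{EXAMPLE_INPUT}\n"
        f"Example output:\n{EXAMPLE_OUTPUT}\n\n"
        f"Input:\n{document_text}\n"
        "Output:"
    )
```

The example does the work that paragraphs of instructions couldn't: it pins down the exact output schema, number formatting, and language handling in one concrete demonstration.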
Even after a lot of work on prompt engineering and trying out parsing libraries, our results were stuck at 61%-80% accuracy: not enough for reliable use.
Czech utility bills. These documents had:
Non-English text (a hurdle for many LLMs)
🧮 Values that needed to be calculated (e.g., summing multiple rows for Heating or Cooling)
🎲 A mix of other fields like dates, addresses, and contract details
While building Limai's data extraction product, we faced a tough challenge for a proof of concept with a potential client: extracting complex data from
[NEW POST] Show, Don't Tell: How Dynamic Examples Boosted Accuracy from 61% to 97%
Ever spent hours fine-tuning prompts or testing document parsing libraries, only to end up with meh results? What if I told you that one simple change could drastically improve your results?
https://arxiv.org/abs/2310.11244
Interesting paper on Entity Matching using LLMs. I think I'll work on a demo of this soon.