Arthur
arthurbmello.bsky.social
Data scientist / AI Engineer interested in causality.
BJJ black belt.
Views are not my own.
https://arthurmello.ai/
If your business depends on reliable, structured outputs (legal, medical, finance, code), RFT (reinforcement fine-tuning) means faster, cheaper, and more controllable fine-tuning.

📚 More here: platform.openai.com/docs/guides/...
June 9, 2025 at 7:09 PM
Early results are impressive:
• Milo boosted accuracy in complex scheduling by 25 points.
• Accordance AI outperformed GPT-4 in tax reasoning.
• Ambience Healthcare beat physician baselines on ICD-10 coding.
June 9, 2025 at 7:09 PM
You define what “good” looks like, and the AI iteratively improves.

This makes it possible to align LLMs with your company’s tone, structure, or compliance needs using minimal supervision.
June 9, 2025 at 7:09 PM
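To make "you define what good looks like" concrete, here is a minimal sketch of a grader: a function that turns a model output into a score between 0 and 1. This is not OpenAI's grader API. The function name `grade`, the JSON fields, and the example values are all made up for illustration.

```python
import json


def grade(output_text: str, reference: dict) -> float:
    """Hypothetical grader: fraction of reference fields the output gets right.

    Returns 0.0 when the output is not a JSON object, so malformed
    answers are penalized as heavily as wrong ones.
    """
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    # Count reference fields whose value matches exactly.
    matches = sum(1 for key, value in reference.items() if parsed.get(key) == value)
    return matches / len(reference)


# Illustrative reference answer (invented for this example).
reference = {"code": "E11.9", "billable": True}

full = grade('{"code": "E11.9", "billable": true}', reference)
partial = grade('{"code": "E11.9", "billable": false}', reference)
broken = grade("E11.9, billable", reference)
print(full, partial, broken)
```

A score like this is the signal the training loop would optimize, so partial credit (rather than pass/fail) gives the model something to climb.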
It can also be very useful for building vector databases for RAG. Dedoc can be used as a Python library, a standalone API service, or via Docker.

github.com/ispras/dedoc
May 28, 2025 at 4:33 PM
With Dedoc, they can automate the extraction of key information (e.g., clauses, parties involved, and dates) from these documents, streamlining their workflow and reducing manual data entry.
May 28, 2025 at 4:33 PM
It extracts textual content, logical hierarchies, tables and metadata, representing the document’s structure as a tree for easy processing.

Imagine a legal firm needing to digitize and analyze a vast archive of contracts in different formats.
May 28, 2025 at 4:33 PM
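Why a tree is "easy to process": a short recursive walk turns it into flat chunks that keep their heading context, which is handy for RAG. The sketch below does not use Dedoc's API or its real output schema. The `text`/`children` dict shape and the sample contract are assumptions made up for illustration.

```python
def flatten(node: dict, ancestors: tuple = ()) -> list:
    """Walk a document tree depth first and return one chunk per leaf.

    Each chunk keeps the chain of headings above it, so a retrieved
    passage still says which section it came from.
    """
    chunks = []
    if not node["children"]:
        chunks.append({"path": " > ".join(ancestors), "text": node["text"]})
    for child in node["children"]:
        chunks.extend(flatten(child, ancestors + (node["text"],)))
    return chunks


# Hypothetical parsed contract (invented sample data).
doc_tree = {
    "text": "Service Agreement",
    "children": [
        {
            "text": "1. Parties",
            "children": [{"text": "Acme Corp and Beta LLC", "children": []}],
        },
        {
            "text": "2. Term",
            "children": [{"text": "Effective 1 January 2024", "children": []}],
        },
    ],
}

chunks = flatten(doc_tree)
for chunk in chunks:
    print(chunk["path"], "|", chunk["text"])
```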
That makes it look like the two are opposites, even if they aren’t.

It’s a good reminder: if you’re only seeing part of the picture, the relationships in your data might be totally wrong.
May 27, 2025 at 5:56 PM
It happens when we only look at a filtered group and get a misleading pattern.

In this case, the hospital mostly admits people with either condition.

So if someone doesn’t have diabetes, they probably have high blood pressure, and vice versa.
May 27, 2025 at 5:56 PM
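A quick simulation shows the effect. It is a sketch under made-up assumptions: the two conditions are generated independently (20% prevalence each), and the admission probabilities (90% with either condition, 5% with neither) are invented numbers chosen to mimic a hospital that "mostly admits people with either condition".

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (population, admitted) as lists of (diabetes, hypertension)."""
    rng = random.Random(seed)
    population, admitted = [], []
    for _ in range(n):
        # The two conditions are drawn independently of each other.
        diabetes = int(rng.random() < 0.2)
        hypertension = int(rng.random() < 0.2)
        person = (diabetes, hypertension)
        population.append(person)
        # Selection step: admission depends on having either condition.
        admit_prob = 0.9 if (diabetes or hypertension) else 0.05
        if rng.random() < admit_prob:
            admitted.append(person)
    return population, admitted


population, admitted = simulate()
corr_all = pearson(*zip(*population))
corr_admitted = pearson(*zip(*admitted))
print(f"whole population: {corr_all:+.2f}")
print(f"admitted only:    {corr_admitted:+.2f}")
```

In the whole population the correlation sits near zero. Among admitted patients it turns clearly negative, even though nothing links the two conditions.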
Appending the invented documents to the query helps the matching with real documents.

It’s a plug-and-play way to boost retrieval.

On benchmarks, it improved BM25's retrieval quality by up to 15%.

Tradeoff: slower and pricier.

But effective.

arxiv.org/pdf/2303.07678
May 22, 2025 at 4:43 PM
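Here is a toy version of the idea, as a sketch rather than the paper's implementation. The pseudo-document is hard-coded where an LLM call would go, the three documents are invented, and repeating the query before appending (the `repeats` parameter) is an illustrative weighting choice so the short query is not drowned out by the longer generated text.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_tf = Counter(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term, q_count in query_tf.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            term_score = tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
            # Repeated query terms count once per occurrence.
            score += q_count * idf * term_score
        scores.append(score)
    return scores


def expand_query(query, pseudo_doc, repeats=5):
    """Append the generated pseudo-document to a repeated copy of the query."""
    return " ".join([query] * repeats + [pseudo_doc])


docs = [
    "zinc lozenges may shorten the common cold",
    "cold weather gear for winter hiking",
    "remedies for squeaky door hinges",
]
query = "cold remedies"
# Stand-in for the LLM-written passage (hard-coded for this example).
pseudo_doc = "zinc lozenges and rest can shorten a common cold"

baseline = bm25_scores(query, docs)
expanded = bm25_scores(expand_query(query, pseudo_doc), docs)
best_baseline = baseline.index(max(baseline))
best_expanded = expanded.index(max(expanded))
print("top hit without expansion:", docs[best_baseline])
print("top hit with expansion:   ", docs[best_expanded])
```

The bare query only shares single words with each document, so a keyword match on "remedies" wins. The invented passage adds the vocabulary a real answer would use, which pulls the relevant document to the top.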
This isn’t just “simpler.”

It’s a cleaner way to combine LLMs and vector search.

And it actually works.
May 15, 2025 at 4:55 AM
Use the agent like a brainy assistant that understands what you mean and queries your vector DB accordingly.

No need to manually wrangle metadata queries.

Just describe what you want and the agent does the rest.
May 15, 2025 at 4:55 AM