Juan Rodriguez
joanrod.bsky.social
AI Researcher. Working on Multimodal AI at ServiceNow, Mila
joanrod.github.io
We evaluated several VLMs, both open and closed source, on BigDocs-Bench to build a leaderboard.

📊 Models trained on BigDocs outperformed all models on BigDocs-Bench tasks and delivered robust performance on established benchmarks.
✅ Human evaluations confirmed their strong performance!
December 10, 2024 at 6:34 PM
To validate the quality of the BigDocs datasets, we trained several VLMs on BigDocs-7.5M and evaluated their performance on document-specific and general VLM benchmarks.

The results? Training on BigDocs provides significant boosts compared to training on other datasets! 📈✨
December 10, 2024 at 6:34 PM
We introduce BigDocs-Bench, a set of benchmarks that focus on:

📄 Document Understanding
🌐 Web and GUI reasoning
👨‍💻 Code Generation

We also tackle complex outputs like SVG, LaTeX code, Markdown, and HTML, including very long and structured formats. Here are some examples:
December 10, 2024 at 6:34 PM
Building BigDocs was no small feat! We curated a large-scale dataset from diverse, license-friendly sources and documented the entire process.
December 10, 2024 at 6:34 PM
🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, Web, and GUI visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!
December 10, 2024 at 6:34 PM