Lightnews — Scholar-powered news

Juan Rodriguez

@joanrod.bsky.social

24 followers 24 following 9 posts

AI Researcher. Working on Multimodal AI at ServiceNow, Mila
joanrod.github.io


Juan Rodriguez

@joanrod.bsky.social

We evaluated several VLMs, both open and closed source, on BigDocs-Bench to build a leaderboard.

📊 Models trained on BigDocs outperformed all models on BigDocs-Bench tasks and delivered robust performance on established benchmarks.
✅ Human evaluations confirmed their strong performance!

December 10, 2024 at 6:34 PM

Juan Rodriguez

@joanrod.bsky.social

To validate the quality of the BigDocs datasets, we trained several VLMs on BigDocs-7.5M and evaluated their performance on document-specific and general VLM benchmarks.

The results? Training on BigDocs provides significant boosts compared to training on other datasets! 📈✨

December 10, 2024 at 6:34 PM

Juan Rodriguez

@joanrod.bsky.social

We introduce BigDocs-Bench, a set of benchmarks that focus on:

📄 Document Understanding
🌐 Web and GUI reasoning
👨‍💻 Code Generation

We also tackle complex outputs like SVG, LaTeX code, Markdown, and HTML, including very long and structured formats. Here are some examples.

December 10, 2024 at 6:34 PM

Juan Rodriguez

@joanrod.bsky.social

Building BigDocs was no small feat! We curated a large-scale dataset from diverse, license-friendly sources and documented the entire process.

December 10, 2024 at 6:34 PM

Juan Rodriguez

@joanrod.bsky.social

🎉 Excited to introduce BigDocs!
An open, transparent multimodal dataset designed for:
📄 Documents
🌐 Web content
🖥️ GUI understanding
👨‍💻 Code generation from images
We’re also launching BigDocs-Bench:
➡️ Document, web, and GUI visual reasoning
➡️ Converting images into JSON, Markdown, LaTeX, SVG, and more!

December 10, 2024 at 6:34 PM
