Lightnews — Scholar-powered news

Benjamin Feuer

@benjaminfeuer.bsky.social

PhD researcher at NYU, working on LLMs, VLMs, and tabular foundation models from a data-centric perspective. Father of two, NYC diehard.


Benjamin Feuer

@benjaminfeuer.bsky.social

Special thanks to the BlueSky DCVLR crew: @yuhuiz.bsky.social @thaottn.bsky.social @vishaalurao.bsky.social @saining.bsky.social @sarameghanbeery.bsky.social

June 18, 2025 at 2:27 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

Check out:

Our Website: dcvlr-neurips.github.io

Our Starter Kit (Curate, Train, Eval): github.com/oumi-ai/oumi...

🧵 6 / n

DCVLR: Data Curation for Vision Language Reasoning - NeurIPS 2025 Competition

Join the DCVLR NeurIPS 2025 Competition. Advance visual reasoning in VLMs through data curation.


June 18, 2025 at 2:22 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

* A submission = a curated reasoning dataset on @huggingface with 1k or 10k samples and a scalable, reproducible curation strategy you document in a write-up
* You don’t need to train a model
* You can submit with nothing more than a free Colab or Kaggle account for basic testing

🧵 5 / n

June 18, 2025 at 2:22 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

💪anyone can compete for free 💪: Thanks to our sponsor @LambdaAPI we offer three free submissions for up to 500 teams. This is unprecedented in data-centric research, which tends to be very expensive because you have to train lots of models!

🧵 4 / n

June 18, 2025 at 2:21 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

🤖 open-models 🤖: every model we present results for will have open weights, and one of those models will be Molmo-O from @allen_ai (a recent best paper honorable mention from @cvpr at #CVPR2025), trained on open data.

🧵 3 / n

June 18, 2025 at 2:20 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

DCVLR is data-centric: we train a ~7B VLM on your dataset. The best performer (on benchmarks like MathVista, VMCBench and LiveXiv) will be eligible to win $1500 and a talk at #NeurIPS2025!

We also have a few twists compared to prior data-centric competitions –

🧵 2 / n

June 18, 2025 at 2:20 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

Co-organizing with wonderful collaborators from MIT, NYU, Stanford and UW: @thaottn.bsky.social, @sewoong79.bsky.social, @sarameghanbeery.bsky.social, @yuhuiz.bsky.social!

May 1, 2025 at 5:04 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

We are excited to be sponsored by @datologyai.com, who will be providing prizes for best paper awards 🏆

May 1, 2025 at 5:02 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

🚀We welcome any submission that discusses domain-specific data curation pipelines and/or generalizable curation principles, getting us closer to building data-centric methods that are robust, efficient, and adaptable across domains.

Refer to our website for the call for papers!

May 1, 2025 at 5:02 PM

Benjamin Feuer

@benjaminfeuer.bsky.social

That's not what they did. They used gpt-4o for program synthesis, which is fundamentally different from asking the LLM to provide the correct response in the prompt.

December 22, 2024 at 11:06 AM

Benjamin Feuer

@benjaminfeuer.bsky.social

Thanks for sharing! FWIW, I sensed mostly optimism and excitement at NeurIPS -- the people I spoke to were eager to talk about their research and learn about mine. Let's meet up in the new year and compare notes @kyunghyuncho.bsky.social

December 22, 2024 at 11:02 AM