Lightnews — Scholar-powered news

Joe Barrow

@jbarrow.bsky.social

110 followers 190 following 23 posts

NLP @ Pattern Data
Prev: Adobe Research, PhD UMD

Joe Barrow

@jbarrow.bsky.social

Paper thread of some work I’m *incredibly* proud of, my first single-author paper!

Converting a PDF to a fillable form is a hard problem, and a lot of solutions don’t work very well! In CommonForms, I show that you can train models that outperform Adobe Acrobat for <$500! 🧵

September 24, 2025 at 5:51 PM

Joe Barrow

@jbarrow.bsky.social

In which I argue that LLM-generated bounding boxes are impressive, but not that useful (yet): notes.penpusher.app/Misc/Horsesh...

Horseshoes (and Hand Grenades) - LLM Localization is not Close, but not Close Enough - Joe Barrow

TL;DR: Large Multimodal Models (LMMs) can now output bounding boxes when given images as inputs. The results are impressive, but for documents they aren't good enough for real-world use, yet. The Probl…

notes.penpusher.app

April 23, 2025 at 5:33 PM

Joe Barrow

@jbarrow.bsky.social

Ah, yes, that ol' familiar unit of measure "AI TOPS"

January 7, 2025 at 8:13 PM

Joe Barrow

@jbarrow.bsky.social

I put together a little guide on getting started with Google Gemini -- how to make multimodal calls, get structured outputs, and image bounding boxes to build an object detector.

notes.penpusher.app/Misc/Google+...

Google Gemini 101 - Object Detection with Vision and Structured Outputs - Joe Barrow - Obsidian Publish

This is a missing manual for how to get a simple working prototype up and running with Gemini's vision mode and structured outputs. I'm confident that manual exists elsewhere, but I haven't been able…

publish.obsidian.md

December 20, 2024 at 9:51 AM

Reposted by Joe Barrow

Joe Barrow

@jbarrow.bsky.social

Drake meme template.
No to: clear, concise prose
Yes to: negative vspace

December 18, 2024 at 10:37 AM

Joe Barrow

@jbarrow.bsky.social

Gemini 2.0 Flash is pretty good at localization in images for an LMM (much better than GPT-4o in my experiments).

A picture of a teapot to the right of a teacup, both in a flat-bottomed basket. The teacup has tea and flowers in it. There are 2 blue bounding boxes on the image, one labeled "Teapot" and one labeled "Teacup", that are over the teapot and teacup respectively.

December 18, 2024 at 9:26 AM

Joe Barrow

@jbarrow.bsky.social

ML history question: is there an earlier reference to pixel-only in-context (i.e. no fine-tuning) DocVQA performance than the GPT-4 announcement from OpenAI?

December 9, 2024 at 9:45 AM
