Joe Barrow
jbarrow.bsky.social
NLP @ Pattern Data
Prev: Adobe Research, PhD UMD
And I still haven’t learned what Skibidi means. 😡
October 16, 2025 at 4:28 PM
Now, some acknowledgments: this work was made possible thanks to a generous compute grant from Lambda!

And I've got a hosted version of the model that I'll be sharing in a couple days. It runs on @modal-labs.bsky.social, which makes it basically free for me to host and scale
September 24, 2025 at 5:51 PM
As part of the paper, I'm working on releasing the dataset and FFDNet models on HuggingFace.

Those will be out in the coming days; you can follow along here: github.com/jbarrow/comm...

🤗Paper: huggingface.co/papers/2509....
arXiv: arxiv.org/abs/2509.16506
GitHub - jbarrow/commonforms: CommonForms dataset and models
September 24, 2025 at 5:51 PM
Now, just because we filtered for the cleanest forms doesn't mean we got _perfect_ forms. There are still a lot of inconsistencies in how people prepare forms! In future work I'll be looking at mitigating data quality issues like these.
September 24, 2025 at 5:51 PM
(Note, this doesn't _just_ apply to Acrobat, it's also better than Apple Preview -- neither Acrobat nor Preview even make an attempt at checkboxes, and they're often fooled by any straight, horizontal line. Left: Acrobat, Right: FFDNet)
September 24, 2025 at 5:51 PM
If we train object detectors to find the form fields on these pages, we get a much cleaner set of forms than if you used Acrobat to automatically prepare your form. (Left: Acrobat, Right: FFDNet).
September 24, 2025 at 5:51 PM
Step 1 is to filter for the cleanest forms possible. We start with 8 million PDFs from Common Crawl, and work our way down to ~60k of the cleanest forms we can find. The result is a ~500k page dataset, called CommonForms.
September 24, 2025 at 5:51 PM
Yeah I wonder if that statistic is flipped between the cities (though operated by the same provider — Lyft — I assume?)

No way that 99 out of every 100 riders in Boston have visited more than 27 stations?
May 31, 2025 at 5:07 AM
Pretty sure you want that number to be lower. :p (my stats for DC ridership)
May 31, 2025 at 5:04 AM
Would absolutely love that!
March 11, 2025 at 12:28 PM
"AI TOPS our stock price"
- Nvidia, today
January 7, 2025 at 9:35 PM
Agreed. Personally, my ideal would be typing into an old, cheap, refurb Kindle.

Here’s a video of a person typing into the Palma: www.reddit.com/r/Onyx_Boox/...

My experience (tablet) is that it’s maybe 10s from pickup to writing — wake up (3s), navigate to apps (2s), open app (3-5s)?
Palma as a FreeWrite
December 22, 2024 at 8:56 AM
I’ve got one of the older, larger eInk tablets and use it for reading books/papers and taking notes. Battery after several years lasts about a week of average use, longer if I keep WiFi off.
December 21, 2024 at 9:29 PM
Not necessarily hitting the price point, but there are eInk mini tablets (e.g. Boox Palma at around $200) that have Android, no SIM (so no phone distractions), and long battery life (thanks to the eInk and being generally underpowered). They accept keyboards, too.
December 21, 2024 at 9:27 PM
Reposted by Joe Barrow
December 18, 2024 at 10:37 AM
Holy moly that created an extra half page of space!
December 18, 2024 at 10:09 AM
Aged white tea, the kind that comes in a compressed disk or ball. My favorite kind of tea, imo they taste naturally quite sweet. Yunnan Sourcing has a bunch!
November 25, 2024 at 5:58 AM