j soma
dangerscarf.bsky.social
Natural PDF v0.1.13 out – a handful of useful changes but my favorite is 🗼 page restructuring support!

Grab sections and "flow" them together vertically or horizontally, making multi-column extraction infinitely easier than 24 hours ago.

Details at jsoma.github.io/natural-pdf/...
June 5, 2025 at 2:02 PM
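
A rough sketch of what "flowing" columns together means, for anyone curious. This is plain Python, not Natural PDF's actual API: the `Word` type, the `flow_columns` helper, and the column boundary at x=300 are all made up for illustration. The idea is to split words into columns by x position, then read each column top to bottom before moving on to the next one.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word box: text plus its position on the page."""
    text: str
    x0: float   # left edge of the word
    top: float  # distance from the top of the page


def flow_columns(words, boundaries):
    """Return text lines in column-by-column reading order.

    `boundaries` is a sorted list of x positions separating columns
    (an assumption: real layouts usually need these detected first).
    """
    columns = [[] for _ in range(len(boundaries) + 1)]
    for w in words:
        # A word belongs to the column after every boundary it has passed.
        idx = sum(w.x0 >= b for b in boundaries)
        columns[idx].append(w)

    lines = []
    for col in columns:
        rows = {}
        # Sort top-to-bottom, then left-to-right, and group words by row.
        for w in sorted(col, key=lambda w: (w.top, w.x0)):
            rows.setdefault(round(w.top), []).append(w.text)
        lines.extend(" ".join(r) for r in rows.values())
    return lines


words = [
    Word("Left", 10, 100), Word("one", 40, 100),
    Word("Right", 310, 100), Word("one", 350, 100),
    Word("Left", 10, 120), Word("two", 40, 120),
    Word("Right", 310, 120), Word("two", 350, 120),
]
result = flow_columns(words, [300])
print(result)
```

Reading straight across the page would interleave the two columns; stacking them vertically keeps each column's text together.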
it looks like someone has been going very hard on scans

ONE MORE DAY OF ACCEPTING BAD PDF SUBMISSIONS
May 16, 2025 at 12:36 PM
Woke up to a ton of new non-English BAD PDF CONTEST submissions: 💥 Serbian! Romanian! Chinese! 💥

Mostly not scans, though, so I predict they'll be easy-peasy to extract the info from. I want to have to train a custom OCR model!!! Someone submit a big scanned non-English PDF!!!
May 12, 2025 at 12:56 PM
i know you all are hiding worse scans from me
May 11, 2025 at 1:33 PM
i love this giant-pdf-with-tiny-text submission, we need a smallest font size category
May 8, 2025 at 3:24 PM
Live colab demo/walkthrough here: colab.research.google.com/github/jsoma...
April 3, 2025 at 4:16 PM
New release of 📝 Natural PDF 📝

A million and one table extraction/document layout/Q&A/quality of life improvements for all your PDF-processing needs

jsoma.github.io/natural-pdf/
April 3, 2025 at 4:16 PM