Daniel van Strien
@danielvanstrien.bsky.social
Machine Learning Librarian at @hf.co
demo app here: huggingface.co/spaces/akhal...
Nanonets-OCR2-3B - a Hugging Face Space by akhaliq
October 23, 2025 at 2:01 PM
huggingface.co/nanonets/Nan... might be worth a try for this. It can extract formulas into LaTeX.
October 23, 2025 at 2:01 PM
The command (using @hf.co Jobs - serverless GPU compute)
Full script at huggingface.co/datasets/uv-...
October 22, 2025 at 7:20 PM
Anyone who says OCR is a solved problem has not worked with historic digitised newspapers!
October 20, 2025 at 12:40 PM
October 13, 2025 at 6:13 PM
@wjbmattingly.bsky.social has done tons on handwritten text and VLMs!
October 8, 2025 at 5:39 PM
Built with uv for zero setup
Example output from historical library catalog: huggingface.co/datasets/dav...
Input dataset: huggingface.co/datasets/big...
100+ languages supported!
davanstrien/dots-ocr-bpl-card-catalog · Datasets at Hugging Face
October 7, 2025 at 3:45 PM
Also uploaded related datasets for index cards bsky.app/profile/dani...
Card catalogues aren't just a relic of the past - many institutions still rely on them because full migration is too expensive. VLMs could help change that.
I uploaded two new @hf.co datasets (~470K cards) for training/evaluating models to extract structured metadata from catalogue cards.
October 6, 2025 at 9:31 AM
Let me know if you think I should add more context about that to the dataset card!
October 2, 2025 at 7:10 PM