Luca
sciencialab.com
Luca
@sciencialab.com
We tried to carry on without the sharp mind of Patrice Lopez, who is currently immersed in new and exciting challenges. And on a lighter note, we might finally have a logo for Grobid!
4/4
December 8, 2025 at 1:00 PM
It also enabled us to better coordinate our roadmap, integrating the benefits of advanced Large Language Models while preserving our commitment to enabling processing on standard consumer-grade hardware.
3/4
December 8, 2025 at 1:00 PM
The meeting was extremely productive and helped us consolidate our community-driven vision for Grobid’s development in the coming years.
2/4
December 8, 2025 at 12:59 PM
On 26-27 November we held the #Grobid Camp at the Centre de #Inria Paris.
The goal was to meet with the major players in the French community, which ranges from government institutes to companies and large-scale projects.
1/4
December 8, 2025 at 12:59 PM
Me: Please, fix the tests!
LLM Agent: OK
November 1, 2025 at 9:49 PM
The Safari browser is like a car with one gear that claims it does not pollute...
August 24, 2025 at 2:11 PM
Exactly! There is a common misconception that throwing any kind of crap into a vector will magically work. Even in the age of AI, metadata still cannot be ignored.
Vector search is good, but it's not enough.
Metadata adds a layer of logic on top of your text chunks.
This lets you filter by date or source first, then retrieve all the related pieces at once.

#RAG #AI #VectorSearch
July 28, 2025 at 7:25 AM
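A minimal sketch of the "filter by metadata first, then rank by vector similarity" idea from the post above. Everything here is an illustrative assumption: the `Chunk` fields, the `search` helper, and the toy two-dimensional vectors are hypothetical and do not come from any particular vector database.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Chunk:
    """A text chunk with its (hypothetical) metadata and embedding."""
    text: str
    source: str
    published: date
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks, query_vector, *, source=None, since=None, top_k=3):
    """Apply the metadata filters first, then rank the survivors by similarity."""
    candidates = [
        c for c in chunks
        if (source is None or c.source == source)
        and (since is None or c.published >= since)
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]


# Toy data: the vectors are made up for illustration only.
chunks = [
    Chunk("recent release notes", "github", date(2025, 5, 18), [0.9, 0.1]),
    Chunk("old blog post", "blog", date(2019, 1, 1), [0.95, 0.05]),
    Chunk("recent meeting summary", "github", date(2025, 12, 8), [0.2, 0.8]),
]

hits = search(chunks, [1.0, 0.0], source="github", since=date(2025, 1, 1))
print([c.text for c in hits])
```

In this toy data the old blog post is the closest vector to the query, yet it never reaches the ranking step because the source and date filters exclude it first.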
Reposted by Luca
Yes. The time is now. Vaccines to treat and prevent cancer.
www.jci.org/articles/vie...
July 1, 2025 at 5:57 PM
Your feedback will help us improve Grobid! 🌟 Feel free to share your thoughts, star us on GitHub, and let’s keep building! 💬🚀
May 18, 2025 at 8:27 AM
Next up, we're focusing on supporting more platforms (Linux ARM), improving figures and tables extraction, enhancing CJK language support, and providing better handling for more document types like theses, reports, and more.
🔽
May 18, 2025 at 8:27 AM
- 🔤 Improved recognition of non-standard fonts
- 🛠️ Various bug fixes and security vulnerabilities addressed

github.com/kermitt2/...
🔽
Release 0.8.2 · kermitt2/grobid
What's Changed Added New model specialization/variants (flavors) mechanism #1151 Specialization/variant process for a lightweight processing that covers other types of scientific articles that...
github.com
May 18, 2025 at 8:27 AM
Grobid 0.8.2 is out! 🚀
- 🧠 New processing "flavors" for different doc types (e.g. SDO, corrections, editorials)
- 🔗 Improved URL extraction
- ✅ Better text extraction for paragraphs around figures and tables
🧵🔽
May 18, 2025 at 8:27 AM
I estimate that a few examples for each model would quickly improve the results to an acceptable level.
Feel free to reach out if you are interested, and we can work out a collaboration around it.
May 18, 2025 at 4:58 AM
I'm not sure Grobid is used in any project targeting any of the CJK languages, as other details might need to be addressed.
We started a low-priority branch (github.com/kermitt2/gro...) to improve support for all CJK languages at once, but other more urgent issues were prioritized at the time.
👇
May 18, 2025 at 4:58 AM
It's interesting to see this analysis. However, to be fair, Grobid does not have any training data for Japanese. The same is true for Chinese, Korean, etc.
👇
May 18, 2025 at 4:58 AM
Dear @github, I wonder whether it would be possible to save certain "search parameters" inside the issues/pulls so that our work can stay focused on important tasks. E.g. working on a specific milestone and wanting to know everything that is not yet done:
May 11, 2025 at 6:49 AM
May 7, 2025 at 7:30 PM
Reposted by Luca
GROBID by Patrice Lopez turns messy PDFs into well-structured text in TEI format, including references. Super useful! https://github.com/kermitt2/grobid
GitHub - kermitt2/grobid: A machine learning software for...
A machine learning software for extracting information fr...
github.com
December 1, 2024 at 3:55 PM
Reposted by Luca
To what extent do researchers funded by Dutch Research Council NWO and ZonMw share the research data and code underlying their publications?

Today we published an analysis based on 10,000+ papers using the open source tool Grobid: www.nwo.nl/en/news/shar...

All underlying data openly available!
February 10, 2025 at 9:10 PM
Grobid's popularity is still growing, despite LM, LLM, LLLM....
May 6, 2025 at 2:57 PM
Hi, I'm happy to sell my Twitter handle and close my Twitter account, as soon as it's legally allowed 🙂
January 21, 2025 at 8:37 AM
Hallucinating AI? 🫣
December 9, 2024 at 5:02 PM
Install uBlock Origin (ublockorigin.com/) or Ghostery (www.ghostery.com/), or both. They will increase your security and privacy overall.
2/2
uBlock Origin - Free, open-source ad content blocker.
uBlock Origin is not just an “ad blocker”, it's a wide-spectrum content blocker with CPU and memory efficiency as a primary feature. Developed by Raymond Hill.
ublockorigin.com
November 22, 2024 at 8:35 AM