2. Fails most of the time with multi-column PDFs
3. Fails with complex infographics
4. Fails with equations
That said, for a regular single-column research paper, it's very good.
pymupdf.readthedocs.io/en/latest/
Still working on the other pet project - DocOwl.
I'm currently trying out something completely different:
github.com/X-PLUG/mPLUG...
I'll first try your recommended approach. If that works for fitz, it'll save me a ton of time and new explorations.
I've found that it ruins the structure of research documents.