🫵 Follow our substack: https://thehyperplane.substack.com/
👀 Our Ebook: https://hyperplane.gumroad.com/l/fine-tuning-stt-on-edge
open.substack.com/pub/mlvangua...
In all realness, code generation is a great assistant for an already great programmer 🤷
- Filter out the junk
- Split (70/15/15) & push to @hf.co for easy access during training
2/2
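A minimal sketch of the filter-and-split step, assuming the samples are plain strings. The `is_junk` rule (empty or very short text) and the fixed seed are hypothetical placeholders, not the filter actually used in the thread.

```python
import random

def is_junk(text, min_chars=5):
    """Hypothetical junk rule: drop empty or very short samples."""
    return len(text.strip()) < min_chars

def split_dataset(items, seed=42, train_pct=70, val_pct=15):
    """Shuffle, then split into train/validation/test.

    The test split takes whatever remains after train and validation,
    so 70/15 leaves 15 percent for test.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return {
        "train": items[:n_train],
        "validation": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],
    }

raw = [f"sample text {i}" for i in range(100)] + ["", "  "]
clean = [t for t in raw if not is_junk(t)]
splits = split_dataset(clean)
print({name: len(rows) for name, rows in splits.items()})
```

To publish the splits, one option is to wrap them in a `DatasetDict` from the `datasets` library and call `push_to_hub` with your own repo id.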
A vector database like Vespa stores embeddings and enables similarity search. It can also use metadata to improve relevance by associating vectors with key attributes like document type, page number, or detected visual features.
7/7
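To make the idea concrete, here is a toy in-memory sketch of similarity search with a metadata filter. It is not Vespa's API: `Record`, `cosine`, and `search` are hypothetical names, and a real vector database would use an approximate nearest-neighbor index rather than a full scan.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    doc_id: str
    vector: list
    metadata: dict = field(default_factory=dict)

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(records, query, top_k=3, **filters):
    """Keep records whose metadata matches every filter, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query, r.vector), reverse=True)
    return ranked[:top_k]

records = [
    Record("a", [1.0, 0.0], {"doc_type": "invoice", "page": 1}),
    Record("b", [0.9, 0.1], {"doc_type": "report", "page": 4}),
    Record("c", [0.0, 1.0], {"doc_type": "invoice", "page": 2}),
]
hits = search(records, [1.0, 0.0], top_k=2, doc_type="invoice")
print([r.doc_id for r in hits])
```

Filtering before ranking means record "b" is excluded even though its vector is close to the query.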
Splits documents into manageable chunks for embedding:
- Layout-based chunking is for visual embeddings.
- Text density and structure guide chunking for traditional embeddings. This preserves context without overloading the vector database.
6/7
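A rough sketch of structure-based chunking for the text path, assuming paragraphs are separated by blank lines. The function name and the `max_chars` budget are hypothetical. Layout-based chunking for visual embeddings would work on page regions instead and is not shown.

```python
def chunk_by_structure(text, max_chars=500):
    """Pack whole paragraphs into chunks of at most max_chars characters.

    Paragraphs are never cut in half, so a single paragraph longer than
    max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)  # budget exceeded: close the chunk
            current = paragraph
    if current:
        chunks.append(current)
    return chunks

doc = "a" * 30 + "\n\n" + "b" * 30 + "\n\n" + "c" * 30
chunks = chunk_by_structure(doc, max_chars=70)
print(len(chunks))
```

Keeping paragraphs intact is what preserves context. The size budget keeps the number and size of stored vectors in check.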
For converting document content into vectors.
- Traditional embeddings for documents with clean text extracted via OCR.
- Vision Language Models (VLM) handle multimodal documents with complex visual structures like tables, charts, and diagrams.
5/7
The algorithm is centralized, making informed decisions based on input from the embedding decider.
- Text-heavy documents are processed with OCR and text embedding models.
- Documents with complex layouts use vision language models (e.g. ColPali) instead, skipping OCR.
4/7
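The routing can be sketched as a small dispatch table. Everything here is a hypothetical stand-in: the two handlers just tag the page with the path taken, where a real pipeline would call an OCR engine plus a text embedding model, or a vision language model such as ColPali.

```python
def ocr_then_text_embed(page):
    """Placeholder for the OCR + text embedding path."""
    return ("ocr+text", page)

def vision_embed(page):
    """Placeholder for the vision language model path (no OCR)."""
    return ("vision", page)

# One central table maps the decider's label to a processing path.
ROUTES = {
    "text": ocr_then_text_embed,
    "vision": vision_embed,
}

def route(page, decision):
    """Send a page down the path chosen by the embedding decider."""
    try:
        handler = ROUTES[decision]
    except KeyError:
        raise ValueError(f"unknown decision: {decision!r}") from None
    return handler(page)

print(route("page-1", "text"))
print(route("page-2", "vision"))
```

Keeping the mapping in one place means a new path, say for scanned handwriting, needs only one new entry.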
This decider analyzes the document's structure, using tools like a layout analyzer, visual element detector, or text density analyzer, to classify whether a traditional text embedding or a multimodal vision embedding is appropriate.
3/7
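As a sketch, a decider like this could be a simple rule over per-page statistics. The `PageStats` fields and both thresholds are assumptions for illustration. In practice the numbers would come from the layout analyzer and visual element detector, and the cutoffs would need tuning.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    text_chars: int    # characters of extractable text
    page_area: float   # e.g. width * height in PDF points
    num_images: int
    num_tables: int

def decide_embedding(stats, min_density=0.002, max_visual=0):
    """Return "text" or "vision" for a page.

    Hypothetical rule: any visual element, or too little text per unit
    of page area, sends the page to the vision path.
    """
    density = stats.text_chars / stats.page_area if stats.page_area else 0.0
    visual_elements = stats.num_images + stats.num_tables
    if visual_elements > max_visual or density < min_density:
        return "vision"
    return "text"

letter_area = 612 * 792  # US Letter in points
print(decide_embedding(PageStats(3000, letter_area, 0, 0)))
print(decide_embedding(PageStats(3000, letter_area, 2, 1)))
```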
The starting point of any pipeline is the PDF reader. Its job is to extract pages and pass them downstream. A high-quality reader ensures no lost information, whether the content is text-heavy, image-dense, or filled with tables and graphs.
2/7