Has saved me a ton of effort.
Validation libs on the API layer keep it sane on the server.
On the client side, with frameworks in place, it's not difficult to enforce consistency.
We did try TS but never saw the point.
Also considering MongoDB's full-text search.
So search would run over text, vectors, and graphs.
Given the unstructured nature of the data, we think multiple query mechanisms would be a good approach.
More thoughts later as we discover things.
(6/6)
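One way to combine results from the three query mechanisms is reciprocal rank fusion. To be clear, that is a technique swapped in here for illustration, not something the thread says is in use. A minimal sketch, assuming each search returns a best-first list of emailIds; the function name, the sample ids, and k=60 are all hypothetical choices:

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of emailIds into one ranking.

    Each list is ordered best-first. An id earns 1 / (k + rank) for every
    list it appears in, so ids found by several mechanisms rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, email_id in enumerate(results, start=1):
            scores[email_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is stable.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


# Hypothetical results from the three search mechanisms.
text_hits = ["e3", "e1", "e7"]
vector_hits = ["e1", "e3", "e9"]
graph_hits = ["e1", "e4"]

merged = reciprocal_rank_fusion([text_hits, vector_hits, graph_hits])
```

Because every store carries the same emailId, the merged list can be resolved back to the full documents in MongoDB.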
Use the relationships and entities to populate the GraphDB.
We are going with Neo4j for now. We could move it to Weaviate, but right now our thinking is that we will need a full-featured GraphDB.
Both Weaviate and Neo4j store the emailId generated by MongoDB to ensure traceability.
(5/n)
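A rough sketch of how an extracted relationship might be turned into a Cypher upsert that carries the emailId. The `Entity` label, the `RELATES_TO` relationship type, the property names, and the sample values are all assumptions for illustration, not the actual schema:

```python
def triple_to_cypher(head, relation, tail, email_id):
    """Return a (query, params) pair that upserts one relationship.

    Cypher cannot parameterize relationship types, so the relation name is
    stored as a property on a generic RELATES_TO edge (a hypothetical choice).
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[r:RELATES_TO {type: $relation, emailId: $email_id}]->(t)"
    )
    params = {
        "head": head,
        "tail": tail,
        "relation": relation,
        "email_id": email_id,
    }
    return query, params


# Hypothetical triple and MongoDB-generated id.
query, params = triple_to_cypher(
    "Acme Corp", "supplier_of", "Globex", "665f1c2e9b1d4a0012ab34cd"
)
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`. MERGE keeps the write idempotent, so re-processing the same email does not duplicate nodes or edges.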
Create vector embeddings from Mongo data (email and attachment text) and store them in Weaviate.
Again, there are a lot of choices in chunking and embedding models. The Weaviate documentation is helpful in understanding them. More on the choices to be made later.
(4/n)
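To make the chunking choice concrete, here is a minimal sketch of one common option: fixed-size word windows with overlap, each tagged with the emailId. The window sizes, field names, and sample text are hypothetical, and this is not necessarily the strategy we will settle on:

```python
def chunk_text(text, email_id, chunk_size=200, overlap=40):
    """Split text into overlapping word windows, tagging each with emailId."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(
            {
                "emailId": email_id,
                "chunkIndex": len(chunks),
                "text": " ".join(window),
            }
        )
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny illustrative input: ten "words", windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, "hypothetical-email-id", chunk_size=4, overlap=1)
```

Each chunk would then be embedded and stored with its emailId, so a vector hit can be traced back to the source email.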
Use a spaCy/REBEL pipeline to extract:
1. Entities
2. Relationships
Experimenting with different entity and relationship extraction models. Will publish the findings later.
Update Mongo with the extracted text, entities, relationships, and tables.
(3/n)
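The Mongo update step could look roughly like the sketch below. The `extracted.*` field names and the sample values are made up for illustration, and in practice `_id` would likely be an ObjectId rather than a plain string:

```python
def build_enrichment_update(email_id, text, entities, relationships, tables):
    """Return the (filter, update) pair for a MongoDB update_one call."""
    filter_doc = {"_id": email_id}
    update_doc = {
        "$set": {
            # Hypothetical field layout: everything extracted lives under
            # one sub-document so it can be re-generated without touching
            # the original email fields.
            "extracted.text": text,
            "extracted.entities": entities,
            "extracted.relationships": relationships,
            "extracted.tables": tables,
        }
    }
    return filter_doc, update_doc


filter_doc, update_doc = build_enrichment_update(
    "665f1c2e9b1d4a0012ab34cd",
    "Acme Corp supplies Globex.",
    [{"text": "Acme Corp", "label": "ORG"}, {"text": "Globex", "label": "ORG"}],
    [{"head": "Acme Corp", "type": "supplier_of", "tail": "Globex"}],
    [],
)
```

With PyMongo this pair would go to `collection.update_one(filter_doc, update_doc)`. Using `$set` only touches the extracted fields and leaves the rest of the email document as it is.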
For each attachment we use a Docling/spaCy pipeline to extract:
1. Text
2. Tables
3. Images (planned)
Right now, converting XLS to PDF and extracting tables using spaCy Layout seems to be working well.
(2/n)
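For the XLS to PDF step, one possible route is LibreOffice in headless mode. That tool is an assumption made for this sketch; the thread does not say which converter is used. The helper names and paths below are hypothetical too:

```python
from pathlib import Path


def build_pdf_convert_command(xls_path, out_dir):
    """Build a LibreOffice headless command that converts a spreadsheet to PDF."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(xls_path),
    ]


def expected_pdf_path(xls_path, out_dir):
    """LibreOffice keeps the file stem and swaps the extension for .pdf."""
    return Path(out_dir) / (Path(xls_path).stem + ".pdf")


cmd = build_pdf_convert_command("attachments/report.xls", "converted")
pdf_path = expected_pdf_path("attachments/report.xls", "converted")
# To actually run the conversion (requires LibreOffice on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The resulting PDF would then go through the layout pipeline for table extraction.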