big-red7.bsky.social
@big-red7.bsky.social
However, I am rather new to data science and mostly self-taught, so I would love to hear how others approach this problem.
February 7, 2025 at 11:42 AM
I get that it is easy to build a pipeline for digitized forms if you know their structure before you receive them, but I feel like a tool that could take any PDF, find a table containing a given search term, and extract it to JSON without knowing the layout in advance would be very useful.
February 7, 2025 at 11:41 AM
Like, I understand that other tools exist. However, the issue with PDFs is that they often have a unique structure and are intended to be interpreted by people, and I personally believe LLMs are usually pretty effective at taking information in one format and converting it to a more standard one.
February 7, 2025 at 11:22 AM
I am not going to speak to the quality of what the original post is saying. However, as someone who works in an environment where scraping data from large numbers of arbitrary PDFs and digitized forms is rather challenging, I think it is definitely a potential use case where LLMs could prove effective.
February 7, 2025 at 11:18 AM