Lightnews — Scholar-powered news

@big-red7.bsky.social

7 followers 16 following 4 posts


big-red7.bsky.social

@big-red7.bsky.social

However, I am rather new to data science and mostly self-taught, so I would love to hear how others approach this problem.

February 7, 2025 at 11:42 AM

big-red7.bsky.social

@big-red7.bsky.social

I get that it is easy to create a pipeline for digitized forms if you know the structure before you receive them, but I feel like a tool that could take any PDF and extract a table containing a given search term to JSON, without understanding the structure, would be very useful.

February 7, 2025 at 11:41 AM
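
To make the idea in the post above concrete, here is a minimal Python sketch of only the "find the table with the search term and emit JSON" step. It assumes the tables have already been pulled out of the PDF by some earlier step (a PDF library or an LLM) as lists of rows, and that the first row of each table is a header. The function names and the sample data are hypothetical, made up for illustration; they are not from the post or from any particular library.

```python
import json


def find_tables_with_term(tables, term):
    """Return every table in which at least one cell contains `term` (case-insensitive)."""
    needle = term.casefold()
    return [
        table
        for table in tables
        if any(
            needle in str(cell).casefold()
            for row in table
            for cell in row
            if cell is not None
        )
    ]


def table_to_records(table):
    """Turn a table into a list of dicts, assuming the first row is the header."""
    if not table:
        return []
    header = [
        str(name).strip() if name is not None else f"column_{index}"
        for index, name in enumerate(table[0])
    ]
    return [dict(zip(header, row)) for row in table[1:]]


def extract_matching_tables_as_json(tables, term):
    """Serialize every table that mentions `term` as a JSON array of record lists."""
    matches = find_tables_with_term(tables, term)
    return json.dumps([table_to_records(table) for table in matches], indent=2)


if __name__ == "__main__":
    # Hypothetical tables, standing in for whatever the extraction step produced.
    sample_tables = [
        [["Name", "Amount"], ["Rent", "1200"], ["Food", "300"]],
        [["Item", "Qty"], ["Widget", "4"]],
    ]
    print(extract_matching_tables_as_json(sample_tables, "widget"))
```

The hard part the post is pointing at, getting reliable tables out of arbitrary PDFs in the first place, is deliberately left out here; this only shows that once the tables exist in a uniform shape, the search-and-convert step needs no knowledge of the original layout.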

big-red7.bsky.social

@big-red7.bsky.social

Like, I understand that other tools exist. However, the issue with PDFs is that they often have unique structures and are intended to be interpreted by other people, and I personally believe LLMs are usually pretty effective at taking information in one format and converting it to a more standard one.

February 7, 2025 at 11:22 AM

big-red7.bsky.social

@big-red7.bsky.social

I am not going to speak to the quality of what the original post is saying. However, as someone who works in an environment where scraping data from large amounts of arbitrary PDFs/digitized forms is rather challenging, I think it definitely is a potential use case where LLMs could prove effective.

February 7, 2025 at 11:18 AM
