Maria Khalusova
mariak.bsky.social
Always growing, she/her, RAG builder, LLM whisperer, tech generalist
A tiny bit of mirroring? :)
June 13, 2025 at 6:24 PM
PS: That said, I’ll probably still keep an eye on what’s happening and may even share some posts every now and then. I’ve got a lot of thoughts on RAG, data processing, LLMs/VLMs, etc., so I likely won’t disappear fully.
June 13, 2025 at 4:57 PM
The work will still be here when I return. AI won’t slow down, but a couple of months won’t make a dent in the field either. This moment, however, this chance to be fully present with my family? That’s something I don’t want to miss.
June 13, 2025 at 4:57 PM
And even more grateful to work with a team that’s so supportive. Stepping away from work, especially in a field moving at warp speed, can feel counterintuitive. But for me, it’s a way to reconnect with what matters most.
June 13, 2025 at 4:57 PM
Kids won’t be kids forever, and mine are getting ever so close to becoming teenagers. This is time I know I’ll never get back.
I’m incredibly grateful to be in a place, both professionally and personally, where this is possible.
June 13, 2025 at 4:57 PM
Next week, I’m stepping away for a couple of months to take a sabbatical and spend time with my kids. I’m not burnt out. I’m following my own advice: do the thing you’ll regret not doing when you’re old.
June 13, 2025 at 4:57 PM
RAG exists to solve different problems across varied domains. Understand the problem you’re solving and look at your data.
June 12, 2025 at 7:07 PM
Once you have some answers to these, you can get further into the technical weeds and experiment with chunking to find an optimal size.
The bottom line, however: there's no universal "best" chunk size.
June 12, 2025 at 7:07 PM
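To make "experiment with chunking" a little more concrete, here is a minimal sketch of what such an experiment could look like. Everything in it is an assumption for illustration: `chunk_words` is a naive fixed-size word splitter, and `hit_rate` is a crude proxy that only checks whether a few hypothetical answer phrases survive intact inside a single chunk. A real evaluation would measure retrieval and answer quality on your own data and queries.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. This is a naive,
    whitespace-based splitter used only for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def hit_rate(chunks: list[str], probes: list[str]) -> float:
    """Fraction of probe phrases found intact inside at least one chunk.

    Hypothetical proxy metric: it says nothing about ranking or answer quality.
    """
    if not probes:
        return 0.0
    found = sum(any(probe in chunk for chunk in chunks) for probe in probes)
    return found / len(probes)


if __name__ == "__main__":
    # Placeholder document and probes; swap in your own data and queries.
    document = (
        "Refunds are issued within 14 days of the return being received. "
        "Shipping costs are not refunded unless the item arrived damaged."
    )
    probes = [
        "within 14 days of the return",
        "unless the item arrived damaged",
    ]
    for size in (4, 8, 16):
        chunks = chunk_words(document, chunk_size=size, overlap=2)
        print(size, len(chunks), hit_rate(chunks, probes))
```

The point of a loop like this is only to compare candidate sizes against your own data and questions; which size comes out ahead will differ from one corpus and use case to the next.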
* How much context do you typically need to retrieve to satisfy a typical query? Simple facts may only require a sentence or two. Creative tasks may require larger context. Analytical queries may need a whole bunch of supporting evidence.
June 12, 2025 at 7:07 PM
They all vary in structure, style, and length.
* What is your use case? Are you trying to answer questions with specific facts? Are you gathering multiple documents to summarize for a report? Do you pull from transcripts and need to preserve speaker attribution?
June 12, 2025 at 7:07 PM
Same goes for chunking. The “best” chunk size depends on a range of factors, and without those, the question is incomplete.
Here are some of the questions to ask instead:
* What does your data look like? Financial statements, technical manuals, and customer support transcripts are not the same.
June 12, 2025 at 7:07 PM
At least I have interrupted your doomscrolling with some cuteness!
May 20, 2025 at 7:50 PM