Karolus Sariola
@ksariola.bsky.social
SLM, evaluation, inference. Based in Helsinki.
https://flow-ai.com
Reposted by Karolus Sariola
Amazingly informative.
July 13, 2025 at 2:16 AM
“Comprehensive” is the new “delve”
June 19, 2025 at 1:57 PM
Reposted by Karolus Sariola
Learn how Remote became a unicorn in two years and grew from zero to a team of more than 100 Elixir engineers: elixir-lang.org/blog/2025/01...
Remote: growing from zero to unicorn with Elixir
A case study of how Elixir is being used at Remote.
elixir-lang.org
January 21, 2025 at 4:16 PM
Reposted by Karolus Sariola
Did you know that attention across the whole input span was inspired by the time-negating alien language in Arrival? Crazy anecdote from the latest Hard Fork podcast (by @kevinroose.com and @caseynewton.bsky.social). HT nwbrownboi on Threads for the lead.
December 1, 2024 at 2:50 PM
November 29, 2024 at 10:01 PM
how cool is this cover
November 29, 2024 at 6:57 PM
Two Apes Incapable of Understanding the Mystery of the Monolith

[Fischli/Weiss, Fondazione Prada]
November 27, 2024 at 9:00 PM
Reposted by Karolus Sariola
Medically adapted foundation models (think Med-*) turn out to be more hot air than hot stuff. Correcting for fatal flaws in evaluation, the current crop are no better on balance than generic foundation models, even on the very tasks for which benefits are claimed.
arxiv.org/abs/2411.04118
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pret...
arxiv.org
November 26, 2024 at 6:12 PM
Reposted by Karolus Sariola
Knowledge about what works for creating data pipelines for #LLM pretraining datasets is increasingly being shared more openly.

This paper goes a step further by focusing on reducing the compute required to build a dataset and train an LLM for a low-resource language.
huggingface.co/papers/2411....
Paper page - UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
Join the discussion on this paper page
huggingface.co
November 22, 2024 at 11:03 AM
Reposted by Karolus Sariola
Let's Build a Simple Database

Writing a sqlite clone from scratch in C

cstack.github.io/db_tutorial/
November 20, 2024 at 5:54 AM
Reposted by Karolus Sariola
I will be at #EMNLP2024 presenting our work on "Extrinsic Evaluation of Cultural Competence in Large Language Models" in Poster Session 12 on Thursday 2-3:30 PM.

In this work we take the first steps towards asking whether LLMs can cater to diverse cultures in *user-facing generative* tasks.

[1/7]
November 9, 2024 at 5:24 PM
Reposted by Karolus Sariola
A finding from Text REtrieval Conference (TREC) 2024, a gold standard in information retrieval.

"LLM-as-a-judge" can replace fully manual judgments to accurately capture run-level effectiveness. It also does not appear to increase correlation with fully manual assessments.
November 14, 2024 at 6:24 AM
Note to self to make tteokbokki 🇰🇷🌶️ more often!
November 12, 2024 at 5:03 AM
Do it right
November 12, 2024 at 4:58 AM
Pretty dark in Helsinki pre-snow in November... but in my mind's eye I am back on our Portugal trip
November 10, 2024 at 5:18 PM
November 10, 2024 at 5:14 PM
Hi! I'm Karolus. Nice to meet you! I create evaluation strategies for LM systems for a living. I am a co-founder in a skilled group of 5 engineers known as Flow AI 🇫🇮 🇪🇸 🇵🇰. Besides customer work, we regularly release small and capable open-source evaluator models for the public to advance the field.
November 10, 2024 at 5:00 PM
Summer throwback
November 10, 2024 at 4:30 PM
Applies not only to engineers
November 10, 2024 at 4:25 PM
Reposted by Karolus Sariola
LLM Prompt Tuning Playbook

This document is for anyone who would like to get better at prompting post-trained LLMs. We assume that readers have had some basic interactions with some sort of LLM (e.g. Gemini), but we do not assume a rigorous technical understanding.

github.com/varungodbole...
November 9, 2024 at 3:22 PM
Reposted by Karolus Sariola
New here? Interested in AI/ML? Check out these great starter packs!

AI: go.bsky.app/SipA7it
RL: go.bsky.app/3WPHcHg
Women in AI: go.bsky.app/LaGDpqg
NLP: go.bsky.app/SngwGeS
AI and news: go.bsky.app/5sFqVNS

You can also search all starter packs here: blueskydirectory.com/starter-pack...
November 9, 2024 at 9:13 AM
Peak Nokia era Finland was a banger. Convince me otherwise.
October 28, 2024 at 4:25 PM
Reposted by Karolus Sariola
This is a pipeline we're seeing again and again for curating synthetic data for specific domains. You need:
1. Diversity,
2. Quality responses, and
3. Verification.

AI-Assisted Generation of Difficult Math Questions
Shah et al.

When you do this stuff, plz release the data ;) - "plan to release"...
October 21, 2024 at 1:16 PM