Dario
@dariogargas.bsky.social
Senior AI researcher at BSC. Random thinker at home.
📌 Pinned
Are you into Chip Design or EDA, or do you just like writing RTL code for fun? Check out the largest benchmarking of LLMs for Verilog generation: TuRTLe 🐢

It includes 40 open LLMs evaluated on 4 benchmarks across 5 tasks. And it's only growing!

huggingface.co/spaces/HPAI-...

arxiv.org/abs/2504.01986
June 3, 2025 at 4:08 PM
So many healthcare LLMs, and yet so little information! Check out this table summarizing contributions, and find more details in our latest pre-print: arxiv.org/abs/2505.04388
May 22, 2025 at 12:19 PM
The Aloe Beta preprint includes full details on data & training setup.
Plus four different evaluation methods (including assessment by medical experts).
Plus a risk assessment of healthcare LLMs.

Two years of work condensed into a few pages, figures and tables.

Love open research!
huggingface.co/papers/2505....
May 21, 2025 at 8:06 AM
We just opened two MLOps Engineer positions at @bsc-cns.bsky.social

Our active and young research team needs someone to help sustain and improve our services, including HPC clusters, automated pipelines, artifact management and much more!

Are you up for the challenge?
www.bsc.es/join-us/job-...
Reference: 350_25_CS_AIR_RE2 · Job title: Research Engineer - AI Factory (RE2)
www.bsc.es
May 6, 2025 at 5:03 PM
Last week our team presented this at NAACL. Check out the beautiful poster they put together 😍
May 6, 2025 at 4:39 PM
Working on a project for evaluating embryo quality using in-vitro fertilization data.

A random forest using morphokinetic features of embryo development (visually annotated by experts) and a CNN working directly on static images reach similar performance. Separately AND together.

I find it surprising...
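To make the "separately AND together" part concrete, here is a minimal, hypothetical sketch of score-level (late) fusion of two classifiers. Everything in it is invented for illustration: synthetic data stands in for the real annotations and images, and a logistic regression stands in for the image CNN; none of this is the project's actual code or data.

```python
# Hypothetical sketch: score-level (late) fusion of two embryo-quality classifiers.
# Synthetic data stands in for the real morphokinetic annotations and images.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)                    # synthetic embryo-quality label
tabular = y[:, None] + rng.normal(size=(n, 12))   # stand-in "morphokinetic" features
images = y[:, None] + rng.normal(size=(n, 64))    # stand-in flattened "image" pixels

Xt_tr, Xt_te, Xi_tr, Xi_te, y_tr, y_te = train_test_split(
    tabular, images, y, test_size=0.3, random_state=0)

# One branch per modality: random forest on tabular features,
# logistic regression as a lightweight placeholder for the image CNN.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xt_tr, y_tr)
img_clf = LogisticRegression(max_iter=1000).fit(Xi_tr, y_tr)

p_rf = rf.predict_proba(Xt_te)[:, 1]
p_img = img_clf.predict_proba(Xi_te)[:, 1]
p_fused = (p_rf + p_img) / 2                      # simple late fusion: average the scores

for name, p in [("RF (tabular)", p_rf), ("image branch", p_img), ("fused", p_fused)]:
    print(f"{name:14s} AUC = {roc_auc_score(y_te, p):.3f}")
```

If both branches carry largely the same signal, averaging the probabilities barely moves the AUC, which would match the "similar performance, separately AND together" observation.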
April 11, 2025 at 2:35 PM
There are quite a lot of researchers who are so preoccupied with whether or not they could get the funding that they don't stop to think if they should.

Being chased by dinosaurs and writing grants. Same thing.
April 9, 2025 at 8:32 AM
How expensive 🫰 is it to get the best LLM performance? How much cash needs to be burned 💸 to get reliable responses? Pareto-optimal plots answer these questions.

Our research shows it is economically feasible and scalable to achieve o1-level performance at a fraction of the cost.
buff.ly/ji1VHiV
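For anyone who wants to reproduce the idea behind such plots, below is a minimal, self-contained sketch of extracting a cost/accuracy Pareto frontier. The configuration names, costs and accuracies are invented placeholders, not numbers from the paper.

```python
# Hypothetical sketch: cost/accuracy Pareto frontier over a set of LLM configurations.
def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point dominates another if it costs no more AND scores no less,
    with at least one strict inequality.
    """
    frontier = []
    for name, cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            frontier.append((name, cost, acc))
    return sorted(frontier, key=lambda p: p[1])  # order by cost for plotting

# Made-up (cost per query, accuracy) pairs for illustration only.
configs = [
    ("small model, 1 sample",              0.2,  0.71),
    ("small model, self-consistency x8",   1.6,  0.78),
    ("large model, 1 sample",              3.0,  0.80),
    ("large model, self-consistency x8",  24.0,  0.83),
]

for name, cost, acc in pareto_frontier(configs):
    print(f"{name:35s} cost/query = ${cost:.2f}, accuracy = {acc:.2f}")
```

Any configuration that another one beats on both cost and accuracy is dropped; what remains is the frontier you would plot.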
April 4, 2025 at 2:35 PM
Our LLM safety project, Egida, reached 2K downloads 😀
It includes over 60K safety questions, expanded with jailbreaking prompts.
The four models trained (and released) show strong signs of safety alignment and generalization capacity. Check out the 🤗 HF page and the paper for details!
buff.ly/kxFVyl2
April 1, 2025 at 9:11 PM
Today we release the TuRTLe leaderboard! 🐢

Are you in the Chip Design or EDA business? Wanna know which LLMs are best for the task? By integrating 4 benchmarks, TuRTLe evaluates:

* Syntax
* Functionality
* Synthesizability
* Power, Performance and Area metrics

huggingface.co/spaces/HPAI-...
TuRTLe Leaderboard - a Hugging Face Space by HPAI-BSC
A Unified Evaluation of LLMs for RTL Generation.
huggingface.co
April 1, 2025 at 2:15 PM
MIR is Spain's medical residency entrance exam. The best students reach an estimated accuracy above 90. Only two or three per year.
We took the MIR exams from '20-'24 to test open LLMs. Llama 3.1-based models, like Aloe, reach above 80 in accuracy.
DeepSeek R1 reaches above 88. Boosted by a RAG system, 92.
buff.ly/4bbbXMw
buff.ly/4hLrhBV
HPAI-BSC/CareQA · Datasets at Hugging Face
buff.ly
March 3, 2025 at 3:35 PM
After listening to the latest @fallofcivspod.bsky.social episode about the Mongolian Empire, by @paulcooper34.bsky.social, I realized the Mongols and the Fremen from Dune share remarkable similarities.
Skilled warriors adapted to harsh environments, taking over a society whose ways they don't want to adopt.
February 28, 2025 at 8:08 AM
Human evaluation of LLMs is close to saturation. Models have been optimized so much for plausibility that we are unable to tell good from bad. Only experts, in their own domains of expertise, can see a meaningful difference.
February 21, 2025 at 5:41 PM
After a year working on LLM evaluation, our benchmarking paper is finally out (to be presented at NAACL 2025). Main lessons:
* All LLM evals are wrong, some are slightly useful.
* Goodhart's law. All the time. Everywhere.
* Do lots of different evals and hope for the best.
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Current Large Language Models (LLMs) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factuality…
buff.ly
February 21, 2025 at 3:35 PM
Evaluating LLMs is a bit like paleontology. Trying to understand the behavior of very complex entities by observing only noisy and partial evidence. How do paleontologists deal with the uncertainty and frustration? Do they also feel like doing alchemy instead of science?
February 21, 2025 at 10:49 AM
Wisdom from my 6y old daughter: "A king is a just person disguised as king."
February 19, 2025 at 7:08 PM
Over and over again I keep finding @sarahooker.bsky.social papers to reference. This time about Elo rankings. She's always 2-3 years ahead...
February 19, 2025 at 3:12 PM
So many keywords around LLM training, it's easy to get lost.
For an upcoming paper, I did this little visual summary. Would you change anything?
February 18, 2025 at 5:48 PM
5th International Workshop on Computational Aspects of Deep Learning (CADL) to be held in conjunction with ISC-HPC 2025.

10 days to go, and an award to be decided!

Submit your paper and join us: sites.google.com/view/cadl2025/
CADL 2025
Advancing AI Through Efficient Computing Over the past decade, Deep Learning (DL) has revolutionized numerous research fields, transforming AI into a computational science where massive models are tra...
sites.google.com
February 17, 2025 at 9:06 AM
Only two weeks until the deadline!
Submit your paper and see you in Germany :)
At ISC High Performance 2025, I'll be co-organizing the 5th International Workshop on Computational Aspects of Deep Learning (CADL). See: buff.ly/40qiDS4

Deadline: 28 Feb
Topics:
-Energy-efficient AI
-Large-scale pre-training
-Distributed learning approaches
-Model optimization strategies
CADL 2025
buff.ly
February 12, 2025 at 5:12 PM
Bring it on. Totally prepared for another lockdown.
February 10, 2025 at 8:08 PM
Tired of not having enough time for reading because of all the writing I have to do.
February 3, 2025 at 8:59 PM
Historical quotes from deep learning:

"Don't be a hero"
Andrej Karpathy

"Attention is all you need"
Vaswani et al

"We Have No Moat
And neither does OpenAI"
Google Engineer
January 30, 2025 at 8:02 PM
Trying to put some order into LLM keywords for an upcoming paper. Green concepts are on a different axis, and only partly overlap with elements in blue.
January 30, 2025 at 11:18 AM
AI researchers today, feeling Kipling's poem "If":

"If you can bear to hear the truth you’ve spoken
Twisted by knaves to make a trap for fools,
Or watch the things you gave your life to, broken,
And stoop and build ’em up with worn-out tools"
January 29, 2025 at 8:22 AM