Mathys Grapotte
@grapottem.bsky.social
Currently building a test bench for DL models in genomics at https://github.com/nf-core/deepmodeloptim. Postdoc at CRG Barcelona
Funny thing is that if you type TRUE + !TRUE you get the proper answer (1).
December 15, 2024 at 2:55 PM
here is the talk if you are curious
Mathys Grapotte: STIMULUS : A nextflow-based pipeline for training deep learning models
YouTube video by Nextflow
youtu.be
November 29, 2024 at 7:37 PM
thanks Ian!
November 23, 2024 at 7:13 PM
So no stable release yet (still in dev). We are trying to make this extremely easy to contribute to and add things to: we have an active Slack channel, open dev hours every Wednesday from 2 pm to 6 pm CET, and we take part in all nf-core hackathons. A big community effort, and super easy to join in!
November 23, 2024 at 6:07 PM
Happy to discuss this further :) maybe a Slack/Discord channel or a Zoom talk, etc.

(Also, we have been building such a framework for a while now at github.com/nf-core/deep... ; the pipeline is currently moving a lot because we are in the process of porting our code to nf-core)
GitHub - nf-core/deepmodeloptim: Stochastic Testing and Input Manipulation for Unbiased Learning Systems
Stochastic Testing and Input Manipulation for Unbiased Learning Systems - nf-core/deepmodeloptim
github.com
November 23, 2024 at 6:06 PM
I think @nextflow.io and @nf-co.re are the best place to build this, because they have all the qualities (open source, a large existing community with 8k developers on Slack, performant, easy to use, etc.) and already provide all the bio software we need to process raw data (mappers, aligners, etc.)
nf-core in numbers
Measuring activity across the nf-core community
nf-co.re
November 23, 2024 at 6:05 PM
If the framework is performant, easy to use, flexible, easy to contribute to, easy to understand, good looking, etc., I bet proper guidelines will come naturally (and they will vary based on use case, as has always been the case in software).
November 23, 2024 at 6:04 PM
Such a framework shouldn’t impose guidelines on users, only provide a convenient way to run all kinds of tests on a research prototype (which is quite different from the models folks use in clinical applications).
November 23, 2024 at 6:04 PM
So I think the best solution will come in the form of a robust test framework (analogous to pytest in Python, for instance), which could do both unit tests (theory tests on the architecture + training “soundness”) and integration tests (downstream tests).
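A minimal sketch of what such unit tests might look like (pytest-style; TinyModel and the random batch are illustrative stand-ins, not STIMULUS code):

```python
# Hypothetical "soundness" checks on an architecture before any real training.
# TinyModel is a stand-in for whatever model the pipeline is asked to train.
import pytest
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, seq_len=100, n_channels=4):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, 8, kernel_size=5, padding=2)
        self.head = nn.Linear(8 * seq_len, 1)

    def forward(self, x):                      # x: (batch, 4, seq_len) one-hot DNA
        h = torch.relu(self.conv(x))
        return self.head(h.flatten(1))

@pytest.fixture
def batch():
    return torch.randn(16, 4, 100), torch.randn(16, 1)

def test_output_shape_and_finite(batch):
    x, _ = batch
    y_hat = TinyModel()(x)
    assert y_hat.shape == (16, 1)
    assert torch.isfinite(y_hat).all()

def test_all_parameters_receive_gradients(batch):
    x, y = batch
    model = TinyModel()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    for name, p in model.named_parameters():
        assert p.grad is not None and p.grad.abs().sum() > 0, f"dead parameter: {name}"
```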
November 23, 2024 at 6:04 PM
I also disagree with these takes; after seeing Meta’s talk at the Ray Summit, for instance, I think those companies have robust statistical eval pipelines in place (one does not press a $100m button without knowing what will come out of it).
A gold star for effort I guess, but falls far short of how experts have already said ML based predictions in medicine should be evaluated. cc @maartenvsmeden.bsky.social

An important hint is that there are certain ways to evaluate predictions regardless of the methods used to generate them.
www.anthropic.com/research/sta...

This is an excellent attempt (blog & paper) at bringing more statistical rigor to evaluation of ML models (this is specifically focused on LLM evals).

I feel like we need to have similar clear standards for many types of predictive models in biology. 1/
November 23, 2024 at 6:03 PM
However (probably an unpopular opinion), I think a paper is not a good enough vessel for those “guidelines”. I agree with the OP’s point here, even though I think the guidelines of that paper are quite superficial.
November 23, 2024 at 6:03 PM
The Principles of Deep Learning Theory book is also a gold mine for tests. Generally, I think the NTK is a good probabilistic framework for designing evaluations of DL models.
The Principles of Deep Learning Theory
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to d...
arxiv.org
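For concreteness, one cheap diagnostic in that spirit (my own assumption, not a recipe from the book) is to compute the empirical NTK on a handful of inputs and inspect its spectrum:

```python
# Sketch: empirical NTK of a scalar-output model on a few inputs.
# The reading of the spectrum (e.g. condition number ~ trainability) is a heuristic.
import torch
import torch.nn as nn

def empirical_ntk(model, xs):
    """K[i, j] = <d f(x_i)/d theta, d f(x_j)/d theta>."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = []
    for x in xs:
        out = model(x.unsqueeze(0)).sum()
        g = torch.autograd.grad(out, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = torch.stack(grads)          # (n_inputs, n_params)
    return G @ G.T                  # (n_inputs, n_inputs) kernel matrix

model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
K = empirical_ntk(model, torch.randn(8, 4))
eigvals = torch.linalg.eigvalsh(K)  # ascending order
print("NTK condition number:", (eigvals[-1] / eigvals[0].clamp_min(1e-12)).item())
```

A badly conditioned kernel is only a warning sign, but it is cheap to compute before committing to a long training run.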
November 23, 2024 at 6:02 PM
Or here: cosine similarity between sorted singular vectors to detect structural shift after fine-tuning.
arxiv.org
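A rough illustration of that kind of check (my own simplification, not the paper's code): compare the singular vectors of the same weight matrix before and after fine-tuning.

```python
# Sketch: detect structural shift in a weight matrix after fine-tuning by
# comparing its top-k singular vectors before vs. after (sorted by singular value).
import torch

def singular_vector_similarity(W_before, W_after, k=10):
    """Cosine similarity between corresponding top-k left singular vectors."""
    U0, _, _ = torch.linalg.svd(W_before, full_matrices=False)
    U1, _, _ = torch.linalg.svd(W_after, full_matrices=False)
    # torch.linalg.svd returns singular values/vectors already in descending order
    sims = torch.abs((U0[:, :k] * U1[:, :k]).sum(dim=0))  # |cos| handles sign flips
    return sims                                            # values near 1 => little structural change

W0 = torch.randn(256, 128)
W1 = W0 + 0.05 * torch.randn(256, 128)   # stand-in for "same layer after fine-tuning"
print(singular_vector_similarity(W0, W1, k=5))
```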
November 23, 2024 at 6:02 PM
Thankfully, with theory making progress, there are many things we can test before and during training, for example here: the evolution of matrix rank as a proxy for the quantity of information in layers.
Inheritune: Training Smaller Yet More Attentive Language Models
Large Language Models (LLMs) have achieved remarkable performance across various natural language processing tasks, primarily due to the transformer architecture and its self-attention mechanism. Howe...
arxiv.org
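An illustrative, simplified version of that idea: an effective-rank measure (exponential of the spectral entropy) that could be logged per layer during training. The exact metric used in the paper may differ.

```python
# Sketch: effective rank of a matrix as a per-layer proxy to track over training.
import torch

def effective_rank(W, eps=1e-12):
    s = torch.linalg.svdvals(W)
    p = s / (s.sum() + eps)                     # normalized singular value spectrum
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy)                   # "effective rank" = exp(spectral entropy)

# e.g. log this every N training steps for each 2-D weight matrix:
# for name, w in model.named_parameters():
#     if w.dim() == 2:
#         print(step, name, effective_rank(w.detach()).item())
```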
November 23, 2024 at 6:02 PM
So, I think we should have a “software” approach, i.e. deep learning code being vetted before, or as, it is running. This could still be useful because training from scratch on a couple of batches might be enough to detect issues (and is cost effective).
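In practice that vetting could start with something as simple as an “overfit one small batch” check; a minimal sketch, assuming a generic PyTorch model (not STIMULUS code):

```python
# Sketch: "can the model drive the loss down on one small batch?" smoke test.
# Cheap to run before launching a full training job; model and data are stand-ins.
import torch
import torch.nn as nn

def can_overfit_one_batch(model, x, y, steps=300, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    initial = nn.functional.mse_loss(model(x), y).item()
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    final = nn.functional.mse_loss(model(x), y).item()
    return final < 0.1 * initial   # crude threshold: loss should collapse on one batch

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(8, 10), torch.randn(8, 1)
assert can_overfit_one_batch(model, x, y), "model failed to overfit a single batch"
```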
November 23, 2024 at 6:01 PM
However, downstream tests do not pinpoint issues well enough (e.g. is the training data the issue? is it the code? instability? ...)
November 23, 2024 at 6:01 PM
First, from this post and @cwognum.bsky.social, @ihaque.bsky.social:
I agree that downstream test development is useful, and extra convenient (no need to retrain, you can treat the model as a black box, it gives interesting bio insights, etc.)
If you're interested in a discussion of statistical methodology for comparison of BioML tools, use the #BioMLeval hashtag. This paper linked by Anshul is a great starting point for discussion. It's oriented towards LLMs, but some ideas may be transferable. What benchmarks do you use for BioML?
www.anthropic.com/research/sta...

This is an excellent attempt (blog & paper) at bringing more statistical rigor to evaluation of ML models (this is specifically focused on LLM evals).

I feel like we need to have similar clear standards for many types of predictive models in biology. 1/
November 23, 2024 at 6:01 PM
I am very interested! At CRG and within the @nf-co.re organisation we are actually building an open-source framework that will have all those tests built in (it's in my bio). For that purpose, I have collected many papers from various ML fields and would love to share/discuss.
November 22, 2024 at 8:51 AM
There are many more intricate hypotheses I could think of; I think this is the right application of LLMs in BioML.
November 16, 2024 at 8:53 AM
There are lots of things we could do with this (see the sketch after the list below):
- Are pathogenic variants less expected by *insert LLM method* than non-pathogenic variants?
- Is perplexity lower for notably conserved regions?
- Can this be used to find conserved regions in new genomes?
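A minimal sketch of how the conserved-regions hypothesis could be probed, assuming a causal genomic language model available through Hugging Face transformers (the checkpoint name and sequences below are placeholders, not real data):

```python
# Sketch: compare per-base perplexity of a (hypothetical) causal genomic LM on
# two sets of regions, e.g. notably conserved vs. background regions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# "some-genomic-causal-lm" is a placeholder, not a real checkpoint name.
tokenizer = AutoTokenizer.from_pretrained("some-genomic-causal-lm")
model = AutoModelForCausalLM.from_pretrained("some-genomic-causal-lm").eval()

@torch.no_grad()
def perplexity(seq: str) -> float:
    ids = tokenizer(seq, return_tensors="pt").input_ids
    out = model(ids, labels=ids)            # labels=ids gives mean next-token NLL
    return torch.exp(out.loss).item()

conserved = ["ACGT..."]     # placeholder sequences for notably conserved regions
background = ["TTGA..."]    # placeholder background regions
ppl_c = sum(perplexity(s) for s in conserved) / len(conserved)
ppl_b = sum(perplexity(s) for s in background) / len(background)
print(f"conserved: {ppl_c:.2f}  background: {ppl_b:.2f}")  # hypothesis: conserved < background
```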
November 16, 2024 at 8:49 AM