(Also, we have been trying to build such a framework for a while at github.com/nf-core/deep... ; the pipeline is currently changing a lot because we are in the process of porting our code to nf-core.)
An important hint is that there are ways to evaluate predictions that work regardless of the method used to generate them.
This is an excellent attempt (blog & paper) at bringing more statistical rigor to evaluation of ML models (this is specifically focused on LLM evals).
I feel like we need to have similar clear standards for many types of predictive models in biology. 1/
I agree that downstream test development is useful, and extra convenient (no need to retrain, you can treat the model as a black box, it gives interesting bio insights, etc.). For example:
- Are pathogenic variants less expected by *insert LLM method* than non-pathogenic variants?
- Is perplexity lower for notably conserved regions?
- Can this be used to find conserved regions in new genomes?
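To make the first two questions concrete, here is a minimal sketch of such a black-box downstream test, assuming a causal genomic language model available through the Hugging Face transformers API. The checkpoint name and the variant windows are placeholders, and the choice of perplexity plus a Mann-Whitney U test as the "expectedness" comparison is just one illustrative option, not a prescription.

```python
# Sketch: is perplexity higher (sequence less expected) around pathogenic alleles
# than around benign ones? The model name and sequences below are placeholders.
import statistics
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from scipy.stats import mannwhitneyu

MODEL_NAME = "some-genomic-causal-lm"  # placeholder, not a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def perplexity(seq: str) -> float:
    """Perplexity of a DNA sequence under the model (higher = less expected)."""
    ids = tokenizer(seq, return_tensors="pt")["input_ids"]
    loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

# Toy inputs: sequence windows centred on the variant allele (placeholders).
pathogenic_windows = ["ACGTACGTACGT", "TTGACCGTAAGT"]  # e.g. ClinVar-pathogenic alleles
benign_windows = ["ACGTACGAACGT", "TTGACCGTCAGT"]      # matched benign alleles

path_ppl = [perplexity(s) for s in pathogenic_windows]
benign_ppl = [perplexity(s) for s in benign_windows]

# One-sided test: are pathogenic alleles assigned higher perplexity?
stat, p = mannwhitneyu(path_ppl, benign_ppl, alternative="greater")
print(f"median pathogenic ppl = {statistics.median(path_ppl):.2f}, "
      f"median benign ppl = {statistics.median(benign_ppl):.2f}, p = {p:.3g}")
```

The same perplexity helper can be slid along a new genome in windows to ask the conservation questions: if the model has learned something real, windows overlapping conserved elements should score systematically lower than shuffled or intergenic controls.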