David Selby
@davidselby.bsky.social
Data science researcher working on applications of machine learning in health at DFKI, getting the most out of small data. Reproducible #Rstats evangelist and unofficial British cultural ambassador to Rhineland-Palatinate 🇩🇪

https://selbydavid.com
Announcing a new special guest edition in Digital Health! Everything on patient-reported outcomes in mHealth, including algorithmic treatment and measurement allocation, N-of-1 trials, preference learning & handling subjective measurements. 📲

Take a look!

journals.sagepub.com/topic/collec...
October 10, 2025 at 2:06 PM
New preprint: how many patients could we save with LLM priors? Exploring the effect of eliciting informative priors for Bayesian clinical trials. arxiv.org/abs/2509.04250
How many patients could we save with LLM priors?
Imagine a world where clinical trials need far fewer patients to achieve the same statistical power, thanks to the knowledge encoded in large language models (LLMs). We present a novel framework for h...
September 5, 2025 at 7:47 AM
📊📉📈 Better data visualizations with AI: can LLMs provide constructive critiques on existing charts? We explore how generative AI can automate #MakeoverMonday -type exercises, suggesting improvements to existing charts.

📄 New preprint + benchmark dataset 💽

arxiv.org/abs/2508.05637
Automated Visualization Makeovers with LLMs
Making a good graphic that accurately and efficiently conveys the desired message to the audience is both an art and a science, typically not taught in the data science curriculum. Visualisation makeo...
August 19, 2025 at 6:17 AM
🧬BioDisco, an open-source biomedical hypothesis generator, uses agentic LLMs, knowledge graphs and literature search, with an iterative self-evaluation loop to discover novel relations, significantly outperforming other architectures.

Preprint: arxiv.org/abs/2508.01285
BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation
Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often...
August 5, 2025 at 9:05 AM
New: unofficial @quarto.org template for the upcoming @realaaai.bsky.social 2026 conference. Write your submission in Markdown with reproducible, inline computations!

github.com/Selbosh/aaai...
GitHub - Selbosh/aaai2026-quarto: Unofficial Quarto template for the AAAI-2026 Conference
Unofficial Quarto template for the AAAI-2026 Conference - Selbosh/aaai2026-quarto
July 29, 2025 at 2:54 PM
What is a "Visible Neural Network"? It's a new kind of deep learning model for multi-omics, where prior knowledge and interpretability are baked into the architecture.

📄 We reviewed dozens of models, datasets & applications, and call for better tools/benchmarks:

www.frontiersin.org/journals/art...
Frontiers | Visible neural networks for multi-omics integration: a critical review
Background: Biomarker discovery and drug response prediction are central to personalized medicine, driving demand for predictive models that also offer biologi...
July 21, 2025 at 12:36 PM
Reposted by David Selby
Health Research From Home Hackathon 2025
This hackathon is being held by Health Research From Home Partnership led by the @OfficialUoM. Register your interest now: health-research-from-home.github.io/DataAnalysis...
Health Research From Home Hackathon 2025
7-9 May 2025
March 6, 2025 at 9:47 PM
Just published: 'Had enough of experts? Quantitative retrieval from large language models'

Can LLMs, having read the scientific literature, offer us useful numerical info to help fill in missing data and fit statistical models, like a real human expert? We investigate:

doi.org/10.1002/sta4...
Had Enough of Experts? Quantitative Knowledge Retrieval From Large Language Models
Large language models (LLMs) have been extensively studied for their ability to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less w...
March 17, 2025 at 8:50 AM
New blog post: on all the English I have had to learn since moving to Germany 🇬🇧 🇩🇪

selbydavid.com/2025/03/13/d...
Learning to Denglisch
At the railway station, a lost-looking US soldier asked me if I spoke English. Do I? At times it feels like it, but the Germans keep me guessing. Since moving to Germany, I have been continually teste...
March 13, 2025 at 5:04 PM
New blog post: Alternatives to @overleaf.com for #rstats, reproducible writing and collaboration

selbydavid.com/2025/03/04/o...
March 6, 2025 at 6:55 PM
Reposted by David Selby
Thrilled to share our latest publication in @natrevgenet.bsky.social. We explore how deep learning models infused with prior knowledge—biologically-informed neural networks or BINNs—offer better predictive accuracy and interpretability in multi-omics data analysis. www.nature.com/articles/s41...
Beyond the black box with biologically informed neural networks - Nature Reviews Genetics
Biologically informed neural networks promise to lead to more explainable, data-driven discoveries in genomics, drug development and precision medicine. Selby et al. highlight emerging opportunities, ...
March 4, 2025 at 3:12 PM
Paper just accepted in Stat!

Can LLMs replace experts as sources of numerical information, such as Bayesian prior distributions for statistical models, or filling in missing values in tabular datasets for ML tasks?

We evaluate on applications across different fields.

arxiv.org/abs/2402.07770
Had enough of experts? Quantitative knowledge retrieval from large language models
Large language models (LLMs) have been extensively studied for their abilities to generate convincing natural language sequences; however, their utility for quantitative information retrieval is less w...
February 20, 2025 at 7:29 AM
How might one redesign this data visualization to avoid using the much-maligned 'plunger plots'?

#visualisation

From www.nature.com/articles/s41...
January 10, 2025 at 6:42 AM
Pleased to present our poster at the #NeurIPS2024 workshop on Bayesian Decision-making and Uncertainty! 🎉 Our work explores using large language models for eliciting expert-informed Bayesian priors. Elicited lots of discussion with the ML community too! Check it out: neurips.cc/virtual/2024...
December 20, 2024 at 12:43 PM
Excited to share our new preprint: Visible neural networks for multi-omics integration: a critical review! 🌟 We systematically analyse 86 studies on biologically informed neural networks (BINNs/VNNs), highlighting trends, challenges, interesting ideas & opportunities. www.biorxiv.org/content/10.1...
Visible neural networks for multi-omics integration: a critical review
Biomarker discovery and drug response prediction are central to personalized medicine, driving demand for predictive models that also offer biological insights. Biologically informed neural networks (B...
December 20, 2024 at 12:20 PM