Desi R Ivanova
@desirivanova.bsky.social
Research fellow @OxfordStats @OxCSML, spent time at FAIR and MSR
Former quant 📈 (@GoldmanSachs), former former gymnast 🤸♀️
My opinions are my own
🇧🇬-🇬🇧 sh/ssh
Last DL lecture open.substack.com/pub/probappr...
Lecture 6: Training practicalities
Deep learning's black magic
open.substack.com
March 17, 2025 at 2:33 PM
Lecture 5: Backpropagation and Autodifferentiation
Thank god the days of computing gradients by hand are over! Nevertheless, it’s good to know what backprop is and why we do it
open.substack.com/pub/probappr...
Lecture 5: Backprop and Autodiff
Order matters
open.substack.com
March 12, 2025 at 12:11 PM
The fourth post in the series: open.substack.com/pub/probappr...
Lecture 4: Neural network architectures
Attention!
open.substack.com
March 9, 2025 at 6:46 PM
Go read it on arXiv! Thanks to my co-authors @sambowyer.bsky.social and @laurenceai.bsky.social 💥
March 6, 2025 at 3:00 PM
Along with the lightweight library, we provide short code snippets in the paper.
March 6, 2025 at 3:00 PM
…and for constructing error bars on more complicated metrics, such as F1 score, that require the flexibility of Bayes.
March 6, 2025 at 3:00 PM
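As a rough illustration of the idea (not the paper's exact model): one way to get Bayesian error bars on F1 is to put a Dirichlet posterior on the confusion-matrix probabilities and compute F1 on each posterior sample. The symmetric prior and the toy counts below are made-up assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def f1_credible_interval(tp, fp, fn, tn, alpha=0.05, prior=1.0, n_samples=100_000):
    """Equal-tailed credible interval for F1.

    Puts a symmetric Dirichlet(prior) on the confusion-matrix cell
    probabilities (TP, FP, FN, TN) and evaluates F1 = 2*TP / (2*TP + FP + FN)
    on each posterior sample -- no closed-form sampling distribution needed.
    """
    counts = np.array([tp, fp, fn, tn], dtype=float)
    p = rng.dirichlet(counts + prior, size=n_samples)  # columns: TP, FP, FN, TN
    f1 = 2 * p[:, 0] / (2 * p[:, 0] + p[:, 1] + p[:, 2])
    return np.quantile(f1, [alpha / 2, 1 - alpha / 2])

# toy counts: 30 true positives, 5 false positives, 10 false negatives, 55 true negatives
print(f1_credible_interval(30, 5, 10, 55))
```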
...and treated without an independence assumption (e.g. using the same eval questions on both LLMs)...
March 6, 2025 at 3:00 PM
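One way to drop the independence assumption when both models answer the same questions (again an illustrative sketch, not necessarily the paper's method): record each question as one of four paired outcomes and put a Dirichlet posterior on those four probabilities. The accuracy gap is then P(only A correct) minus P(only B correct).

```python
import numpy as np

rng = np.random.default_rng(0)

def paired_difference_interval(counts, alpha=0.05, prior=1.0, n_samples=100_000):
    """Credible interval for accuracy(A) - accuracy(B) on shared eval questions.

    `counts` = [both correct, only A correct, only B correct, both wrong].
    A symmetric Dirichlet(prior) posterior on the four category probabilities
    keeps the dependence between the two models' scores; the accuracy gap
    equals P(only A correct) - P(only B correct).
    """
    counts = np.asarray(counts, dtype=float)
    probs = rng.dirichlet(counts + prior, size=n_samples)
    gap = probs[:, 1] - probs[:, 2]
    return np.quantile(gap, [alpha / 2, 1 - alpha / 2])

# toy data: 50 shared questions -> 30 both correct, 12 only A, 6 only B, 2 both wrong
print(paired_difference_interval([30, 12, 6, 2]))
```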
...for making comparisons between two LLMs treated independently...
March 6, 2025 at 3:00 PM
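For the independent case the sketch is even simpler: fit a separate Beta-Binomial posterior to each model and take Monte Carlo samples of the difference. Uniform Beta(1, 1) priors and the counts below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy_difference_interval(correct_a, correct_b, n_total,
                                 alpha=0.05, prior=(1.0, 1.0), n_samples=100_000):
    """Credible interval for accuracy(A) - accuracy(B), models treated independently.

    Each model gets an independent Beta-Binomial posterior; the posterior on
    the difference is approximated with Monte Carlo samples.
    """
    a, b = prior
    theta_a = rng.beta(a + correct_a, b + n_total - correct_a, size=n_samples)
    theta_b = rng.beta(a + correct_b, b + n_total - correct_b, size=n_samples)
    diff = theta_a - theta_b
    return np.quantile(diff, [alpha / 2, 1 - alpha / 2])

# e.g. model A answers 42/50 questions correctly, model B 36/50
print(accuracy_difference_interval(42, 36, 50))
```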
We also suggest simple methods for the clustered-question setting (where we don't assume all questions are IID -- instead we have T groups of N/T IID questions)...
March 6, 2025 at 3:00 PM
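The paper's clustered method isn't reproduced here, but as a rough point of comparison, one standard simple option in this setting is a cluster-level bootstrap: resample whole groups of questions rather than individual questions, so within-group correlation isn't ignored. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def clustered_bootstrap_interval(cluster_outcomes, alpha=0.05, n_boot=10_000):
    """Percentile interval for mean accuracy with clustered eval questions.

    `cluster_outcomes` is a list of per-cluster arrays of 0/1 outcomes.
    Whole clusters are resampled with replacement, so correlation within a
    cluster is respected rather than treating all N questions as IID.
    """
    T = len(cluster_outcomes)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, T, size=T)  # resample cluster indices
        resampled = np.concatenate([cluster_outcomes[j] for j in idx])
        estimates[i] = resampled.mean()
    return np.quantile(estimates, [alpha / 2, 1 - alpha / 2])

# toy data: T = 5 clusters of 10 related questions each
clusters = [rng.integers(0, 2, size=10) for _ in range(5)]
print(clustered_bootstrap_interval(clusters))
```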
Or, in this IID question setting, if you want to stay frequentist you can use Wilson-score intervals: en.wikipedia.org/wiki/Binomial_…
March 6, 2025 at 3:00 PM
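For reference, the Wilson score interval from that article is only a few lines (statsmodels' proportion_confint with method="wilson" should give the same numbers). Unlike the Wald/CLT interval, it stays inside [0, 1] and doesn't collapse at a perfect score.

```python
import math

def wilson_interval(n_correct, n_total, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p_hat = n_correct / n_total
    denom = 1 + z**2 / n_total
    centre = (p_hat + z**2 / (2 * n_total)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n_total
                                   + z**2 / (4 * n_total**2))
    return centre - half, centre + half

print(wilson_interval(50, 50))  # a perfect score still gives a non-trivial interval
```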
We suggest using Bayesian credible intervals for your error bars instead, with a simple Beta-Binomial model. (The aim is for the methods to achieve nominal 1-alpha coverage i.e. match the dotted line in the top row. A 95% confidence interval should be right 95% of the time.)
March 6, 2025 at 3:00 PM
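A minimal sketch of the Beta-Binomial credible interval (the paper's library is the reference implementation; this version just assumes IID correct/incorrect outcomes and a uniform Beta(1, 1) prior for illustration):

```python
from scipy import stats

def beta_binomial_interval(n_correct, n_total, alpha=0.05, prior=(1.0, 1.0)):
    """Equal-tailed credible interval for an LLM's accuracy.

    Assumes correct/incorrect outcomes are IID Bernoulli(theta) with a
    Beta(a, b) prior on theta; the posterior is Beta(a + k, b + n - k).
    """
    a, b = prior
    posterior = stats.beta(a + n_correct, b + n_total - n_correct)
    return posterior.ppf(alpha / 2), posterior.ppf(1 - alpha / 2)

# e.g. 37 correct answers on a 50-question benchmark
lo, hi = beta_binomial_interval(37, 50)
print(f"accuracy ~ {37/50:.2f}, 95% credible interval ({lo:.2f}, {hi:.2f})")
```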
This, along with the CLT’s ignorance of typically binary eval data (correct/incorrect responses to an eval question), leads to poor error bars, which collapse to zero width or extend past [0,1].
March 6, 2025 at 3:00 PM
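To make the failure mode concrete: the CLT/Wald interval is p̂ ± z·sqrt(p̂(1−p̂)/N), so a perfect (or zero) score gives an estimated standard error of exactly zero, and near the boundary with small N the interval spills outside [0, 1]. A toy check:

```python
import math

def clt_interval(n_correct, n_total, z=1.96):
    """Standard CLT/Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / N)."""
    p_hat = n_correct / n_total
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_total)
    return p_hat - half, p_hat + half

print(clt_interval(20, 20))  # (1.0, 1.0): zero-width error bar at a perfect score
print(clt_interval(19, 20))  # upper endpoint exceeds 1 for small N near the boundary
```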
As LLMs get better, benchmarks to evaluate their capabilities are getting smaller (and harder). This starts to violate the CLT's large N assumption. Meanwhile, we have lots of eval settings in which questions aren't IID (e.g. questions in a benchmark often aren't independent).
March 6, 2025 at 3:00 PM
The third in the teaching blogs series: Introduction to deep learning
open.substack.com/pub/probappr...
Lecture 3: Introduction to Deep Learning
aka neural networks aka differentiable programming
open.substack.com
March 3, 2025 at 5:03 PM
NHS boss was sacked (well, “resigned”), so there’s some hope for major reforms and improvements in the health system (I hope 🤞)
February 26, 2025 at 11:36 AM
Nice. Are the materials publicly available?
February 21, 2025 at 3:54 PM
We currently do 2 lectures on GPs 😅 one could certainly do a whole course (bayesopt, automl) - could be fun!
February 21, 2025 at 3:44 PM
Indeed, the course is already really quite tight. So if DPs are to be covered, something has to be dropped. I’m thinking for next year potentially dropping constrained optimisation/SVMs (done in the first half) and covering BNP more thoroughly
February 21, 2025 at 3:42 PM
It’s a mix - first part was ERM, SVMs and kernels; second part (which is the one I’m teaching) - Bayesian ML (GPs), deep learning and VI
February 21, 2025 at 2:06 PM
Teaching is super undervalued by universities (at least in the UK), so there’s very little incentive to do it well. I think this is wrong: thoughtful pedagogy matters deeply. I hope this “teaching blogs” series will help me get up to speed and improve more quickly
open.substack.com/pub/probappr...
Lecture 1: Gaussian Processes and GP Regression
Nice and easy when everything is Gaussian
open.substack.com
February 21, 2025 at 1:10 PM