Sujay Nagaraj
@snagaraj.bsky.social
MD/PhD student | University of Toronto | Machine Learning for Health
We’ll be at #ICLR2025, Poster Session 1 – #516!
Come chat if you’re interested in learning more!
This is work done with wonderful collaborators: Yang Liu, @fcalmon.bsky.social, and @berkustun.bsky.social.
April 19, 2025 at 11:04 PM
Our algorithm can improve safety and performance by flagging regretful predictions for abstention or data cleaning.
For example, we demonstrate that abstaining from prediction using our algorithm can reduce mistakes compared to standard approaches.
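As a rough illustration of the abstention step (the arrays and the flagging rule below are illustrative placeholders, not the exact setup from the paper), here is a minimal selective-prediction check in Python:

# Sketch: abstain wherever a per-instance regret flag fires, then compare
# error on the remaining predictions against predicting on everything.
# `y_true`, `y_pred`, and `regret_flag` stand in for real labels, model
# predictions, and regret flags from whatever flagging method is used.
import numpy as np

def abstain_and_evaluate(y_true, y_pred, regret_flag):
    keep = ~regret_flag                                   # instances we still predict on
    error_all = np.mean(y_pred != y_true)                 # error with no abstention
    error_kept = np.mean(y_pred[keep] != y_true[keep]) if keep.any() else 0.0
    return {"error_all": error_all, "error_kept": error_kept, "coverage": keep.mean()}

# Toy usage with synthetic arrays, purely to exercise the function.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_pred = np.where(rng.random(1000) < 0.8, y_true, 1 - y_true)   # ~20% base error
regret_flag = (y_pred != y_true) & (rng.random(1000) < 0.6)     # imperfect toy flags
print(abstain_and_evaluate(y_true, y_pred, regret_flag))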
April 19, 2025 at 11:04 PM
We develop a method that trains models over plausible clean datasets to anticipate regretful predictions, helping us spot when a model is unreliable at the individual level.
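One rough way to picture this idea, as a sketch under assumed ingredients (a known 20% flip rate, a logistic-regression ensemble, and re-flipping labels as the way to sample plausible clean datasets; the actual construction in the paper may differ):

# Sketch: anticipate regretful predictions by training models over plausible
# clean versions of a noisy dataset and flagging individual-level disagreement.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
NOISE_RATE = 0.2       # assumed label-flip probability (illustrative)
N_PLAUSIBLE = 20       # number of plausible clean datasets to sample

# Toy data: y_clean is unobserved in practice; we only ever see y_noisy.
X, y_clean = make_classification(n_samples=2000, n_features=10, random_state=0)
flips = rng.random(len(y_clean)) < NOISE_RATE
y_noisy = np.where(flips, 1 - y_clean, y_clean)

# Baseline model trained directly on the noisy labels.
base = LogisticRegression(max_iter=1000).fit(X, y_noisy)
base_pred = base.predict(X)

# Sample plausible clean datasets by re-flipping each noisy label with
# probability NOISE_RATE, and train one model per plausible dataset.
plausible_preds = []
for _ in range(N_PLAUSIBLE):
    reflip = rng.random(len(y_noisy)) < NOISE_RATE
    y_plausible = np.where(reflip, 1 - y_noisy, y_noisy)
    m = LogisticRegression(max_iter=1000).fit(X, y_plausible)
    plausible_preds.append(m.predict(X))
plausible_preds = np.stack(plausible_preds)               # (N_PLAUSIBLE, n_samples)

# Anticipated regret: how often models fit to plausible clean datasets
# disagree with the noisy-label model on each individual instance.
disagreement = (plausible_preds != base_pred).mean(axis=0)
regret_flag = disagreement > np.quantile(disagreement, 0.9)   # flag the most unstable ~10%

print(f"flagged {regret_flag.mean():.1%} of instances")
print(f"noisy-model error on flagged instances:   {(base_pred != y_clean)[regret_flag].mean():.1%}")
print(f"noisy-model error on unflagged instances: {(base_pred != y_clean)[~regret_flag].mean():.1%}")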
April 19, 2025 at 11:04 PM
We capture this effect with a simple measure: regret.
Regret is inevitable with label noise, but it can tell us where models silently fail, and how we can guide safer predictions.
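For concreteness, one way to write this down (our paraphrase in notation; the precise definition is in the paper): with h_D~ the model fit to the noisy labels and h_D the model that would have been fit to a clean dataset D, the prediction for instance x_i is regretful when the two disagree.

% A plausible formalization of per-instance regret under label noise
% (paraphrased; see the paper for the exact definition).
R(x_i) \;=\; \mathbb{1}\!\left[\, h_{\tilde{D}}(x_i) \neq h_{D}(x_i) \,\right],
\qquad
\overline{R}(x_i) \;=\; \mathbb{E}_{D \mid \tilde{D}}\!\left[\, R(x_i) \,\right]

Here the expectation runs over clean datasets D that are plausible given the observed noisy dataset.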
April 19, 2025 at 11:04 PM
This lottery breaks modern ML:
If we can’t tell which predictions are wrong, we can’t improve models, we can’t debug, and we can’t trust them in high-stakes tasks like healthcare.
April 19, 2025 at 11:04 PM
We can frame this problem as learning from noisy labels.
Plenty of algorithms have been designed to handle label noise by predicting well on average, but we show how they still fail on specific individuals.
April 19, 2025 at 11:04 PM
🧠 Key takeaway: Label noise isn’t static—especially in time series.
💬 Come chat with me at #ICLR2025 Poster Session 2!
Shoutout to my amazing colleagues behind this work:
@tomhartvigsen.bsky.social
@berkustun.bsky.social
April 13, 2025 at 5:40 PM
🔬 Real-world demo:
We applied our method to stress detection from smartwatches, where we have noisy self-reported labels vs. clean physiological measures.
📈 Our model tracks the true time-varying label noise—reducing test error over baselines.
April 13, 2025 at 5:40 PM
We propose methods to learn this function directly from noisy data.
💥 Results:
On 4 real-world time series tasks:
✅ Temporal methods beat static baselines
✅ Our methods better approximate the true noise function
✅ They work when the noise function is unknown!
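For readers wondering what "learn this function directly from noisy data" could look like mechanically, here is a rough PyTorch sketch under assumed ingredients (a GRU classifier, a small MLP mapping time to a C-by-C noise matrix, and a forward-style corrected likelihood); it illustrates the general recipe, not the paper's implementation:

# Sketch: jointly learn a sequence classifier and a time-varying label-noise
# matrix Q(t) from noisy sequence labels, via a forward-corrected likelihood.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_CLASSES, N_FEATURES, HIDDEN = 2, 8, 32

class TemporalNoisyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(N_FEATURES, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_CLASSES)
        # A tiny MLP maps (normalized) time to the logits of a C-by-C matrix.
        self.noise_net = nn.Sequential(
            nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, N_CLASSES * N_CLASSES)
        )

    def forward(self, x, t):
        # x: (batch, seq_len, N_FEATURES); t: (batch, seq_len, 1) normalized time
        h, _ = self.encoder(x)
        clean_probs = F.softmax(self.head(h), dim=-1)        # model's P(y_t | x_{1:t})
        Q = F.softmax(self.noise_net(t).view(*t.shape[:2], N_CLASSES, N_CLASSES), dim=-1)
        # Forward correction: P(noisy label = j | x) = sum_i P(clean = i | x) * Q_t[i, j]
        noisy_probs = torch.einsum("bti,btij->btj", clean_probs, Q)
        return clean_probs, noisy_probs

# Toy training loop on random data, purely to show the shapes and the loss.
model = TemporalNoisyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 50, N_FEATURES)                          # 16 sequences of length 50
t = torch.linspace(0, 1, 50).view(1, 50, 1).expand(16, 50, 1)
y_noisy = torch.randint(0, N_CLASSES, (16, 50))

for step in range(5):
    _, noisy_probs = model(x, t)
    loss = F.nll_loss(torch.log(noisy_probs + 1e-8).flatten(0, 1), y_noisy.flatten())
    opt.zero_grad(); loss.backward(); opt.step()
print("final toy loss:", float(loss))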
April 13, 2025 at 5:40 PM
📌 We formalize this setting:
A temporal label noise function defines how likely each true label is to be flipped—as a function of time.
Using this function, we propose a new time series loss function that is provably robust to label noise.
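In symbols (our notation, which may differ from the paper's): the temporal noise function collects time-indexed flip probabilities, and the corrected loss scores the classifier after pushing its clean-label probabilities through that matrix, in the spirit of forward loss correction:

% Temporal noise function: probability that true class i is recorded as class j at time t.
Q_t(i, j) \;=\; \Pr\!\big(\tilde{y}_t = j \,\big|\, y_t = i\big)

% A forward-style corrected sequence loss (illustrative; the paper's exact loss may differ),
% where f(x_{1:t}) outputs clean-label class probabilities and \ell is, e.g., cross-entropy.
\mathcal{L}(f) \;=\; \mathbb{E}\Big[\, \textstyle\sum_{t} \ell\big( Q_t^{\top} f(x_{1:t}),\; \tilde{y}_t \big) \Big]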
April 13, 2025 at 5:40 PM
🕒 What is temporal label noise?
In many real-world time series (e.g., wearables, EHRs), label quality fluctuates over time
➡️ Participants fatigue
➡️ Clinicians miss more during busy shifts
➡️ Self-reports drift seasonally
Existing methods assume static noise → they fail here
April 13, 2025 at 5:40 PM
Would be great to be added :)
December 22, 2024 at 2:57 AM