Group DRO seems like the obvious fix for performance gaps across languages, but the CTC loss scales with input length, so the highest-loss language isn't always the one that actually needs more weight.
Result? Worse performance.
We need a new approach 🚀
Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.
Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.
Here’s how it works 🧵
CTC-DRO makes two changes to group DRO:
✅ Smoothing the group weight update to prevent overemphasis on consistently high-loss groups
✅ Input length-matched batching to mitigate CTC’s scaling issues (rough sketch below)
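Roughly, and only as a toy sketch: the utterance fields (`lang`, `num_frames`), the `frames_per_batch` budget, and the `loss / (loss + alpha)` smoothing form below are illustrative assumptions, not the paper's exact batching code or update rule.

```python
# Toy illustration of the two ideas, NOT the paper's implementation.
import math
import random
from collections import defaultdict


def length_matched_batches(utterances, frames_per_batch=16000):
    """Build per-language batches whose total input length is roughly constant.

    Filling each batch up to a fixed frame budget (rather than a fixed number of
    utterances) keeps a batch's summed CTC loss on a comparable scale across
    languages, regardless of how long each language's recordings are.
    """
    by_lang = defaultdict(list)
    for utt in utterances:
        by_lang[utt["lang"]].append(utt)

    batches = []
    for lang, utts in by_lang.items():
        utts.sort(key=lambda u: u["num_frames"])
        current, used = [], 0
        for utt in utts:
            if current and used + utt["num_frames"] > frames_per_batch:
                batches.append((lang, current))
                current, used = [], 0
            current.append(utt)
            used += utt["num_frames"]
        if current:
            batches.append((lang, current))
    random.shuffle(batches)
    return batches


def smoothed_group_weight_update(weights, group_losses, eta=0.1, alpha=1.0):
    """Exponentiated-gradient update on group weights, with a smoothing term.

    Plain group DRO would multiply each weight by exp(eta * loss); dividing the
    loss by (loss + alpha) bounds that exponent, damping how quickly a
    consistently high-loss group accumulates weight. The exact smoothing form
    here is an assumption for illustration.
    """
    updated = {
        g: w * math.exp(eta * group_losses[g] / (group_losses[g] + alpha))
        for g, w in weights.items()
    }
    total = sum(updated.values())
    return {g: w / total for g, w in updated.items()}


# Tiny usage example with three made-up language groups.
random.seed(0)
data = [{"lang": random.choice(["sw", "yo", "am"]),
         "num_frames": random.randint(200, 2000)} for _ in range(200)]
print(len(length_matched_batches(data, frames_per_batch=8000)), "length-matched batches")

weights = {"sw": 1 / 3, "yo": 1 / 3, "am": 1 / 3}
losses = {"sw": 2.0, "yo": 5.0, "am": 1.0}
for _ in range(5):
    weights = smoothed_group_weight_update(weights, losses)
print(weights)  # "yo" is upweighted, but more gently than with exp(eta * loss)
```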