Harit Vishwakarma
harit7.bsky.social
Ph.D. Candidate at UW-Madison

https://harit7.github.io/
December 11, 2024 at 6:04 PM
Join us in the evening poster session (#1906) to learn more about it and chat about auto-labeling and data-centric AI.

Thanks to the amazing co-authors: Yi (Reid) Chen, Sui Jiet Tay, Srinath Namburi, @fredsala.bsky.social, Ramya Korlakai Vinayak.
December 11, 2024 at 5:53 PM
Our method learns confidence functions tailored for efficient and reliable auto-labeling. Using these in TBAL boosts the number of auto-labeled points by up to 60% (while keeping auto-labeling error below 5%) compared to baselines such as softmax scores and several training-time and post-hoc calibration techniques.
December 11, 2024 at 5:53 PM
Introducing Colander, our framework for learning optimal confidence functions for TBAL! We formulate the auto-labeling objective as an optimization problem over the space of confidence functions and thresholds.
December 11, 2024 at 5:53 PM
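Written out, the goal of maximizing coverage with bounded auto-labeling error becomes a constrained problem over confidence functions and thresholds. This sketch uses notation not found in the posts, all of it assumed: $\hat{h}$ is the current classifier, $g \in \mathcal{G}$ a candidate confidence function, $t$ a single threshold, and $\epsilon$ the error budget. The objective is the coverage (the fraction of points auto-labeled). The constraint is the error among those points. It restates the stated goal and is not Colander's exact objective or the surrogate it actually optimizes.

```latex
\max_{g \in \mathcal{G},\; t \in \mathbb{R}} \quad \Pr_{x}\big[\, g(x) \ge t \,\big]
\qquad \text{subject to} \qquad
\Pr_{(x, y)}\big[\, \hat{h}(x) \ne y \;\big|\; g(x) \ge t \,\big] \le \epsilon
```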
We systematically study the limitations of popular confidence functions like softmax outputs and off-the-shelf calibration techniques. The result? Too few auto-labeled points or large auto-labeling errors.
December 11, 2024 at 5:53 PM
The choice of confidence function is crucial in TBAL: if it's not aligned with the auto-labeling objective, it can be detrimental to performance. We show that commonly used confidence functions fall short.
December 11, 2024 at 5:53 PM
TBAL (threshold-based auto-labeling) is a promising auto-labeling technique. It iteratively acquires human labels for small data chunks, trains a model, and auto-labels points where the model's confidence is above a threshold. The goal? Maximize coverage (proportion of auto-labeled points) with bounded auto-labeling error.
December 11, 2024 at 5:53 PM
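The auto-labeling step of a single TBAL round can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions. The function names `estimate_threshold` and `auto_label` are hypothetical. A single global threshold is used, chosen as the lowest confidence score whose empirical error on the human-labeled validation points stays within the error budget, with no confidence-interval slack. Model training and human-label acquisition are left out.

```python
from typing import List, Optional, Sequence, Tuple


def estimate_threshold(
    val_scores: Sequence[float],
    val_correct: Sequence[bool],
    max_error: float,
) -> Optional[float]:
    """Hypothetical helper: lowest threshold t such that, among human-labeled
    validation points with score >= t, the fraction of wrong model predictions
    is at most max_error. Uses the raw empirical error (no slack term).
    Returns None if no candidate threshold qualifies."""
    # Candidate thresholds are the distinct validation scores, scanned upward.
    for t in sorted(set(val_scores)):
        # Correctness flags of validation points that would be auto-labeled at t.
        selected = [ok for s, ok in zip(val_scores, val_correct) if s >= t]
        wrong = sum(1 for ok in selected if not ok)
        if wrong / len(selected) <= max_error:
            return t  # candidates are ascending, so the first hit is the lowest
    return None


def auto_label(
    scores: Sequence[float],
    predictions: Sequence[int],
    threshold: Optional[float],
) -> Tuple[List[Tuple[int, int]], float]:
    """Hypothetical helper: assign the model's predicted label to every
    unlabeled point whose confidence clears the threshold. Returns the
    (index, label) pairs and the coverage, i.e. the auto-labeled fraction."""
    if threshold is None or len(scores) == 0:
        return [], 0.0
    labeled = [
        (i, pred)
        for i, (s, pred) in enumerate(zip(scores, predictions))
        if s >= threshold
    ]
    return labeled, len(labeled) / len(scores)


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    t = estimate_threshold(
        [0.95, 0.90, 0.85, 0.80, 0.70, 0.60],
        [True, True, True, False, True, False],
        max_error=0.1,
    )
    labeled, coverage = auto_label([0.99, 0.88, 0.50, 0.86, 0.30], [1, 0, 1, 2, 0], t)
    print(t, labeled, coverage)  # 0.85 [(0, 1), (1, 0), (3, 2)] 0.6
```

The error is estimated on a finite validation set. A practical implementation would therefore typically add a slack term to that estimate so the error bound holds with high probability. The sketch omits it for brevity. Coverage here depends entirely on how well the confidence scores separate correct from incorrect predictions, which is why the choice of confidence function matters.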