Lightnews — Scholar-powered news

Harit Vishwakarma

@harit7.bsky.social

19 followers 15 following 8 posts

Ph.D. Candidate at UW-Madison

https://harit7.github.io/

Harit Vishwakarma

@harit7.bsky.social

@srinathnamburi.bsky.social

December 11, 2024 at 6:04 PM

Harit Vishwakarma

@harit7.bsky.social

Join us in the evening poster session (#1906) to learn more about it and chat about auto-labeling and data-centric AI.

Thanks to the amazing co-authors: Yi (Reid) Chen, Sui Jiet Tay, Srinath Namburi, @fredsala.bsky.social, Ramya Korlakai Vinayak.

December 11, 2024 at 5:53 PM

Harit Vishwakarma

@harit7.bsky.social

Our method learns confidence functions tailored for efficient and reliable auto-labeling. Using these in TBAL boosts the number of auto-labeled points by up to 60% (while keeping auto-labeling error under 5%) compared to baselines like softmax and several training-time and post-hoc calibration techniques.

December 11, 2024 at 5:53 PM

Harit Vishwakarma

@harit7.bsky.social

Introducing Colander, our framework for learning optimal confidence functions for TBAL! We formulate the auto-labeling objective as an optimization problem over the space of confidence functions and thresholds.

December 11, 2024 at 5:53 PM
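
One way to write down an optimization problem of this kind is sketched below. The notation is an assumption for illustration and is not taken from the posts: g is a confidence function from a candidate class G, t is a threshold, h is the trained classifier, and ε is the tolerated auto-labeling error.

```latex
\max_{g \in \mathcal{G},\; t \in [0, 1]} \;
  \Pr\big[\, g(x) \ge t \,\big]
\quad \text{subject to} \quad
  \Pr\big[\, h(x) \ne y \;\big|\; g(x) \ge t \,\big] \le \epsilon
```

The objective is the coverage (the fraction of points that clear the threshold) and the constraint bounds the error among those auto-labeled points. The framework's exact formulation may differ from this sketch.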

Harit Vishwakarma

@harit7.bsky.social

We systematically study the limitations of popular confidence functions like softmax outputs and off-the-shelf calibration techniques. The result? Too few auto-labeled points or large auto-labeling errors.

December 11, 2024 at 5:53 PM

Harit Vishwakarma

@harit7.bsky.social

The choice of confidence function is crucial in TBAL – if it's not aligned with the auto-labeling objective, it can be detrimental to performance. We show commonly used confidence functions fall short.

December 11, 2024 at 5:53 PM

Harit Vishwakarma

@harit7.bsky.social

TBAL is a promising auto-labeling technique. It iteratively acquires human labels for small data chunks, trains a model, and auto-labels points where the model's confidence is above a threshold. The goal? Maximize coverage (proportion of auto-labeled points) with bounded auto-labeling error.

December 11, 2024 at 5:53 PM
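
The thresholding step described in this post can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes confidence scores and correctness flags on a held-out validation set are already available, and the function names `pick_threshold` and `coverage_and_error` are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    val_conf: Sequence[float],
    val_correct: Sequence[bool],
    max_error: float,
) -> Optional[float]:
    """Smallest threshold whose validation error stays within max_error.

    Returns None when no candidate threshold meets the bound.
    """
    best = None
    # Scan candidate thresholds from strictest to loosest.
    for t in sorted(set(val_conf), reverse=True):
        kept = [ok for score, ok in zip(val_conf, val_correct) if score >= t]
        error = 1.0 - sum(kept) / len(kept)
        if error <= max_error:
            best = t  # a lower threshold keeps more points
    return best


def coverage_and_error(
    conf: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Fraction of points auto-labeled, and the error rate among them."""
    flags = [ok for score, ok in zip(conf, correct) if score >= threshold]
    if not flags:
        return 0.0, 0.0
    return len(flags) / len(conf), 1.0 - sum(flags) / len(flags)


# Validation data: model confidence and whether the prediction was right.
val_conf = [0.95, 0.90, 0.80, 0.70, 0.60]
val_correct = [True, True, True, False, True]
t = pick_threshold(val_conf, val_correct, max_error=0.1)

# Unlabeled pool; correctness is known here only for evaluation purposes.
pool_conf = [0.99, 0.85, 0.75, 0.50]
pool_correct = [True, True, False, True]
coverage, error = coverage_and_error(pool_conf, pool_correct, t)
print(t, coverage, error)  # 0.8 0.5 0.0
```

In a full loop, the points left below the threshold would go back into the pool for the next round of human labeling and retraining. The sketch also uses the raw validation error, which a practical system would likely replace with a more conservative estimate.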
