Justin Kay
justin-kay.bsky.social
PhD student at MIT. Machine learning, computer vision, ecology, climate. Previously: Co-founder, CTO Ai.Fish; Researcher at Caltech; UC Berkeley. justinkay.github.io
CODA is a Highlight at @iccv.bsky.social next week!
Poster Session 1 Tuesday 10/21 11:45am, and Demo Session 6 Thursday 10/23 at 2:30pm.

Paper: www.arxiv.org/abs/2507.23771
Demo: huggingface.co/spaces/justi...
Code: github.com/justinkay/coda
@csail.mit.edu News:
bit.ly/48s6yS2

8/8
Consensus-Driven Active Model Selection
The widespread availability of off-the-shelf machine learning models poses a challenge: which model, of the many available candidates, should be chosen for a given data analysis task? This question of...
October 13, 2025 at 6:00 PM
Thanks to MIT CSAIL @csail.mit.edu for covering CODA and how it supports ML practitioners in environmental conservation: t.co/SNJnnGfKvn

Try it yourself! We built a fun demo for finding the best wildlife classification model with @hf.co and @gradio-hf.bsky.social: huggingface.co/spaces/justi...

7/
3 Questions: How AI is helping us monitor and support vulnerable ecosystems | MIT CSAIL
www.csail.mit.edu
October 13, 2025 at 6:00 PM
CODA is exceptionally label-efficient. On a benchmark suite of 26 different datasets, we show that CODA identifies the optimal or near-optimal model with fewer than 25 labeled examples over 50% of the time, and with fewer than 100 labeled examples over 80% of the time. 6/
October 13, 2025 at 6:00 PM
CODA constructs a probabilistic model of which model is best at any labeling budget. To do this, we estimate a confusion matrix for each candidate, which we can 1) integrate over at any time to estimate which candidate is best, and 2) update with new labels via Bayesian inference. 5/
October 13, 2025 at 6:00 PM
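A rough sketch of the idea in post 5/ above, not CODA's actual implementation (see the linked code for that). It assumes each candidate's confusion matrix gets an independent Dirichlet posterior per row, that accuracy is the selection metric, and that the class prior is known; `make_prior`, `update`, `sample_accuracy`, and `prob_best` are hypothetical names made up for this illustration.

```python
import random

def make_prior(num_classes, diag=2.0, off_diag=1.0):
    """Dirichlet pseudo-counts for one model's confusion matrix.
    Row = true class, column = predicted class. The values are assumptions."""
    return [[diag if i == j else off_diag for j in range(num_classes)]
            for i in range(num_classes)]

def update(alpha, true_label, predicted_label):
    """Bayesian update: one labeled example adds one count to the matching cell."""
    alpha[true_label][predicted_label] += 1.0

def sample_accuracy(alpha, class_prior, rng):
    """Draw one confusion matrix from the posterior and return its accuracy."""
    acc = 0.0
    for c, row in enumerate(alpha):
        # Normalized gamma draws give a Dirichlet sample for this row.
        draws = [rng.gammavariate(a, 1.0) for a in row]
        acc += class_prior[c] * draws[c] / sum(draws)
    return acc

def prob_best(alphas, class_prior, num_samples=2000, seed=0):
    """Monte Carlo estimate of P(model m has the highest accuracy)."""
    rng = random.Random(seed)
    wins = [0] * len(alphas)
    for _ in range(num_samples):
        accs = [sample_accuracy(a, class_prior, rng) for a in alphas]
        wins[accs.index(max(accs))] += 1
    return [w / num_samples for w in wins]

# Toy example: two candidates, two classes, ten labeled points.
class_prior = [0.5, 0.5]
model_a = make_prior(2)
model_b = make_prior(2)
for true_label in [0, 1] * 5:
    update(model_a, true_label, true_label)      # A is right every time
    update(model_b, true_label, 1 - true_label)  # B is wrong every time
probs = prob_best([model_a, model_b], class_prior)
```

Here `probs[0]` is the estimated probability that the first candidate is the best one given the labels seen so far. It can be recomputed after every new label, which is what makes the estimate available at any labeling budget.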
To do this efficiently, CODA leverages the *wisdom of the crowd* (of AI models), using the ensemble predictions from candidate models as a prior over the true labels of your unlabeled data. This helps identify high-value data points where top-performing models disagree. 4/
October 13, 2025 at 6:00 PM
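A simplified sketch of the consensus idea in post 4/ above, assuming hard class votes, equal weight for every candidate, and vote entropy as the disagreement score. CODA's actual selection criterion may differ (the post singles out disagreement among top-performing models); `consensus_prior`, `entropy`, and `rank_by_disagreement` are hypothetical names for illustration only.

```python
import math
from collections import Counter

def consensus_prior(votes, num_classes, smoothing=1.0):
    """Turn one data point's votes into a smoothed distribution over its true label."""
    counts = Counter(votes)
    total = len(votes) + smoothing * num_classes
    return [(counts.get(c, 0) + smoothing) / total for c in range(num_classes)]

def entropy(p):
    """Shannon entropy in nats; higher means the models disagree more."""
    return -sum(q * math.log(q) for q in p if q > 0)

def rank_by_disagreement(predictions, num_classes):
    """predictions[m][i] is the class model m predicts for item i.
    Returns item indices, most contested first."""
    num_items = len(predictions[0])
    scores = []
    for i in range(num_items):
        votes = [model_preds[i] for model_preds in predictions]
        scores.append(entropy(consensus_prior(votes, num_classes)))
    return sorted(range(num_items), key=lambda i: scores[i], reverse=True)

# Toy example: 3 models, 4 unlabeled items, 3 classes.
preds = [
    [0, 1, 2, 0],
    [0, 1, 1, 0],
    [0, 2, 0, 0],
]
order = rank_by_disagreement(preds, num_classes=3)
```

In this toy example the item on which all three models vote differently is ranked first, so it would be the first one sent to a human for labeling.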
Typically, answering the model selection question requires collecting a large test dataset to determine which candidate model is the best for you. CODA instead makes the process *active* – interactive, iterative, and guided by the models themselves. 3/
October 13, 2025 at 6:00 PM
Using AI for data analysis has typically meant training your own model. Large public model zoos like @hf.co Models are changing this paradigm, but pose a new challenge: which model, of the millions available, should you use to analyze your data? 2/
October 13, 2025 at 6:00 PM
Reposted by Justin Kay
BirdCLEF25: Audio-based species identification focused on birds, amphibians, mammals, and insects in Colombia.
👉 www.kaggle.com/competitions...
@cvprconference.bsky.social @kaggle.com
#FGVC #CVPR #CVPR2025 #LifeCLEF
[1/4]
April 9, 2025 at 10:22 AM
Project webpage: aldi-daod.github.io
Paper: arxiv.org/abs/2403.12029
Codebase (consider giving us a star!): github.com/justinkay/aldi
Dataset: github.com/visipedia/ca...

Team: @timm.haucke.xyz, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, @sarameghanbeery.bsky.social, and Grant Van Horn
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
April 8, 2025 at 4:26 PM
Contributions:
- SOTA results across architectures (Faster R-CNN, DETR, YOLO), backbones (ResNet, ConvNeXt, ViT), and datasets
- Unified benchmarking & implementation framework, making it easy to develop and test new adaptation methods
- A new real-world adaptation dataset sourced from fisheries sonar
April 8, 2025 at 4:26 PM