Benjamin W. Nelson, PhD
bwfnelson.bsky.social
Sr Clinical Research Scientist @ Verily | Harvard Medical School and BIDMC | Behavioral Medicine | Digital Health | Wearables
This study represents years of collaborative work by an incredible cross-functional team at Verily and our outstanding research partners at SRI International. Congrats to the team! n/3
April 14, 2025 at 8:21 PM
We rigorously evaluated the Verily Numetric Watch’s ability to estimate over 12 sleep metrics against gold-standard in-lab polysomnography in a demographically diverse cohort. n/2
April 14, 2025 at 8:21 PM
Huge thanks to my amazing co-authors: @prof-nick-allen.bsky.social, John Torous, MD MBI, Ari Winbush, Steven Siddals, Matthew Flathers! Grateful for the opportunity to lead this project as part of my Adjunct Faculty position at Harvard Medical School and BIDMC.
April 1, 2025 at 8:59 PM
GPT-4o surpassed human performance for calm/neutral and surprise recognition, while Gemini surpassed human performance for surprise recognition.

We also examined model performance across actor race and sex, finding no significant biases—an encouraging result for future clinical applications. 4/n
April 1, 2025 at 8:59 PM
All LLMs demonstrated substantial to almost perfect agreement with ground truth labels. Notably, GPT-4o and Gemini reached human performance levels for overall facial emotion recognition. 3/n
April 1, 2025 at 8:59 PM
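For readers unfamiliar with the agreement wording: "substantial" and "almost perfect" match the Landis and Koch bands commonly used to describe Cohen's kappa. The posts do not name the statistic, so treating it as Cohen's kappa is an assumption here. A minimal sketch of how accuracy and kappa could be computed for one model against ground truth labels follows. The labels are made-up toy data, not study results, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of items where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def cohen_kappa(truth, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    observed = accuracy(truth, predicted)
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: sum over labels of the product of marginal proportions.
    expected = sum(truth_counts[k] * pred_counts[k] for k in truth_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Map kappa to the descriptive bands of Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy example (hypothetical labels): one of eight predictions is wrong.
truth = ["happy", "happy", "sad", "sad", "angry", "angry", "fear", "fear"]
predicted = ["happy", "happy", "sad", "sad", "angry", "angry", "fear", "sad"]

kappa = cohen_kappa(truth, predicted)
print(accuracy(truth, predicted), round(kappa, 3), landis_koch_label(kappa))
```

Kappa is lower than raw accuracy here because it discounts the agreement two raters would reach by chance given how often each label is used.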
We evaluated the agreement + accuracy of GPT-4o, Gemini 2.0 Experimental, and Claude 3.5 Sonnet using the NimStim dataset, a benchmark of 672 facial expressions from 43 diverse human actors, resulting in 2,016 model-based emotion estimates. 2/n
April 1, 2025 at 8:59 PM