Jason Lee
@jasondeanlee.bsky.social

Associate Professor at Princeton
Machine Learning Researcher

Our new work on scaling laws accounts for compute, model size, and number of samples. It relies on an extremely fine-grained analysis of online SGD, built up over the last 8 years of understanding SGD on simple toy models (tensors, single-index models, multi-index models).
Excited to announce a new paper with Yunwei Ren, Denny Wu,
@jasondeanlee.bsky.social!

We prove a neural scaling law in the SGD learning of extensive width two-layer neural networks.

arxiv.org/abs/2504.19983

🧵below (1/10)
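For concreteness, here is a minimal, hypothetical sketch of one of the toy settings the post mentions: a single-index model (a ReLU applied to one hidden direction) trained with one-pass online SGD on fresh Gaussian samples, so the number of samples equals the number of steps. None of this is from the paper; the dimension, step size, and loss estimate are illustrative assumptions.

```python
# Minimal sketch: online SGD on a single-index model (illustrative only).
# Target: y = relu(<w_star, x>); student: y_hat = relu(<w, x>).
# Fresh Gaussian sample each step, so #samples == #SGD steps.
import numpy as np

rng = np.random.default_rng(0)
d = 100                               # input dimension (assumption)
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

w = rng.normal(size=d) / np.sqrt(d)   # student initialization (assumption)
lr = 0.05                             # hypothetical step size
n_steps = 50_000
log_every = 5_000

relu = lambda z: np.maximum(z, 0.0)

for t in range(1, n_steps + 1):
    x = rng.normal(size=d)            # fresh sample: online / one-pass SGD
    y = relu(x @ w_star)
    pred = x @ w
    err = relu(pred) - y
    grad = err * (pred > 0) * x       # gradient of 0.5 * err**2 w.r.t. w
    w -= (lr / d) * grad              # hypothetical step-size scaling
    if t % log_every == 0:
        # rough population loss, estimated on a held-out Gaussian batch
        X = rng.normal(size=(2_000, d))
        loss = 0.5 * np.mean((relu(X @ w) - relu(X @ w_star)) ** 2)
        print(f"samples={t:>6d}  est. loss={loss:.4e}")
```

Logging the estimated loss against the sample count (the quantity the scaling law is stated in) is the natural way to eyeball a power-law-like decay in this toy setup.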

Welcome to the Bluesky account for Stand Up for Science 2025!

Keep an eye on this space for updates, event information, and ways to get involved. We can't wait to see everyone #standupforscience2025 on March 7th, both in DC and locations nationwide!

#scienceforall #sciencenotsilence

Duck in Vancouver! Mott32

Reposted by Jason Lee

“On a log-log plot, my grandmother fits on a straight line.”
-Physicist Fritz Houtermans

There's a lot of truth to this. Log-log plots are often abused and can be very misleading.

1/5
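As an illustration of the point (my own hedged sketch, not from the thread): fit a straight line in log-log space to several curves that are not power laws and note how high the R² still comes out.

```python
# Many smooth, monotone curves look almost straight on log-log axes,
# so a high R^2 for a power-law fit is weak evidence on its own.
import numpy as np

x = np.logspace(0, 3, 200)          # x from 1 to 1000 (arbitrary range)
curves = {
    "true power law  y = x^0.7":      x ** 0.7,
    "logarithm       y = log(1+x)":   np.log1p(x),
    "saturating      y = x/(1+.01x)": x / (1 + 0.01 * x),
}

for name, y in curves.items():
    # Least-squares line in (log x, log y) space.
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    resid = np.log(y) - (slope * np.log(x) + intercept)
    r2 = 1 - resid.var() / np.log(y).var()
    print(f"{name:35s}  fitted slope={slope:+.2f}  R^2={r2:.3f}")
```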

Lool

Representative results:
Settling the sampling complexity of RL: arxiv.org/abs/2307.13586
Optimal Multi-Distribution Learning (solved a COLT 2023 open problem): arxiv.org/abs/2312.05134
Anytime Acceleration of Gradient Descent (solved a COLT 2024 open problem): arxiv.org/abs/2411.17668
Settling the Sample Complexity of Online Reinforcement Learning
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these...
arxiv.org

Zihan Zhang (tinyurl.com/4nks7f9b) is a postdoc with Yuxin Chen, Simon Du, and me.

What's known about the 1.27 lower bound? Is it a guess, or is there a reason people believe it's fundamental?

Send your COLT open problems to Zihan; with high probability he will solve them!

What's the point of @perplexity_ai given that ChatGPT also does search?

Yo add me to your starter packs!

Reposted by Jason Lee

Assume that the nodes of a social network can choose between two alternative technologies: B and X.
A node using B receives an intrinsic benefit relative to X, but there is also a benefit to using the same tech as the majority of your neighbors.
Assume everyone uses X at time t=0. Will they switch to B?
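The question at the end invites a simulation. Below is a hedged toy sketch (my own, not from the post) of one standard way to model it: a coordination game on a ring lattice where B carries an intrinsic bonus, everyone starts on X except a small seed, and nodes best-respond to their neighbors each round. The payoffs, graph, and seed set are all assumptions.

```python
# Toy best-response dynamics for the B-vs-X adoption question above.
# Payoff of B: intrinsic bonus alpha + beta * (fraction of neighbors on B).
# Payoff of X: beta * (fraction of neighbors on X). Nodes best-respond each round.
import numpy as np

n, k = 200, 6                 # ring lattice: each node linked to its k nearest nodes
alpha, beta = 0.4, 1.0        # intrinsic advantage of B, value of coordinating

neighbors = [[(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0]
             for i in range(n)]

state = np.zeros(n, dtype=bool)   # False = X, True = B; everyone starts on X
state[:5] = True                  # a small contiguous block of early B adopters

for step in range(1, n):
    frac_b = np.array([state[nb].mean() for nb in neighbors])
    # Switch to B iff alpha + beta*frac_b > beta*(1 - frac_b),
    # i.e. iff the fraction of neighbors on B exceeds (beta - alpha) / (2*beta).
    new_state = alpha + beta * frac_b > beta * (1 - frac_b)
    if np.array_equal(new_state, state):
        break
    state = new_state

print(f"after {step} rounds, {state.mean():.0%} of nodes use B")
```

With these particular numbers the threshold is 30% of neighbors, so a contiguous seed is enough to tip its boundary nodes and the cascade spreads; whether B takes over in general depends on alpha, beta, and the graph.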

Reposted by Jason Lee

Starter packs are helpful, as is the Twitter import tool: chromewebstore.google.com/detail/sky-f...
Sky Follower Bridge - Chrome Web Store
Instantly find and follow the same users from your Twitter follows on Bluesky.
chromewebstore.google.com

Takes too much clicking...

How do I bulk follow people?

Reposted by Jason Lee