Josh Susskind
@kindsuss.bsky.social
Ramen whisperer, bad throat singer
Cogsci peeps! This is a great opportunity! @sineadwilliamson.bsky.social is a great mentor and scientist, along with a wonderful team :)
📢 We’re looking for a researcher in cogsci, neuroscience, linguistics, or related disciplines to work with us at Apple Machine Learning Research! We're hiring a one-year interdisciplinary AIML Resident to work on understanding reasoning and decision making in LLMs. 🧵
November 7, 2025 at 10:15 PM
Reposted by Josh Susskind
We have been working with Michal Klein on a module for training *flow matching* models in JAX. It ships as part of our new release of the OTT-JAX toolbox (github.com/ott-jax/ott)
The tutorial to do so is here: ott-jax.readthedocs.io/tutorials/ne...
November 5, 2025 at 2:04 PM
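For the curious, the objective such a module trains boils down to a few lines. Here's a minimal sketch of conditional flow matching in plain JAX; the function names and the toy linear model are illustrative stand-ins, not the OTT-JAX interface (see the tutorial above for the shipped module):

```python
# Minimal conditional flow matching sketch in plain JAX.
# NOT the OTT-JAX API; names and the toy model below are hypothetical.
import jax
import jax.numpy as jnp

def flow_matching_loss(params, velocity_fn, x1, key):
    """x1: [batch, dim] data samples; velocity_fn(params, x, t) -> velocity."""
    k0, kt = jax.random.split(key)
    x0 = jax.random.normal(k0, x1.shape)          # noise endpoint
    t = jax.random.uniform(kt, (x1.shape[0], 1))  # one time per sample
    xt = (1.0 - t) * x0 + t * x1                  # straight-line interpolant
    target = x1 - x0                              # its (constant) velocity
    pred = velocity_fn(params, xt, t)
    return jnp.mean((pred - target) ** 2)         # regress onto the target field

# Toy usage with a linear "network", purely to show the shapes:
def linear_velocity(params, x, t):
    return x @ params["W"] + t * params["b"]

key = jax.random.PRNGKey(0)
params = {"W": jnp.eye(2), "b": jnp.zeros(2)}
x1 = jax.random.normal(key, (16, 2))
loss = flow_matching_loss(params, linear_velocity, x1, key)
```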
Reposted by Josh Susskind
Scaling the computation of optimal transport couplings to hundreds of thousands of 3k-dimensional vectors, made easy with sharding and OTT-JAX! Check out this notebook; it only takes a few lines of code thanks to JAX's native sharding abilities ott-jax.readthedocs.io/en/latest/tu...
Sharded Sinkhorn — ott 0.5.1.dev34+g3462f28 documentation
ott-jax.readthedocs.io
August 1, 2025 at 12:13 AM
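If you want the gist without opening the notebook: below is a minimal log-domain Sinkhorn in plain JAX, with the rows of one point cloud sharded across devices so that, under jit, the n x m cost matrix and the updates are partitioned automatically. An illustrative sketch, not the tuned ott solver the notebook uses:

```python
# Log-domain Sinkhorn with sharded inputs. Hypothetical sketch, not ott's solver.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

def sinkhorn(x, y, eps=0.1, n_iter=100):
    """Entropic OT between uniform measures on point clouds x [n,d], y [m,d]."""
    n, m = x.shape[0], y.shape[0]
    cost = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # [n, m]
    def body(_, fg):
        f, g = fg  # alternating log-domain updates of the dual potentials
        f = eps * (-jnp.log(n) - jax.nn.logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (-jnp.log(m) - jax.nn.logsumexp((f[:, None] - cost) / eps, axis=0))
        return f, g
    f, g = jax.lax.fori_loop(0, n_iter, body, (jnp.zeros(n), jnp.zeros(m)))
    return f, g  # coupling_ij = exp((f_i + g_j - cost_ij) / eps)

# Shard the rows of x across all available devices; y stays replicated.
mesh = Mesh(np.asarray(jax.devices()), axis_names=("data",))
key = jax.random.PRNGKey(0)
x = jax.device_put(jax.random.normal(key, (1024, 32)),
                   NamedSharding(mesh, P("data", None)))
y = jax.random.normal(key, (1024, 32))
f, g = jax.jit(sinkhorn)(x, y)  # cost rows and f inherit the sharding of x
```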
Wow. Thank you for your bravery whoever you are.
At the ICE raid and subsequent community resistance in Paramount, California this morning, this skater kid ate dozens of munitions from Border Patrol agents, walked away slowly and flipped them off.
June 8, 2025 at 6:07 AM
A Call for Constructive Engagement | AAC&U
www.aacu.org
April 23, 2025 at 4:41 AM
Check out our Apple research work on scaling laws for native multimodal models! Combined with mixtures of experts, native models develop both specialized and multimodal representations! Lots of rich findings and opportunities for follow-up research!
Shukor, Fini, da Costa, Cord, Susskind, El-Nouby: Scaling Laws for Native Multimodal Models. https://arxiv.org/abs/2504.07951
April 11, 2025 at 10:37 PM
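For readers new to scaling-law papers, the basic fitting recipe looks like this: fit a saturating power law to (size, loss) measurements and read off the exponent. A hypothetical sketch with synthetic placeholder numbers, not values from the paper:

```python
# Illustrative scaling-law fit; the data below is synthetic, not the paper's.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, c):
    return a * n_params ** (-alpha) + c   # L(N) = a * N^(-alpha) + c

n = np.array([1e7, 3e7, 1e8, 3e8, 1e9])  # hypothetical model sizes
loss = scaling_law(n, 12.0, 0.3, 1.8)    # synthetic "observations"
(a, alpha, c), _ = curve_fit(scaling_law, n, loss, p0=[10.0, 0.35, 1.5])
print(f"fitted exponent alpha = {alpha:.3f}")  # ~0.300 on this clean data
```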
My colleagues in #Apple ML Research posted a fun paper investigating how autoregressive design choices affect reasoning (in this case, multiple-choice question answering), showing a benefit to right-to-left (R2L) ordering. Reminds me of similar findings for reverse-order addition in arxiv.org/abs/2310.16028!
March 24, 2025 at 5:47 PM
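The reverse-order addition result is easy to internalize with a toy example: carries propagate from the least-significant digit upward, so emitting digits right-to-left means each output token depends only on what's already visible. A hypothetical illustration, not code from either paper:

```python
# Why reversed ("R2L") digit order suits autoregressive addition: each output
# digit is final the moment it is produced, since carries flow low-to-high.
def add_reversed(a_digits, b_digits):
    """Add two numbers given as least-significant-first digit lists."""
    out, carry = [], 0
    for i in range(max(len(a_digits), len(b_digits))):
        s = carry
        s += a_digits[i] if i < len(a_digits) else 0
        s += b_digits[i] if i < len(b_digits) else 0
        out.append(s % 10)   # no future information needed for this digit
        carry = s // 10
    if carry:
        out.append(carry)
    return out

assert add_reversed([7, 5], [8, 6]) == [5, 2, 1]  # 57 + 68 = 125, reversed
```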
My colleague Shuangfei Zhai is looking for a summer research intern to work on improving TarFlow at Apple. If interested, send your CV to szhai at apple.com by this week.
February 25, 2025 at 1:36 AM
A Sad Moment in American History
YouTube video by Senator Bernie Sanders
youtu.be
February 20, 2025 at 5:12 AM
Here's a great paper on scaling laws for teacher-student neural network distillation led by @dbusbridge.bsky.social and Apple colleagues. I've often seen people struggle to get distillation working well enough in practical settings, and I expect the insights in this paper can really help!
Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering:
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
"If I want a small, capable model, should I distill from a more powerful model, or train from scratch?"
Our distillation scaling law shows, well, it's complicated... 🧵
arxiv.org/abs/2502.08606
Distillation Scaling Laws
We provide a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings reduce the risks associated ...
arxiv.org
February 14, 2025 at 3:30 AM
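For reference, the objective whose scaling behavior is being modeled is the classic one: soften teacher and student logits with a temperature, match them with a KL term, and mix in ordinary cross-entropy. A minimal JAX sketch of Hinton-style distillation; the paper's setup is of course much richer:

```python
# Classic distillation loss (Hinton et al., 2015). Illustrative sketch only.
import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    t = jax.nn.log_softmax(teacher_logits / T, axis=-1)
    s = jax.nn.log_softmax(student_logits / T, axis=-1)
    # KL(teacher || student); the T**2 factor keeps gradients on a fixed scale
    kd = jnp.mean(jnp.sum(jnp.exp(t) * (t - s), axis=-1)) * T**2
    # standard cross-entropy on the hard labels
    log_p = jax.nn.log_softmax(student_logits, axis=-1)
    ce = -jnp.mean(jnp.take_along_axis(log_p, labels[:, None], axis=-1))
    return alpha * kd + (1.0 - alpha) * ce
```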
Here's a fun Apple research paper seeking to understand when/why diffusion models can be composed to generate images containing multiple independent concepts. For example, composing images from a model trained on Preetum's dog and a model trained on hats. Because why wouldn't you want to do that?!!
Paper🧵 (cross-posted at X): When does composition of diffusion models "work"? Intuitively, the reason dog+hat works and dog+horse doesn’t has something to do with independence between the concepts being composed. The tricky part is to formalize exactly what this means. 1/
February 12, 2025 at 4:47 AM
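Mechanically, "composing" two diffusion models usually means summing their predicted scores, i.e. sampling from a product of the two distributions, which is exactly where the independence question bites. A hedged sketch with a Langevin-style sampler, not the paper's exact procedure:

```python
# Score composition as a product of experts. Hypothetical sketch in JAX.
import jax
import jax.numpy as jnp

def composed_score(score_a, score_b, x, t):
    # Score of p_a(x) * p_b(x); a faithful sampler only when the two
    # concepts are (approximately) independent, the paper's central question.
    return score_a(x, t) + score_b(x, t)

def langevin_step(score_fn, x, t, step_size, key):
    """One unadjusted Langevin update using the given score."""
    noise = jax.random.normal(key, x.shape)
    return x + step_size * score_fn(x, t) + jnp.sqrt(2.0 * step_size) * noise

# Toy usage: two Gaussian scores composed into one sampler.
score_dog = lambda x, t: -(x - 1.0)  # stands in for the "dog" model's score
score_hat = lambda x, t: -(x + 1.0)  # stands in for the "hat" model's score
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 2))
x = langevin_step(lambda x, t: composed_score(score_dog, score_hat, x, t),
                  x, t=0.0, step_size=1e-2, key=key)
```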
If you are interested in doing an internship in ML research at Apple, I highly recommend talking with Etai Littwin (and Vimal Thilak is pretty awesome too!)
🚨 Apple Machine Learning Research Internship opportunity! My colleagues in Apple MLR are looking for a PhD research intern with a strong interest in reinforcement learning/post-training for LLMs. If interested, apply by sending an email to Etai Littwin (elittwin at apple dot com)
February 11, 2025 at 3:50 AM
This work was born from an Apple internship with Harshay Shah. Samira provided excellent direction and technical contributions along with Vimal, and the entire team was incredibly helpful! I'm intrigued that reading comprehension tasks do not follow pre-training scaling curves -- gotta follow this up!
🚨 One question has always intrigued me: among the different ways to increase a model's capacity (parameters, parallelizable compute, or sequential compute), what role does each play?
We explored this through the lens of MoEs:
January 29, 2025 at 5:17 AM
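To make those capacity knobs concrete: in a mixture-of-experts layer, adding experts grows parameters at roughly fixed per-token compute, while raising the routing top-k grows parallelizable per-token compute. A minimal, hypothetical top-k MoE layer in JAX, not the architecture from the paper:

```python
# Minimal top-k mixture-of-experts layer. Illustrative sketch only.
import jax
import jax.numpy as jnp

def moe_layer(params, x, k=2):
    """x: [tokens, d]; params: router [d, n_experts], experts [n_experts, d, d]."""
    probs = jax.nn.softmax(x @ params["router"], axis=-1)   # routing distribution
    weights, idx = jax.lax.top_k(probs, k)                  # top-k experts per token
    # Run every expert densely for clarity; real MoEs dispatch tokens sparsely.
    expert_out = jnp.einsum("td,edh->teh", x, params["experts"])
    picked = jnp.take_along_axis(expert_out, idx[:, :, None], axis=1)  # [tokens, k, d]
    return jnp.sum(weights[:, :, None] * picked, axis=1)

# Toy usage:
key = jax.random.PRNGKey(0)
d, n_experts = 8, 4
params = {"router": jax.random.normal(key, (d, n_experts)),
          "experts": jax.random.normal(key, (n_experts, d, d))}
out = moe_layer(params, jax.random.normal(key, (16, d)))    # [16, 8]
```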
Reposted by Josh Susskind
Missing the deep learning part? Go check out the follow-up work at @neuripsconf.bsky.social (tinyurl.com/yvf72kzf) and @iclr-conf.bsky.social (tinyurl.com/4vh8vuzk)
January 23, 2025 at 8:45 AM
I was too disgusted by the Twitter/X vomit and could not justify keeping my account there. Hoping this platform steers clear of disinformation and hate -- and remains a positive place to share science and other good things.
January 23, 2025 at 10:00 PM
Here's a really cool cross-institution study leveraging optimal transport techniques developed by my Apple ML Research colleagues! It's great to see basic research in machine learning translate into scientific tools like this. Cuts into the AI hype a bit ;)
Today is a great day for optimal transport 🎉! Lots of gratitude 🙏 for all folks who contributed to ott-jax.readthedocs.io and pushed for the MOSCOT (now @ nature!) paper, from visionaries @dominik1klein.bsky.social, G. Palla, Z. Piran to the magician, Michal Klein! ❤️
www.nature.com/articles/s41...
January 23, 2025 at 9:54 PM
Reposted by Josh Susskind
Excited about vision-language models? 🚀 Check out our latest work on FastVLM, a new family of efficient vision-language models that balances the tradeoff between high-resolution image understanding and latency without compromising accuracy!
arxiv.org/abs/2412.13303
December 19, 2024 at 6:18 PM
If you're looking for research scientist roles in Europe, check out Marco's post! The Paris team is fantastic, and does diverse idea-driven and impactful research. In addition, MLR is highly collaborative across timezones, so you'd have a chance to work with many others too.
The Apple Machine Learning Research (MLR) team in Paris has openings for both FTE roles and a short-term post-doc position to contribute to our team's research agenda. Researchers at Apple's MLR (led by Samy Bengio) target impactful publications in top-tier ML venues and OSS.
December 18, 2024 at 5:14 PM
Cross-posting from that other place:
I’m really proud of Apple ML research and wanted to share a summary that may be useful for #NeurIPS2024 attendees (and everyone else)! I’m particularly excited about code and model releases and will highlight some here. 1/n
December 10, 2024 at 11:53 PM
Hello Bluesky! I'm a machine learning (aka AI) researcher at Apple, and my main interests these days are open science, flatpicking guitar, and rock climbing (and maybe the intersection of all of the above). I'll occasionally highlight some of the research going on in my group here.
December 10, 2024 at 11:45 PM