Leland McInnes
lelandmcinnes.bsky.social
A Mathematician dabbling in Data Science, especially unsupervised learning and data exploration. UMAP, HDBSCAN, PyNNDescent, DataMapPlot. (He/Him)
Reposted by Leland McInnes
Our new pre-print shows how unsupervised clustering methods can identify biologically meaningful differences in early vocal production, with no human feedback. @antorrisi.bsky.social has led this interdisciplinary collaboration based on computational methods + #chicks 🐣 arxiv.org/abs/2601.12203
January 24, 2026 at 1:15 PM
Reposted by Leland McInnes
here's a fun side project i've been working on: i compiled a joint text<>audio embedding model to a fast coreml pipeline, and built a very fast (~400ms for 50k samples, can scale to millions) UMAP dimensionality reduction GPU impl in mlx. using it to browse music libraries and do sample sim search
January 26, 2026 at 6:26 AM
Reposted by Leland McInnes
Xiaobin Li, Run Zhang: Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm https://arxiv.org/abs/2601.16552 https://arxiv.org/pdf/2601.16552 https://arxiv.org/html/2601.16552
January 27, 2026 at 6:33 AM
Reposted by Leland McInnes
I miss the days where you'd see blogposts with clever analyses on datasets, maths and data science tricks.

That's why, as an experiment, we're starting a new moderated subreddit. People can share/promote their notebooks and you can use RSS to subscribe.

Please join and share!
January 25, 2026 at 11:00 PM
Reposted by Leland McInnes
UMAP connectivity plots of 3,627 chess openings from the @lichess.org datasets (huggingface.co/datasets/Lic...)
January 20, 2026 at 12:21 PM
Reposted by Leland McInnes
I think it's important to note though that in spite of those incentives, the direction of the last two years has been more fungibility, *not* lock-in. And open source is the wrong fight here: when lock-in comes it will look more like the lock-in that Amazon or Uber have than Microsoft Office…
January 5, 2026 at 1:37 PM
Reposted by Leland McInnes
New preprint! Have you ever wondered, what are these fuzzy simplicial sets, the theoretical framework behind e.g. UMAP? Here we show that you may simply see them as marginal distributions over simplicial sets. This provides a generative model for UMAP. (1/2)

arxiv.org/abs/2512.03899
Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction
Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from alg...
December 4, 2025 at 12:31 PM
Reposted by Leland McInnes
Space DJ turns genre embeddings into a playable galaxy—pilot a ship, the music follows. 🚀

Key stats
768→128 PCA compression; 3D UMAP projection; three.js rendering; autopilot drift; high‑dim neighbors surfacing hidden similarities.
November 11, 2025 at 3:03 PM
Reposted by Leland McInnes
via the magic of laion_clap embeddings and umap, my live coding thingy has a sample browser at last!
October 31, 2025 at 6:27 PM
Reposted by Leland McInnes
I made this annotated scatter plot of 1 million FineWeb-Edu documents for @sashamtl.bsky.social's new TED talk.
October 31, 2025 at 2:52 PM
Reposted by Leland McInnes
Also really love how organic the plot looks with "inferno" (left) and "viridis" (right).
October 27, 2025 at 10:42 AM
Reposted by Leland McInnes
Map of the internet: 1.3M nodes (BGP)
October 26, 2025 at 1:39 PM
The video of my talk at SciPy on DataMapPlot is up at last. If you make t-SNE or UMAP plots the talk provides some guidance on how to make plots most effective, and introduces a library to help make that easier.

www.youtube.com/watch?v=-iBh...
Leland McInnes - DataMapPlot: Rich Tools for UMAP | SciPy 2025
YouTube video by SciPy
October 17, 2025 at 1:56 PM
Reposted by Leland McInnes
Despite the gutting of the National Center for Educational Statistics, the dept of Ed *did* manage to release 2024 college major counts in the usual format, so I can run it through the same code I do every year. First off, the change since peak of the largest fields -- another year of drops.
September 28, 2025 at 2:20 AM
Reposted by Leland McInnes
I'm very much a learner, but you're maybe asking if aspects of matrix factorisation approaches to dimensionality reduction apply here. But LocalMAP is a KNN approach, with a matrix factorisation initialisation. h/t @lelandmcinnes.bsky.social for his attempts to describe these youtu.be/9iol3Lk6kyU
A Bluffer's Guide to Dimension Reduction - Leland McInnes
YouTube video by PyData
September 26, 2025 at 2:42 PM
Reposted by Leland McInnes
📢 Save the date!
Join us for the next @ellis.eu x UniReps Speaker Series!
📅 27th August – 16:00 CEST
📍https://ethz.zoom.us/j/66426188160
🎙️ Speakers: Keynote by @lelandmcinnes.bsky.social & Flash Talk by Yu (Demi) Qin
🔔 Stay updated by joining our Google group: groups.google.com/u/2/g/ellis-...
August 14, 2025 at 7:58 AM
Reposted by Leland McInnes
🚀 We've just open-sourced Embedding Atlas – a tool for exploring large embedding spaces through rich, interactive visualizations 📊.
August 1, 2025 at 8:24 AM
Reposted by Leland McInnes
Meteoroid stream identification with HDBSCAN unsupervised clustering algorithm. Eloy Peña-Asensio et al. https://arxiv.org/abs/2507.01501
July 3, 2025 at 7:46 AM
Reposted by Leland McInnes
Ever wanted to pan through the latent🌌 space of TikTok videos? Made using the amazing toponymy and datamapplot from @lelandmcinnes.bsky.social and data from mine and @jurgenpfeffer.bsky.social's first complete TikTok slice. link below
July 11, 2025 at 4:45 PM
Reposted by Leland McInnes
🎤 Speaker Spotlight: Leland McInnes
Join Leland at #SciPy2025 for his talk "DataMapPlot: Rich Tools for UMAP Visualizations." 📊

Discover powerful new ways to explore high-dimensional data!
🔗 scipy2025.scipy.org
July 5, 2025 at 7:46 PM
Reposted by Leland McInnes
Explore Wikipedia through a data map. Pages are grouped by semantic similarity, for topic clusters.
Hover to see details, zoom to explore more fine-grained topics, click to go to a page. Search by page name to find interesting starting points for exploration.

lmcinnes.github.io/datamapplot_...
June 22, 2025 at 3:36 PM
I'll be giving a talk about DataMapPlot for visualizing data maps at SciPy this year. I would love to meet potential users and chat about where to go next.

cfp.scipy.org/scipy2025/ta...
June 23, 2025 at 11:41 PM
Reposted by Leland McInnes
I also updated the ArXiv data map example to make use of new features in datamapplot.
lmcinnes.github.io/datamapplot_...

You can tweak parameters and build your own version:
gist.github.com/lmcinnes/e11...
June 22, 2025 at 9:59 PM
Reposted by Leland McInnes
OMG I am so glad someone finally did this.

Thank you 🙏 @lelandmcinnes.bsky.social

This will now consume hours and hours of my time.

lmcinnes.github.io/datamapplot_...
June 23, 2025 at 12:12 PM