Leland McInnes
lelandmcinnes.bsky.social
A Mathematician dabbling in Data Science, especially unsupervised learning and data exploration. UMAP, HDBSCAN, PyNNDescent, DataMapPlot. (He/Him)
Since 2018 if I recall correctly.
November 4, 2025 at 3:00 AM
Reposted by Leland McInnes
I'm very much a learner, but you're maybe asking if aspects of matrix factorisation approaches to dimensionality reduction apply here. But LocalMAP is a KNN approach, with a matrix factorisation initialisation. h/t @lelandmcinnes.bsky.social for his attempts to describe these youtu.be/9iol3Lk6kyU
A Bluffer's Guide to Dimension Reduction - Leland McInnes
YouTube video by PyData
youtu.be
September 26, 2025 at 2:42 PM
It has been done before. For example: virtual.ieeevis.org/year/2022/pa...

Learning what makes it accessible and usable for a general audience has been the longer task. The topic region naming with Toponymy makes a huge difference, and that is still very much a work in progress.
Virtual IEEE VIS 2022 - Paper: Mapping Wikipedia with BERT and UMAP
VIS 2022 will be the year’s premier forum for advances in theory, methods, and applications of visualization and visual analytics. The conference will convene an international community of researchers...
virtual.ieeevis.org
June 22, 2025 at 8:22 PM
It should be possible to build one in French following this: gist.github.com/lmcinnes/951...

It pulls the data from "hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3/en/*.parquet" but you can swap in "hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3/fr/*.parquet".
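
The language swap described above can be sketched as a small path builder. This is only an illustration: the helper name `wikipedia_embeddings_glob` is hypothetical and not taken from the gist, and it assumes the dataset keeps one subdirectory per language code, as the two paths quoted in the post suggest.

```python
# Base location of the Cohere multilingual Wikipedia embeddings, as quoted in the post.
BASE = "hf://datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3"


def wikipedia_embeddings_glob(lang: str = "en") -> str:
    """Build the parquet glob for one language edition (hypothetical helper).

    Assumes each language lives in its own subdirectory named by its code,
    which matches the "en" and "fr" paths mentioned in the post.
    """
    code = lang.strip().lower()
    if not code:
        raise ValueError("language code must be non-empty")
    return f"{BASE}/{code}/*.parquet"


if __name__ == "__main__":
    # Print the English path used by default and the French replacement.
    print(wikipedia_embeddings_glob("en"))
    print(wikipedia_embeddings_glob("fr"))
```

The resulting string would then be dropped in wherever the gist currently hard-codes the "en" path; how the gist actually loads the files is not reproduced here.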
Interactive Data Map of Wikipedia
Interactive Data Map of Wikipedia. GitHub Gist: instantly share code, notes, and snippets.
gist.github.com
June 22, 2025 at 6:24 PM
You can create your own: gist.github.com/lmcinnes/951...

Replace "en" with "fr" and everything should work.

(Please excuse my Google Translate errors!)
Interactive Data Map of Wikipedia
Interactive Data Map of Wikipedia. GitHub Gist: instantly share code, notes, and snippets.
gist.github.com
June 22, 2025 at 6:19 PM
For even more Wikipedia vectors, Nomic.ai just released vectorization and a data map for all of Wikipedia in all languages!

enterprise.wikimedia.com/blog/nomic-a...

huggingface.co/datasets/nom...
Nomic AI
Nomic builds AI that understands complex files, documents and datasets.
Nomic.ai
June 22, 2025 at 3:36 PM
Special thanks to @jayalammar.bsky.social and Nils Reimers from @cohere.com for providing embedding vectors for all of Wikipedia.

cohere.com/blog/embeddi...
The Embedding Archives: Millions of Wikipedia Article Embeddings in Many Languages
Cohere's massive archive of embedding vectors from Wikipedia can be freely downloaded and used to power applications.
cohere.com
June 22, 2025 at 3:36 PM