Christopher Akiki
@cakiki.bsky.social
research scientist at ScaDS.AI Leipzig in nlp, ir, and ml. @hf.co fellow. @lichess.org team member. @kaggle.com datasets expert.
Three different ways to represent colo(u)r. Work in progress, inspired by an old post by Kat Zhang / The Poet Engineer.
November 4, 2025 at 12:05 PM
I made this annotated scatter plot of 1 million FineWeb-Edu documents for @sashamtl.bsky.social's new TED talk.
October 31, 2025 at 2:52 PM
Reposted by Christopher Akiki
When the fish left the river:
October 28, 2025 at 12:01 AM
Also really love how organic the plot looks with "inferno" (left) and "viridis" (right).
October 27, 2025 at 10:42 AM
Thanks to @jamesabednar.bsky.social I realized I had used the wrong background color for the colormap I had chosen. This is another version of the plot (different embeddings) with the corrected background.
October 26, 2025 at 4:06 PM
Map of the internet: 1.3M nodes (BGP)
October 26, 2025 at 1:39 PM
Reposted by Christopher Akiki
We're cooking.. 👀
October 7, 2025 at 11:52 AM
526.9 million player deaths in 24.7 million levels of Super Mario Maker 2. Data by @tgr.bsky.social
September 28, 2025 at 3:54 PM
Really cool new embeddings exploration tool by @domoritz.de and colleagues from Apple. Can't wait to build with this. Also includes a streamlit component and a Jupyter widget.
July 11, 2025 at 2:17 PM
Woah! EA just open sourced "Command and Conquer: Red Alert" and a bunch of other CnC games! github.com/electronicar...
February 28, 2025 at 12:12 PM
Reposted by Christopher Akiki
Lichess is now on @kaggle.com!

Use our puzzles, openings, and engine evaluation datasets directly in your kaggle notebooks: https://www.kaggle.com/organizations/lichess ♟️
February 2, 2025 at 12:03 PM
The folks at Foursquare released a @hf.co dataset of 104.5 million places of interest and here's all of them plotted using datashader
December 8, 2024 at 1:34 PM
I recently used the @lichess.org puzzles dataset to experiment with chess position embeddings and visualize 4.5M starting positions. (hf.co/datasets/Lic...)
December 6, 2024 at 1:00 PM
Reposted by Christopher Akiki
The Lichess database of games, puzzles, and engine evaluations is now on @hf.co - https://huggingface.co/Lichess. Billions of chess data points to download, query, and stream and we're excited to see what you'll build with it! ♟️ 🤗
December 6, 2024 at 9:46 AM
Early experiment visualizing Cohere For AI's newly released Aya dataset. Multilingual corpora are always so fun to play with.
February 13, 2024 at 8:01 PM
Clifford-inspired strange attractor.
November 17, 2023 at 7:38 PM
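For readers unfamiliar with the term, here is a minimal sketch of the classic Clifford attractor that the post above says it was inspired by. It is not the code behind the image: the post does not say how the "inspired" variant differs, so the parameter values, the function names `clifford_points` and `bin_counts`, and the plain grid counting (a crude stand-in for a real rasterizer) are all assumptions made for illustration.

```python
import math


def clifford_points(a, b, c, d, n, x0=0.0, y0=0.0):
    """Iterate the classic Clifford map n times and return the visited points.

    x_next = sin(a * y) + c * cos(a * x)
    y_next = sin(b * x) + d * cos(b * y)
    """
    points = []
    x, y = x0, y0
    for _ in range(n):
        # Both updates read the previous (x, y), hence the tuple assignment.
        x, y = (
            math.sin(a * y) + c * math.cos(a * x),
            math.sin(b * x) + d * math.cos(b * y),
        )
        points.append((x, y))
    return points


def bin_counts(points, width, height, c, d):
    """Count points per cell of a width x height grid.

    The map keeps |x| <= 1 + |c| and |y| <= 1 + |d|, which gives the extent.
    """
    x_max = 1 + abs(c)
    y_max = 1 + abs(d)
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = min(int((x + x_max) / (2 * x_max) * width), width - 1)
        row = min(int((y + y_max) / (2 * y_max) * height), height - 1)
        grid[row][col] += 1
    return grid


# Hypothetical parameters chosen only for the demonstration.
pts = clifford_points(a=-1.4, b=1.6, c=1.0, d=0.7, n=1000)
grid = bin_counts(pts, width=64, height=64, c=1.0, d=0.7)
```

Coloring each cell by its count, with far more iterations and a finer grid, is what gives these plots their smoky density look.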
10 million digits of Pi.

Kind of.
September 27, 2023 at 7:40 PM
835 languages.
3.5 million Bible verses.
Work in progress.
September 26, 2023 at 4:49 PM
UMAP connectivity graphs—with edgehammer bundling—are always something to gaze at.
September 26, 2023 at 9:10 AM
Revisiting John Williamson's prime factors plot with a few differences in implementation. I am using UMAP and Datashader to visualize the first million integers. Not quite there yet.
September 25, 2023 at 10:29 AM
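A rough sketch of the feature-building step that a plot like the one above needs, assuming each integer is described by the set of distinct primes dividing it (my understanding of the original prime factors plot, not something the post spells out). The helper names are hypothetical, and the UMAP embedding and Datashader rendering are left out entirely; this only shows how the per-integer prime sets could be computed with a sieve.

```python
def smallest_prime_factors(limit):
    """Sieve returning spf, where spf[n] is the smallest prime dividing n (n >= 2)."""
    spf = list(range(limit + 1))
    i = 2
    while i * i <= limit:
        if spf[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:  # not yet claimed by a smaller prime
                    spf[j] = i
        i += 1
    return spf


def distinct_prime_factors(n, spf):
    """Return the distinct primes dividing n, in increasing order."""
    factors = []
    while n > 1:
        p = spf[n]
        factors.append(p)
        while n % p == 0:
            n //= p
    return factors


def prime_factor_rows(limit):
    """Map every integer in 1..limit to its list of distinct prime factors."""
    spf = smallest_prime_factors(limit)
    return {n: distinct_prime_factors(n, spf) for n in range(1, limit + 1)}


rows = prime_factor_rows(100)
print(rows[60])  # [2, 3, 5]
```

Each list is effectively one sparse binary row (one column per prime), which is the kind of input a dimensionality-reduction step could then embed in 2D.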
Multilingual text corpus or Petri dish?
June 6, 2023 at 2:20 PM
Code Dataset Visualization—11.66 million files from the Stack, a dataset sourced from permissively-licensed GitHub repositories spanning 86 programming languages (StarCoder languages subset).
June 5, 2023 at 5:12 PM