Christopher Akiki
@cakiki.bsky.social
research scientist at ScaDS.AI Leipzig in nlp, ir, and ml. @hf.co fellow. @lichess.org team member. @kaggle.com datasets expert.
Three different ways to represent colo(u)r. Work in progress, inspired by an old post by Kat Zhang / The Poet Engineer.
November 4, 2025 at 12:05 PM
I made this annotated scatter plot of 1 million FineWeb-Edu documents for @sashamtl.bsky.social's new TED talk.
October 31, 2025 at 2:52 PM
Also really love how organic the plot looks with "inferno" (left) and "viridis" (right).
October 27, 2025 at 10:42 AM
Thanks to @jamesabednar.bsky.social I realized I had used the wrong background color for the colormap I had chosen. This is another version of the plot (different embeddings) with the corrected background.
October 26, 2025 at 4:06 PM
Map of the internet: 1.3M nodes (BGP)
October 26, 2025 at 1:39 PM
526.9 million player deaths in 24.7 million levels of Super Mario Maker 2. Data by @tgr.bsky.social
September 28, 2025 at 3:54 PM
Really cool new embeddings exploration tool by @domoritz.de and colleagues from Apple. Can't wait to build with this. Also includes a streamlit component and a Jupyter widget.
July 11, 2025 at 2:17 PM
Woah! EA just open sourced "Command and Conquer: Red Alert" and a bunch of other CnC games! github.com/electronicar...
February 28, 2025 at 12:12 PM
This is also addressed in the appendix of @alisawuffles.bsky.social and colleagues' paper on BPE mixture inference. I think it might have been discovered by @soldaini.net if I'm not mistaken.

arxiv.org/abs/2407.16607
February 28, 2025 at 10:47 AM
The folks at Foursquare released a @hf.co dataset of 104.5 million places of interest, and here are all of them plotted using datashader.
December 8, 2024 at 1:34 PM
I recently used the @lichess.org puzzles dataset to experiment with chess position embeddings and visualize 4.5M starting positions. (hf.co/datasets/Lic...)
December 6, 2024 at 1:00 PM
Early experiment visualizing Cohere For AI's newly released Aya dataset. Multilingual corpora are always so fun to play with.
February 13, 2024 at 8:01 PM
Happy birthday Meg!
November 19, 2023 at 1:45 PM
I used the datashader library to plot 100M points that follow the pictured formula. The shading shows how "busy" with points a given location is.
November 18, 2023 at 10:09 AM
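A rough sketch of the idea behind the density shading described in the post above: count how many points land in each pixel, then map the counts to brightness. The formula in the post is only pictured, so the classic Clifford map stands in for it here as an assumption, and the pure-Python binning is a slow stand-in for the aggregation a library like datashader does at scale. The function names, parameter values, and the log scaling are all illustrative choices, not the author's.

```python
import math


def clifford_density(n_points, width, height, a=-1.4, b=1.6, c=1.0, d=0.7):
    """Iterate a Clifford map and count how many points fall in each pixel.

    Assumed stand-in formula (the post's own formula is not in the text):
        x' = sin(a*y) + c*cos(a*x)
        y' = sin(b*x) + d*cos(b*y)
    """
    # Every iterate satisfies |x| <= 1 + |c| and |y| <= 1 + |d|.
    x_lim = 1.0 + abs(c)
    y_lim = 1.0 + abs(d)
    counts = [[0] * width for _ in range(height)]
    x, y = 0.0, 0.0
    for _ in range(n_points):
        x, y = (math.sin(a * y) + c * math.cos(a * x),
                math.sin(b * x) + d * math.cos(b * y))
        # Scale the coordinates to pixel indices and clamp to the grid.
        col = int((x + x_lim) / (2 * x_lim) * width)
        row = int((y + y_lim) / (2 * y_lim) * height)
        col = max(0, min(col, width - 1))
        row = max(0, min(row, height - 1))
        counts[row][col] += 1
    return counts


def shade(counts):
    """Map per-pixel counts to 0..255 brightness on a log scale."""
    peak = max(max(row) for row in counts)
    scale = 255 / math.log1p(peak) if peak else 0.0
    return [[round(math.log1p(v) * scale) for v in row] for row in counts]


if __name__ == "__main__":
    grid = clifford_density(n_points=200_000, width=80, height=40)
    image = shade(grid)
    ramp = " .:-=+*#%@"
    for row in image:
        print("".join(ramp[v * (len(ramp) - 1) // 255] for v in row))
```

The busiest pixel comes out brightest, and the log scale keeps sparsely visited regions visible next to very dense ones.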
Clifford-inspired strange attractor.
November 17, 2023 at 7:38 PM
10 million digits of Pi.

Kind of.
September 27, 2023 at 7:40 PM
835 languages.
3.5 million bible verses.
Work in progress.
September 26, 2023 at 4:49 PM
UMAP connectivity graphs—with edgehammer bundling—are always something to gaze at.
September 26, 2023 at 9:10 AM
Revisiting John Williamson's prime factors plot with a few differences in implementation. I am using UMAP and Datashader to visualize the first million integers. Not quite there yet.
September 25, 2023 at 10:29 AM
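One plausible input for a plot like the one in the post above, sketched under assumptions: each integer is described by the set of distinct primes that divide it, stored as a sparse row of column indices. This representation is an assumption on my part rather than something the post states, and the helper names are hypothetical. The embedding and rendering steps with UMAP and Datashader are left out.

```python
def prime_factor_sets(limit):
    """Return a list where entry n holds the distinct primes dividing n."""
    factors = [[] for _ in range(limit + 1)]
    for p in range(2, limit + 1):
        if not factors[p]:  # no smaller prime divides p, so p is prime
            for multiple in range(p, limit + 1, p):
                factors[multiple].append(p)
    return factors


def to_sparse_rows(factors):
    """Turn factor lists into sparse rows: one row per integer 1..limit,
    each row listing the column indices of the primes that divide it."""
    primes = [n for n in range(2, len(factors)) if factors[n] == [n]]
    column = {p: i for i, p in enumerate(primes)}
    rows = [[column[p] for p in factors[n]] for n in range(1, len(factors))]
    return primes, rows


if __name__ == "__main__":
    primes, rows = to_sparse_rows(prime_factor_sets(30))
    for n in (12, 17, 30):
        print(n, [primes[i] for i in rows[n - 1]])
```

Rows like these could be packed into a sparse binary matrix and handed to a dimensionality reduction step; with the first million integers the matrix is very wide but almost entirely zeros.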
Multilingual text corpus or Petri dish?
June 6, 2023 at 2:20 PM
Code Dataset Visualization—11.66 million files from the Stack, a dataset sourced from permissively-licensed GitHub repositories spanning 86 programming languages (StarCoder languages subset).
June 5, 2023 at 5:12 PM