Kyle McDonald
kcimc.bsky.social
artist working with code kylemcdonald.net
sailing from lumolumo to dawson island
November 7, 2025 at 8:40 AM
water witches and solar power
November 7, 2025 at 8:01 AM
from the other side of the welcoming party
November 7, 2025 at 7:41 AM
betel nut, berries, apples and coconuts
October 20, 2025 at 3:21 AM
new cables and old ladders
October 19, 2025 at 1:39 AM
a short trip from alotau to lumolumo with an incredible welcoming
October 18, 2025 at 3:40 AM
i’m on day 19 of 50 days in the south pacific, helping upgrade power and internet for two traditional voyaging organizations—and trying to capture a rare flash of light called “te lapa”. i’m posting daily on instagram instagram.com/kcimc
October 6, 2025 at 12:30 AM
the latest realtime video-to-video demos are wild
July 29, 2025 at 4:00 AM
has anyone written about AI-assisted vibe coding? the loss of the flow state, the move away from mental modeling of computational processes, managerialization of the developer class, etc? it feels like a bellwether, but i’ve had trouble explaining to non-programmers.
February 22, 2025 at 12:17 AM
are there any citizen science efforts to figure out what is actually in the air and in the ash in LA right now?
January 14, 2025 at 7:54 PM
gemini 2 allows for some absolutely bounding box prediction 😳 i don't know any other way to quickly accomplish something like this. it only misses one object, and misattributes one title.
December 12, 2024 at 3:31 PM
on some of the pages that are upside-down, gemini sometimes transcribed text using upside down unicode characters (but with nonsense english). i'm super curious where this ability comes from—are there images annotated with upside-down unicode in the training data?
December 12, 2024 at 3:18 PM
the OCR capabilities are honestly out of control. i went through a phase in the early 2010s of playing with this very stylized script, and i'm so impressed it managed to correctly transcribe "light leaks". and it almost gets "the nine billion names of god".
December 12, 2024 at 3:18 PM
i thought it would be fun to have colored stickers representing the different categories of things that i'm thinking about. so i fed all the transcribed text to gemini and asked for some categories, and then asked it to tag each page with relevant categories.
December 12, 2024 at 3:18 PM
sometimes i've got to look for an example to see how it tagged the page. there are a bunch of pages i spilled water on two decades ago, and it has decided that these are "watercolors".. very cute
December 12, 2024 at 3:17 PM
the summarization feature is incredible because it means you can search for "recipe" even when the word "recipe" does not appear anywhere on the page
December 12, 2024 at 3:17 PM
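a rough sketch of how that kind of search could work, assuming each page is stored as a dict with hypothetical `text`, `summary` and `keywords` fields (these names are made up for illustration, not the tool's actual schema). the query is matched against the model-written summary and keywords as well as the transcription, which is why "recipe" can hit a page that never contains the word.

```python
from typing import Dict, List


def search_pages(pages: List[Dict], query: str) -> List[int]:
    """Return indices of pages whose text, summary, or keywords contain the query.

    Field names are assumptions for this sketch: 'text' is the raw
    transcription, 'summary' and 'keywords' are model-generated.
    """
    q = query.lower()
    hits = []
    for i, page in enumerate(pages):
        # Combine every searchable field into one lowercase string.
        haystack = " ".join(
            [
                page.get("text", ""),
                page.get("summary", ""),
                " ".join(page.get("keywords", [])),
            ]
        ).lower()
        if q in haystack:
            hits.append(i)
    return hits


pages = [
    {"text": "2 cups flour, 1 egg, bake 20 min", "summary": "a recipe for bread", "keywords": ["baking"]},
    {"text": "circuit diagram notes", "summary": "wiring sketch", "keywords": ["electronics"]},
]
matches = search_pages(pages, "recipe")
```

plain substring matching is the simplest possible choice here; a real tool might use embeddings or fuzzy matching instead.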
gemini is completely different from traditional OCR here. OCR is faster, around 1s, while in my tests with gemini 1.5 pro i was seeing around 8s. and OCR will give you per-character bounding boxes! but it's almost useless as plain text—and blind to diagrams and drawings
December 12, 2024 at 3:17 PM
i've been using a prompt like this to not just do OCR, but also provide descriptions of diagrams, and generate keywords for searching that might not even appear in the text itself
December 12, 2024 at 3:16 PM
next step is converting to text. gemini actually has a big enough context window to just load hundreds of pages in at a time, but i'm really interested in caching analysis so i can have more of a real-time interaction.
December 12, 2024 at 3:16 PM
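one way the caching idea could look, as a minimal sketch: store each page's analysis on disk, keyed by a hash of the image bytes, so the slow model call only happens once per page. `analyze` is a hypothetical stand-in for whatever function actually calls the model, and the cache layout is an assumption, not a description of the real tool.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_analysis(
    image_bytes: bytes,
    analyze: Callable[[bytes], dict],
    cache_dir: Path,
) -> dict:
    """Return the analysis for an image, computing it only on a cache miss.

    `analyze` is a placeholder for the (slow) model call; its result must
    be JSON-serializable.
    """
    # The content hash is the cache key, so renamed files still hit the cache.
    key = hashlib.sha256(image_bytes).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = analyze(image_bytes)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

after the first pass over all the pages, every later lookup is just a small file read, which is what makes real-time interaction plausible.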
the next step is getting each photo split into two pages and oriented right side up. after the first few years of having a sketchbook, i started to alternate the orientation of every page so my hand never ran into the spine.
December 12, 2024 at 3:16 PM
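the geometry of the split-and-orient step can be sketched in a few lines. this assumes the spine sits at the horizontal center of each photo and that every other page is upside down, both of which are simplifications; the function names are made up for illustration. the boxes use the (left, top, right, bottom) convention common to image libraries.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_spread(width: int, height: int) -> List[Box]:
    """Return crop boxes for the left and right page of a two-page photo.

    Assumes the spine is exactly at the horizontal midpoint.
    """
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


def rotation_for_page(page_index: int) -> int:
    """Return degrees of rotation needed to make a page right side up.

    Assumes even-indexed pages are upright and odd-indexed pages are
    upside down; real sketchbooks would need per-page detection.
    """
    return 180 if page_index % 2 == 1 else 0


boxes = split_spread(4000, 3000)
rotations = [rotation_for_page(i) for i in range(4)]
```

in practice the alternation is unlikely to be perfectly regular, so a fixed rule like this would only be a first guess to correct by hand or with a model.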
it feels like a lot. it's almost my whole life. one of my high school art teachers required us to keep a sketchbook, and i never stopped. it's where i keep my thoughts organized. a lot of my thinking is visual or diagrammatic.
December 12, 2024 at 3:15 PM
the first step could not be automated—i pulled out my box of 33 sketchbooks, and took a photo of every single page. around 4300 pages in total.
December 12, 2024 at 3:15 PM
i'm building an experimental tool for exploring 25 years of my old sketchbooks, with image and text recognition powered by gemini
December 12, 2024 at 3:14 PM
when i first tested this with chatgpt in march 2023, i sat down with an expert fijian navigator and he laughed at how wrong it was. it's like asking chatgpt "who painted the mona lisa?" and having it answer "michelangelo". an answer that is not only wrong but also reveals the deeper associative structure
December 12, 2024 at 12:44 AM
one of my biggest concerns about LLMs is their ability to smother marginalized cultures with homogenizing hallucination. gemini 2.0 flash (released today) is one of the first models to explain the fijian wind compass clearly, succinctly, and accurately.
December 12, 2024 at 12:37 AM