James Futhey
@jamesfuthey.com
🌈 Indie Hacker, Founder @Meetingroom365.com - Seattle / Taipei - jamesfuthey.com - Previously Analytics @Adobe, Design @HBO. @kidgdzilla on the legacy app. Building 🌟 transparent.se 🕹️ pmn.blue 🍍 indie.am/james
Ahhhhhh I should have bought this while I was in Taipei….. it’ll never come to the US.

I’m so stupid (Amy) rose.
June 20, 2025 at 11:15 PM
"Pornographic Images with Rivers"

This is what happens when you let an LLM label clusters based on alt tags, 0.001% of the time 🧐

www.transparent.se/image-cluste...

The alt tag is almost always "IEMBot Image TBD" so.... 🤡
May 31, 2025 at 2:54 PM
Ack, found my mistake from 9 days ago 🙈

rb_jaccard_dist is DISTANCE not SIMILARITY, so "most similar" would be least distant. And.... it looks like I flipped this to DESC by mistake.

I bet I'll find more mistakes if I keep going 😅
May 7, 2025 at 7:52 AM
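A minimal sketch of the distance/similarity mix-up from the post above, in plain Python rather than SQL. Only the name rb_jaccard_dist comes from the post; the helper jaccard_distance, the sample sets, and the sort calls are hypothetical stand-ins. Since distance is 1 minus similarity, "most similar" means smallest distance, so the sort should be ascending.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance = 1 - |A & B| / |A | B| (0.0 means identical sets)."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return 1.0 - len(a & b) / len(union)

# Hypothetical data: each candidate is a set of item ids.
query = {1, 2, 3, 4}
candidates = {
    "near": {1, 2, 3, 5},
    "far": {7, 8, 9},
    "same": {1, 2, 3, 4},
}

distances = {name: jaccard_distance(query, s) for name, s in candidates.items()}

# Intended behavior: ascending distance puts the most similar first
# (the equivalent of ORDER BY distance ASC).
most_similar_first = sorted(distances, key=distances.get)

# The bug: descending distance puts the LEAST similar first
# (the equivalent of ORDER BY distance DESC).
least_similar_first = sorted(distances, key=distances.get, reverse=True)
```

With a similarity score the DESC ordering would have been right, which is probably why the flip is easy to miss.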
Something has seemed a bit off for me too. Probably overdue to stir the linear algebra around and see what's happening.

xkcd.com/1838
May 7, 2025 at 4:49 AM
Clustering just fell into place. Only issues I had were pipeline stuff, none of the core assumptions were too far off, MiniBatchKMeans just... worked!

Took what I learned from autolabeling posts and applied that to alt tags. 4o-mini handled those.
May 4, 2025 at 2:18 PM
Everything went better than expected, basically successful on the first try.

MobileCLIP is WAY better than it deserves to be, given how performant it is.

Never got zero-shot out of it, but the embeddings are very good.
May 4, 2025 at 2:18 PM
Every image posted to Bluesky in the last week, semantically clustered and labeled.

Jetstream -> Postgres -> Python (ML)

www.transparent.se/image-cluste...
May 4, 2025 at 2:18 PM
ahhhhhh I can't wait for image cluster labeling to finish, this is going to be such a cool little thing to share tomorrow.

I think it'll take 3 hours, I never optimized the script to do any of this in parallel, I just go to dinner and come back and it's done.
May 4, 2025 at 9:30 AM
Clustering pipeline on 6.6m Bluesky posts currently takes 1 hour 44 minutes on my MacBook Pro.

Good news is that it now actually works without any errors or bugs, so I can probably productionalize it and run it on a GPU next week instead.
May 4, 2025 at 5:56 AM
Pikmin update, 81 days to Seattle…

I don’t know why this is so funny

I think I get a sticker when he gets back 🤷
May 3, 2025 at 4:26 PM
I wonder if bro’s gonna make it back before I do…
May 2, 2025 at 2:52 PM
Reminds me of the old loading screen:
May 1, 2025 at 3:42 PM
Pretty excited! At 3m posts clusters are stabilizing. You can see multiple runs and even active clusters overlap, even in 2d space.

So now they can be merged together, and more importantly, you can keep assigning posts to them right after calculating embeddings, without re-clustering.
April 26, 2025 at 2:40 AM
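A rough sketch of what "keep assigning posts without re-clustering" can look like once centroids are stable: each new embedding simply goes to its nearest existing centroid. The function name nearest_centroid, the 2-d vectors, and the choice of Euclidean distance are assumptions for illustration, not details from the post.

```python
import math

def nearest_centroid(embedding, centroids):
    """Return the index of the centroid closest to `embedding` (Euclidean)."""
    best_index, best_dist = -1, math.inf
    for i, centroid in enumerate(centroids):
        d = math.dist(embedding, centroid)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

# Hypothetical centroids from an earlier clustering run (2-d for readability).
centroids = [(0.0, 0.0), (5.0, 5.0), (-4.0, 3.0)]

# Embeddings for posts that arrived after clustering finished.
new_embeddings = [(0.2, -0.1), (4.5, 5.5), (-3.0, 2.0)]

# No re-clustering: each post is labeled by a single nearest-centroid lookup.
assignments = [nearest_centroid(e, centroids) for e in new_embeddings]
```

The lookup is linear in the number of centroids, so it can run right after the embedding step for each post.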
Copypasta explorer (duplicated posts on Bluesky)

See users who repeatedly post the same post text, or groups of users who post the same exact post text (1 week timeframe)

www.transparent.se/copypasta.html

#dataviz #bluesky
April 25, 2025 at 9:03 AM
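For the two copypasta views described above (one user repeating a text, and several users sharing the same exact text), a minimal sketch using exact string matching. The sample rows and variable names are hypothetical; the real tool presumably queries a database rather than a Python list.

```python
from collections import Counter, defaultdict

# Hypothetical rows of (author handle, post text) from a one-week window.
posts = [
    ("alice", "Buy now!"),
    ("alice", "Buy now!"),
    ("bob", "Buy now!"),
    ("carol", "hello world"),
    ("alice", "Buy now!"),
]

# Count how often each (author, text) pair appears.
pair_counts = Counter(posts)

# View 1: users who repeatedly post the same text.
repeat_posters = {pair: n for pair, n in pair_counts.items() if n > 1}

# View 2: groups of distinct users who posted the same exact text.
authors_by_text = defaultdict(set)
for author, text in posts:
    authors_by_text[text].add(author)

shared_texts = {
    text: sorted(authors)
    for text, authors in authors_by_text.items()
    if len(authors) > 1
}
```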
Like, Wordle game results still haven't stabilized into a single cluster. And, there are a TON of Wordle game results being shared on Bluesky.
April 24, 2025 at 5:26 PM
Like, this cluster is literally just Guardian articles in every URL. That's the centroid.

AI label was "US News Commentary". Decent enough for the cluster explorer tool.

But, stick that in a list, someone clicks it, they aren't going to expect it will filter out shares of Guardian articles.
April 24, 2025 at 5:26 PM
Jetstream Video Post -> ffmpeg -> extract 1-5 frames -> CLIP -> zero-shot labels + embeddings -> mean embed -> whisper transcript -> db

Theoretically I can do something with these, but for today, I just figure I'm finishing the processing pipeline.

Will figure out a use-case later 🙃
April 24, 2025 at 5:26 PM
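The "mean embed" step in the video pipeline above can be sketched as an element-wise average over the per-frame vectors. The function name mean_embedding, the 3-d toy vectors, and the final L2 normalization are assumptions; the post only says that 1-5 frame embeddings are averaged.

```python
import math

def mean_embedding(frame_embeddings):
    """Average per-frame vectors element-wise, then L2-normalize the result."""
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    count = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    mean = [sum(vec[i] for vec in frame_embeddings) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Normalizing is an assumption here; it keeps cosine comparisons consistent.
    return [x / norm for x in mean] if norm else mean

# Hypothetical embeddings for two frames extracted from one video.
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
video_embedding = mean_embedding(frames)
```

One vector per video means video posts can be clustered with the same machinery as single images.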
lol gamestop.com running out of sockets...

Not ordering just watching the carnage. Contributing to the fail whale that is the Switch 2 Launch...
April 24, 2025 at 3:09 PM
Ah yeah, the Android install prompt. When I tested it, it seemed pretty usable if they fixed SafeAreaProvider in React Native for App.web.tsx. Then there would be a bit more work for notifications and the install prompt.

Safe areas are what absolutely break it for me 😅
April 23, 2025 at 10:59 AM
Moved embeddings/ml to vast.ai, got a bid at 2¢/hr ($16/mo) on a 3090, stupid cheap (not guaranteed, it can be outbid).

Can just push into a queue when it goes offline, I assume.

Main feed server is $43/mo Epyc 128gb for Postgres.

Clustering on my MacBook Pro until I can't.

ML can be cheap!
April 23, 2025 at 10:15 AM
Good point. I'm hoping it's just labeling in the sense that they keep some kind of UI like this:
April 22, 2025 at 5:23 PM
My first and only question was, please tell me you did this in console and not photoshop.

Thx for that.
April 22, 2025 at 5:13 PM
Threw up a Cluster Explorer for 🦋 posts
www.transparent.se/clusters.html

Not enough (40k) posts to say the clusters have stabilized yet, but you can view the centroids in x/y space, search, view random posts, etc.

It's kind of fun!

But, uhh, my embedding pipeline sucks. So it's only 40k posts 😅
April 22, 2025 at 4:39 PM
Ayyyyy... quantized embeddings are getting computed for every post in real time on CPU (🤩) and I am still able to run clustering on my M1 Max 🙌

LLM-assisted labelling is doing its thing after rewriting the prompt for 30 minutes.

Hoping to see clusters stabilize with <1 week of posts.
April 22, 2025 at 6:27 AM
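The post does not say which kind of quantization is in play. One common reading is scalar int8 quantization of the embedding vector, sketched below purely as an assumption; quantize_int8, dequantize, and the sample vector are hypothetical names and data.

```python
def quantize_int8(vec):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(x / scale) for x in vec], scale

def dequantize(quantized, scale):
    """Approximate the original floats from the stored integers."""
    return [v * scale for v in quantized]

# Hypothetical 4-d embedding; real ones have hundreds of dimensions.
embedding = [0.12, -0.5, 0.33, 0.0]

q, scale = quantize_int8(embedding)
restored = dequantize(q, scale)

# Rounding error per component is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
```

Storing one byte per dimension instead of a 4-byte float cuts storage by roughly 4x, at the cost of the small rounding error measured above.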