I’m so stupid (Amy) rose.
This is what happens when you let an LLM label clusters based on alt tags, 0.001% of the time 🧐
www.transparent.se/image-cluste...
The alt tag is almost always "IEMBot Image TBD" so.... 🤡
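One way to avoid labelling a cluster from junk is to drop placeholder alt text before it reaches the LLM, and to skip labelling when too little real text is left. This is a minimal sketch. `PLACEHOLDER_ALTS`, `informative_alts`, `should_label`, and the `min_unique=5` threshold are hypothetical names and values, not part of the actual pipeline. Only "IEMBot Image TBD" comes from the post.

```python
# Hypothetical placeholder list; "IEMBot Image TBD" is the one seen in practice.
PLACEHOLDER_ALTS = {"iembot image tbd"}

def informative_alts(alts):
    """Drop empty/placeholder alt text and case-insensitive duplicates, keeping order."""
    seen = set()
    out = []
    for alt in alts:
        text = (alt or "").strip()
        key = text.lower()
        if not text or key in PLACEHOLDER_ALTS or key in seen:
            continue
        seen.add(key)
        out.append(text)
    return out

def should_label(alts, min_unique=5):
    """Only ask the LLM for a label if enough distinct, real alt text survives."""
    return len(informative_alts(alts)) >= min_unique
```

A cluster whose alt text is all "IEMBot Image TBD" then gets no alt-text label at all, instead of a confidently wrong one.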
rb_jaccard_dist is a DISTANCE, not a SIMILARITY, so "most similar" means least distant (sort ascending). And.... it looks like I flipped this to DESC by mistake.
I bet I'll find more mistakes if I keep going 😅
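The distance/similarity mix-up is easy to see with plain sets. This is a Python sketch of Jaccard distance (1 minus intersection over union), not the Postgres roaring-bitmap function itself. `jaccard_distance` and `most_similar` are illustrative names.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; 0.0 means identical sets, 1.0 means disjoint."""
    union = a | b
    if not union:
        return 0.0  # treat two empty sets as identical
    return 1.0 - len(a & b) / len(union)

def most_similar(query: set, candidates: dict, k: int = 3) -> list:
    """Rank by ASCENDING distance: the smallest distance is the most similar."""
    ranked = sorted(candidates.items(), key=lambda kv: jaccard_distance(query, kv[1]))
    return [name for name, _ in ranked[:k]]
```

For example, with `query = {1, 2, 3}`, a candidate `{1, 2, 4, 5}` has distance 0.6 and `{7, 8}` has distance 1.0. Sorting descending would return the disjoint set first, which is exactly the DESC bug.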
xkcd.com/1838
Took what I learned from autolabeling posts and applied that to alt tags. 4o-mini handled those.
MobileCLIP is WAY better than it deserves to be, given how performant it is.
Never got zero-shot classification working with it, but the embeddings are very good.
Jetstream -> Postgres -> Python (ML)
www.transparent.se/image-cluste...
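The first hop (Jetstream -> Postgres) mostly comes down to picking new posts out of the firehose and turning each into a row. This sketch assumes Jetstream's JSON commit-event shape (`did`, `time_us`, `kind`, and a `commit` object with `operation`, `collection`, `rkey`, `record`). `post_row` and the returned tuple layout are hypothetical. The actual table schema isn't shown in these posts.

```python
import json

def post_row(raw: str):
    """Return (uri, did, time_us, text) for a newly created post, else None."""
    event = json.loads(raw)
    if event.get("kind") != "commit":
        return None  # identity/account events carry no post
    commit = event.get("commit") or {}
    if commit.get("operation") != "create" or commit.get("collection") != "app.bsky.feed.post":
        return None  # ignore deletes, likes, follows, etc.
    did = event["did"]
    uri = f"at://{did}/app.bsky.feed.post/{commit['rkey']}"
    text = (commit.get("record") or {}).get("text", "")
    return (uri, did, event["time_us"], text)
```

Each tuple could then be bound to a parameterized `INSERT`, leaving the embedding and clustering to the Python side, which reads from Postgres.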
I think it'll take 3 hours. I never optimized the script to do any of this in parallel; I just go to dinner and come back and it's done.
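If the serial loop ever becomes the bottleneck, batching and fanning work out is a small change. This is a sketch only. `embed_batch` stands in for whatever embedding call the script makes, and `embed_all`, `batch_size=256`, and `workers=4` are assumptions. Threads only help if that call releases the GIL (PyTorch ops and network I/O generally do). Otherwise a process pool or a GPU is the better lever.

```python
from concurrent.futures import ThreadPoolExecutor

def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_all(texts, embed_batch, batch_size=256, workers=4):
    """Run a hypothetical embed_batch(list[str]) -> list[vector] over batches in parallel.

    executor.map yields results in input order, so the flattened output
    lines up one-to-one with `texts`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batched(texts, batch_size))
        return [vec for batch in results for vec in batch]
```

Keeping the output order aligned with the input makes it safe to zip the vectors back onto post IDs before writing them to Postgres.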
Good news is that it now actually works without any errors or bugs, so I can probably productionize it and run it on a GPU next week instead.
I don’t know why this is so funny
I think I get a sticker when he gets back 🤷
So now they can be merged together, and more importantly, you can keep assigning posts to them right after calculating embeddings, without re-clustering.
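Assuming each cluster is kept as a centroid plus a member count, both operations are cheap. A new post goes to the centroid with the highest cosine similarity, and two clusters merge into the size-weighted mean of their centroids, which equals the centroid of their combined members. This is a pure-Python sketch (a real pipeline would vectorize with NumPy), and `assign` and `merge` are hypothetical helpers, not the actual code.

```python
import math

def _unit(v):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def assign(embedding, centroids):
    """Index of the nearest centroid by cosine similarity; no re-clustering needed."""
    e = _unit(embedding)
    sims = [sum(a * b for a, b in zip(e, _unit(c))) for c in centroids]
    return max(range(len(centroids)), key=sims.__getitem__)

def merge(c_a, n_a, c_b, n_b):
    """Merge two clusters: size-weighted mean of centroids, plus the combined count."""
    n = n_a + n_b
    return [(n_a * a + n_b * b) / n for a, b in zip(c_a, c_b)], n
```

The merged centroid is generally no longer unit length, which is why `assign` normalizes centroids before comparing.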
See users who repeatedly post the same text, or groups of users who post the exact same text (1-week window)
www.transparent.se/copypasta.html
#dataviz #bluesky
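Both views can come from one pass: bucket the last week's posts by normalized text, then look for authors who repeat a text and for texts shared by many authors. This is a sketch under assumptions. The whitespace/case normalization, the thresholds, and the names `copypasta` and `normalize_text` are all hypothetical, since the page doesn't say how matching is done.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

def normalize_text(text):
    """Collapse whitespace and lowercase, so trivial edits still match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def copypasta(posts, now, window=timedelta(weeks=1), min_repeats=3, min_authors=3):
    """posts: iterable of (author, text, created_at).

    Returns (repeaters, groups):
      repeaters: {(author, text): count} for one author posting a text >= min_repeats times
      groups:    {text: set(authors)} for texts posted by >= min_authors distinct authors
    """
    by_text = defaultdict(list)
    for author, text, created in posts:
        if now - created <= window:  # only the last `window` of posts
            key = normalize_text(text)
            if key:
                by_text[key].append(author)
    repeaters, groups = {}, {}
    for key, authors in by_text.items():
        counts = Counter(authors)
        for author, n in counts.items():
            if n >= min_repeats:
                repeaters[(author, key)] = n
        if len(counts) >= min_authors:
            groups[key] = set(counts)
    return repeaters, groups
```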
AI label was "US News Commentary". Decent enough for the cluster explorer tool.
But stick that in a list and whoever clicks it isn't going to expect it to filter out shares of Guardian articles.
Theoretically I can do something with these, but for today, I just figure I'm finishing the processing pipeline.
Will figure out a use-case later 🙃
Not ordering, just watching the carnage. Contributing to the fail whale that is the Switch 2 launch...
Safe areas are what absolutely break it for me 😅
Can just push into a queue when it goes offline, I assume.
Main feed server is a $43/mo Epyc box with 128 GB, for Postgres.
Clustering on my MacBook Pro until I can't.
ML can be cheap!
Thx for that.
www.transparent.se/clusters.html
Not enough posts yet (40k) to say the clusters have stabilized, but you can view the centroids in x/y space, search, view random posts, etc.
It's kind of fun!
But, uhh, my embedding pipeline sucks. So it's only 40k posts 😅
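To draw centroids "in x/y space", the high-dimensional vectors have to be squashed to two coordinates. The posts don't say which projection the page uses. This sketch swaps in plain PCA via power iteration, in pure Python to stay dependency-free; NumPy, scikit-learn, or UMAP would be the usual choices. `project_2d` and `_top_direction` are hypothetical names.

```python
import random

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _top_direction(rows, iters=100, seed=0):
    """Power iteration for the leading eigenvector of X^T X (X = centered rows)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(rows[0]))]
    for _ in range(iters):
        # X^T (X v), computed without materializing the d x d covariance matrix.
        w = [0.0] * len(v)
        for row in rows:
            s = _dot(row, v)
            for j, x in enumerate(row):
                w[j] += x * s
        norm = _dot(w, w) ** 0.5
        if norm == 0.0:
            break  # no variance left in this direction
        v = [x / norm for x in w]
    return v

def project_2d(centroids):
    """Return one (x, y) pair per centroid: its scores on the top two principal components."""
    n, d = len(centroids), len(centroids[0])
    mean = [sum(c[j] for c in centroids) / n for j in range(d)]
    rows = [[c[j] - mean[j] for j in range(d)] for c in centroids]
    v1 = _top_direction(rows)
    xs = [_dot(r, v1) for r in rows]
    # Deflate: strip the v1 component so the second direction is orthogonal to it.
    rows2 = [[x - s * u for x, u in zip(r, v1)] for r, s in zip(rows, xs)]
    v2 = _top_direction(rows2, seed=1)
    ys = [_dot(r, v2) for r in rows2]
    return list(zip(xs, ys))
```

PCA preserves global distances reasonably well, which suits a centroid map. UMAP or t-SNE tend to separate local neighborhoods more clearly at the cost of global layout.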
LLM-assisted labelling is doing its thing after 30 minutes of rewriting the prompt.
Hoping to see clusters stabilize with <1 week of posts.
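One rough way to tell whether clusters have stabilized is to compare centroids between runs. For each new centroid, take 1 minus its cosine similarity to the closest old centroid; values that stay near zero from one run to the next suggest the clusters have settled. This is a sketch under assumptions. `centroid_drift` is hypothetical, and nearest-match pairing is a simplification (a strict one-to-one matching would need the Hungarian algorithm).

```python
import math

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def centroid_drift(old, new):
    """Per new centroid: 1 - cosine similarity to its closest old centroid (0.0 = unchanged)."""
    old_u = [_unit(c) for c in old]
    drift = []
    for c in new:
        u = _unit(c)
        best = max(sum(a * b for a, b in zip(u, o)) for o in old_u)
        drift.append(1.0 - best)
    return drift
```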