Ian Johnson 🔬🤖
@enjalot.bsky.social
Data Visualization and Machine Learning
Building Latent Scope to visualize unstructured data through the lens of ML
github.com/enjalot/latent-scope
alternative view
March 18, 2025 at 1:59 PM
mean pooling
March 18, 2025 at 1:59 PM
implemented a new rendering component for latent scope's scatter plot. had to replace regl-scatterplot with d3-zoom + regl shaders so we could support mobile
January 23, 2025 at 12:37 AM
am i missing something for handling image data in parquet files?

I can load a dataset from HF like:
from datasets import load_dataset
import pandas as pd

dataset = load_dataset("Marqo/marqo-ge-sample", split='google_shopping')
df = pd.DataFrame(dataset)
but i need to convert the images to bytes if I want to do:
df.to_parquet("sample.parquet")
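
A rough sketch of the conversion I have in mind (assuming the images come back as PIL objects in an "image" column, which may not match the actual schema):

import io
import pandas as pd
from datasets import load_dataset

dataset = load_dataset("Marqo/marqo-ge-sample", split='google_shopping')
df = pd.DataFrame(dataset)

def image_to_bytes(img):
    # encode the PIL image as PNG bytes so it can live in a binary parquet column
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()

df["image"] = df["image"].apply(image_to_bytes)  # "image" column name is an assumption
df.to_parquet("sample.parquet")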
December 10, 2024 at 7:51 PM
the algorithm is not some deity but a landscape, the feed is an uber ride across the manifold, only the windows are blacked out. what if you had a map of the algorithm? what if the UX of the feed let you look out of the window?

musing with @infowetrust.com
image from distill.pub/2017/aia/
December 5, 2024 at 1:33 AM
I've organized and participated in many unconferences in the past, and they are always the most intense exchange of ideas and information that I've experienced. Given the energy we're seeing in registrations, this one is poised to be no different!

register today!
hiddenstates.org
November 26, 2024 at 6:23 PM
Hidden States is happening next week in SF!

It's a one-day unconference gathering researchers, designers, prototypers and engineers interested in pushing the boundaries of AI interfaces, going below the API and working with the hidden states.

hiddenstates.org
November 26, 2024 at 6:23 PM
If you do this with enough data you start to get a map of the patterns found in your dataset.

When you embed new data, like the question for a RAG query, you can see where on the map it lands.
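
A minimal sketch of projecting a new point onto an existing map, with random placeholder embeddings standing in for a real dataset and query:

import numpy as np
import umap  # umap-learn

corpus_embeddings = np.random.rand(1000, 384)  # placeholder for your dataset's embeddings
query_embedding = np.random.rand(1, 384)       # placeholder for the embedded RAG question

reducer = umap.UMAP(n_components=2).fit(corpus_embeddings)
corpus_xy = reducer.embedding_                 # the 2D map of the dataset
query_xy = reducer.transform(query_embedding)  # where the new point lands on that map
print(query_xy)                                # one (x, y) coordinate to plot over the map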
November 21, 2024 at 7:29 PM
You can map more and more points; a less similar point will show up a little further away.
As you add more points, a map starts to form, with clusters of similar data spread out before you.
November 21, 2024 at 7:29 PM
We have another tool we can use to make sense of the patterns found in our embeddings.

We can use UMAP to place similar embeddings close together in 2D space. So two passages that have similar high-dimensional representations will show up close together in 2D
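
Roughly like this (random placeholder embeddings, just to show the shapes):

import numpy as np
import umap  # umap-learn

embeddings = np.random.rand(5000, 384)  # placeholder: 5,000 passages, 384 dims each

# project to 2D; points that are close in the high-dimensional space tend to stay close here
xy = umap.UMAP(n_components=2, metric="cosine").fit_transform(embeddings)
print(xy.shape)  # (5000, 2) -> one (x, y) position per passage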
November 21, 2024 at 7:29 PM
This is the basis of the retrieval in RAG. We embed a question or prompt, and we find the dataset representations that are most similar to the question.
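
A minimal sketch of that retrieval step with numpy and placeholder embeddings:

import numpy as np

chunk_embeddings = np.random.rand(1000, 384)  # placeholder for the embedded dataset chunks
question_embedding = np.random.rand(384)      # placeholder for the embedded question

# cosine similarity between the question and every chunk
chunks_norm = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
question_norm = question_embedding / np.linalg.norm(question_embedding)
similarities = chunks_norm @ question_norm

top_k = np.argsort(similarities)[::-1][:5]  # indices of the 5 most similar chunks
print(top_k, similarities[top_k])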
November 21, 2024 at 7:23 PM
We may not be able to make sense of those patterns directly by looking at the representations, but we do have some tools to help us

The first tool that is most familiar is cosine similarity. It allows us to see how similar two high-dimensional vectors are.
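
In code it's just the dot product of the normalized vectors (placeholder vectors here):

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing the same direction, 0.0 = orthogonal, -1.0 = opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.random.rand(384)  # placeholder embedding of one passage
b = np.random.rand(384)  # placeholder embedding of another
print(cosine_similarity(a, b))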
November 21, 2024 at 7:23 PM
So now we can shine a piece of data through the lens and get back a representation (embedding)

This representation has some special properties, namely that it "represents" the patterns the model has found in the data.

We can get a representation for any data (that our model can handle)
November 21, 2024 at 7:23 PM
I like to represent those vectors as these little grids. Since as humans we can't make much sense of what the numbers mean in this form, we might as well make them a little easier to look at.
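
A quick sketch of that kind of grid view (random placeholder vector, 768 dims reshaped to 24x32):

import numpy as np
import matplotlib.pyplot as plt

embedding = np.random.rand(768)  # placeholder for a real embedding vector

plt.imshow(embedding.reshape(24, 32), cmap="viridis")  # each cell is one dimension's value
plt.axis("off")
plt.show()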
November 21, 2024 at 7:23 PM
If we think of the model as a lens, you shine your chunk through and get a representation (a long list of numbers).

That list of numbers is an embedding AKA latent vector. For a given model it's always the same length (dimensionality)
November 21, 2024 at 7:23 PM
then you take your model and encode each chunk
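
e.g. with sentence-transformers (all-MiniLM-L6-v2 is just a small example model, not necessarily the one you'd use):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

chunks = [
    "first chunk of the document...",
    "second chunk of the document...",
]
embeddings = model.encode(chunks)
print(embeddings.shape)  # (2, 384): one fixed-length vector per chunk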
November 21, 2024 at 7:23 PM
The hidden states are often known as embeddings; they are the main output you get from BERT models and Sentence Transformers. They are what powers the "R" in RAG

The basic idea is you take your text and chunk it into pieces
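
A naive chunking sketch (fixed-size word windows with a little overlap; real pipelines usually do something smarter):

def chunk_text(text, chunk_size=200, overlap=50):
    # slide a window of chunk_size words, stepping forward by chunk_size - overlap
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

chunks = chunk_text("some long document text " * 1000)  # placeholder text
print(len(chunks), "chunks")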
November 21, 2024 at 7:23 PM
One way I've been thinking about ML models for some time is as a lens.

The weights are crystalized patterns whose structure emerges from the crushing pressures of backpropagation.

By shining a piece of data through this lens you see the patterns diffracted in the hidden states.
November 21, 2024 at 7:23 PM
UMAP of the counties of course 😁
pretty geographically correlated (counties from the same state end up in similar clusters, or share clusters with nearby states)
November 19, 2024 at 9:13 PM
oops, transparent PNG doesn't work well on here eh?
November 15, 2024 at 5:01 PM
Now I'm building Latent Interfaces, an applied research lab for advanced data visualization. It pulls together all of my past interests and skills, from microscopes to making maps, from exploiting linear algebra to making data understandable.

read more details here: enjalot.substack.com
November 15, 2024 at 4:56 PM
Way back in 2003 I was making java applets like:
micro.magnet.fsu.edu/primer/java/...

Even before that I was behind the microscope after school getting paid in computer parts for taking pictures of pond life (my first NVIDIA GPU in 2001!)
November 15, 2024 at 4:56 PM
I went to undergrad for Applied Math (and Mandarin Chinese) then got a Master's in Scientific Computing (basically linear algebra + distributed/parallel computing) graduating in 2011 right before the latest deep learning explosion.

Throughout school I worked part-time making GIS web maps.
November 15, 2024 at 4:56 PM
From 2012-2022 I co-organized the d3.js meetup in the SF Bay Area, working at various startups until joining Google from 2016-2020 and Observable from 2020-2022.

I've written a good bit about data visualization with d3.js and building community here:
medium.com/@enjalot
November 15, 2024 at 4:56 PM
allow me to reintroduce myself!
I'm a prototyper and Data Alchemist interested in using machine learning for data visualization.

I'm building github.com/enjalot/late... using the lessons learned from co-authoring these 4 distill.pub papers
November 15, 2024 at 4:56 PM