Mat Miller
@matdmiller.bsky.social
AI, Engineer, Developer, Traveler, Technologist, @fastdotai Fellow
Reposted by Mat Miller
Made a tiny tool to help turn a Bluesky handle into a DID: tools.simonwillison.net/bluesky-reso...
January 6, 2025 at 6:00 PM
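A minimal sketch of what a handle-to-DID lookup can involve. It is not the linked tool's code. It assumes the AT Protocol XRPC method `com.atproto.identity.resolveHandle` served from the public host `public.api.bsky.app`; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed public host; any AT Protocol service exposing the same XRPC method should behave alike.
API_BASE = "https://public.api.bsky.app/xrpc"


def build_resolve_url(handle: str) -> str:
    """Build the resolveHandle URL for a handle such as 'example.bsky.social'."""
    handle = handle.strip().lstrip("@")
    query = urllib.parse.urlencode({"handle": handle})
    return f"{API_BASE}/com.atproto.identity.resolveHandle?{query}"


def parse_did(response_body: str) -> str:
    """Extract the DID from a JSON body shaped like {"did": "did:plc:..."}."""
    did = json.loads(response_body)["did"]
    if not did.startswith("did:"):
        raise ValueError(f"unexpected DID value: {did!r}")
    return did


def resolve_handle(handle: str) -> str:
    """Perform the lookup over the network (requires internet access)."""
    with urllib.request.urlopen(build_resolve_url(handle), timeout=10) as resp:
        return parse_did(resp.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call makes the two pure steps easy to check offline.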
Reposted by Mat Miller
I like Anthropic's "contextual retrieval" suggestion for creating chunks simonwillison.net/2024/Sep/20/...
Introducing Contextual Retrieval
Here's an interesting new embedding/RAG technique, described by Anthropic but it should work for any embedding model against any other LLM. One of the big challenges in implementing semantic search …
simonwillison.net
January 4, 2025 at 6:52 PM
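A rough sketch of the chunking idea, under stated assumptions: each chunk gets a short, document-aware context prepended before it is embedded. The `describe` callable stands in for an LLM call and `fake_describe` is a hypothetical placeholder; none of these names come from Anthropic's write-up.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Naive paragraph-based chunking; a real pipeline would likely split on tokens."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def contextualize(document: str, chunks: List[str],
                  describe: Callable[[str, str], str]) -> List[str]:
    """Prepend a short context string to each chunk before it is embedded or indexed."""
    return [f"{describe(document, chunk)}\n\n{chunk}" for chunk in chunks]


def fake_describe(document: str, chunk: str) -> str:
    """Hypothetical stand-in for an LLM call that situates the chunk in the document."""
    title = document.splitlines()[0]
    return f"From the document titled '{title}'."
```

The contextualized strings, not the raw chunks, would then be passed to whatever embedding model or full-text index is in use.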
Reposted by Mat Miller
Looks like @graze.social exists already!
Their programming language is a nested JSON object of filters and Boolean operations: www.graze.social/docs/editor-...
bsky.app/profile/graz...
January 5, 2025 at 5:30 PM
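To illustrate the general idea of a nested JSON object of filters and Boolean operations, here is a small evaluator. The rule schema (`and`, `or`, `not`, `text_contains`, `lang`) is invented for this sketch and is not Graze's actual format.

```python
from typing import Any, Dict


def matches(rule: Dict[str, Any], post: Dict[str, Any]) -> bool:
    """Recursively evaluate a nested rule against a post dictionary."""
    if "and" in rule:
        return all(matches(r, post) for r in rule["and"])
    if "or" in rule:
        return any(matches(r, post) for r in rule["or"])
    if "not" in rule:
        return not matches(rule["not"], post)
    if "text_contains" in rule:
        return rule["text_contains"].lower() in post.get("text", "").lower()
    if "lang" in rule:
        return post.get("lang") == rule["lang"]
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical feed rule: English posts about RAG or embeddings, excluding giveaways.
RULE = {
    "and": [
        {"lang": "en"},
        {"or": [{"text_contains": "rag"}, {"text_contains": "embedding"}]},
        {"not": {"text_contains": "giveaway"}},
    ]
}
```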
Reposted by Mat Miller
🔥 Host your apps for FREE and simplify your infrastructure! 🚀
Tired of complex cloud hosting and the recurring bills that come with it? Host locally on hardware you already own and access anywhere with a browser. It's simpler than you think!
Learn how in my latest post. 🔗👇
January 1, 2025 at 8:02 PM
Reposted by Mat Miller
This week we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.
December 22, 2024 at 6:12 AM
Reposted by Mat Miller
I'll get straight to the point.
We trained 2 new models. Like BERT, but modern. ModernBERT.
Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.
It's much faster, more accurate, longer context, and more useful. 🧵
December 19, 2024 at 4:45 PM
Reposted by Mat Miller
I figured out a prompting pattern for getting Claude to produce fully self-contained Python scripts that execute with "uv run" using PEP 723 inline script dependencies - and now I can one-shot useful Python utilities with it simonwillison.net/2024/Dec/19/...
Building Python tools with a one-shot prompt using uv run and Claude Projects
I’ve written a lot about how I’ve been using Claude to build one-shot HTML+JavaScript applications via Claude Artifacts. I recently started using a similar pattern to create one-shot Python utilities,...
simonwillison.net
December 19, 2024 at 1:13 PM
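For readers unfamiliar with PEP 723, the sketch below shows what an inline metadata block looks like and how a tool might pull it out of a script. `EXAMPLE_SCRIPT` and `extract_script_metadata` are illustrative names, the `httpx` dependency is only an example, and the parser is a simplified line-based reading of the format rather than the reference implementation.

```python
from typing import Optional

# A script carrying PEP 723 inline metadata; saved to a file, it could be started with "uv run".
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "httpx",
# ]
# ///
import httpx

print(httpx.get("https://example.com").status_code)
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text inside the '# /// script' block, or None if absent."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return None
    body = []
    for line in lines[start + 1:]:
        if line == "# ///":
            return "\n".join(body)
        if not line.startswith("#"):
            return None  # the block was never closed
        # Strip the leading "# " (or a bare "#" on otherwise empty lines).
        body.append(line[2:] if line.startswith("# ") else line[1:])
    return None
```

The extracted text is plain TOML, so a runner can read `dependencies` from it and build a throwaway environment before executing the script.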
Reposted by Mat Miller
HOT 🔥 fastest, most precise, and most capable hand control setup ever...
Less than $450 and fully open-source 🤯
by @huggingface, @therobotstudio, @NepYope
This tendon-driven technology will disrupt robotics! Retweet to accelerate its democratization 🚀
A thread 🧵
December 15, 2024 at 8:22 AM
Hey @runme.dev - love your notebook-style script execution! Any plans for Windows native support? Many of us in corporate environments need this without the WSL dependency. As a Jupyter notebooks fan, I can immediately see the value of this, but many of my use cases today would be Windows native.
December 14, 2024 at 3:06 PM
Reposted by Mat Miller
Excited to share that we have just released genomenotebook 1.0 together with @trichomedoctor.bsky.social! If you like to analyze genomics data in Jupyter notebooks, this is for you. You can get beautiful genome browsers directly in your notebooks: dbikard.github.io/genomenotebo...
genomenotebook
A genome browser in your Jupyter notebook
dbikard.github.io
December 13, 2024 at 4:48 PM
Reposted by Mat Miller
Non-commercial license, as usual from MSR. 🙄
And of course they don't even give you the option to pay for it.
December 13, 2024 at 6:32 AM
Reposted by Mat Miller
Chat with any open source repo easily. Gitingest (free online tool) turns any GitHub repository into a single markdown file for pasting. Claude artifacts makes this 300k token output pretty easy to work with.
December 8, 2024 at 2:04 AM
Reposted by Mat Miller
This is a neat new variant on RAG - no vectors, not even full-text search, instead showing the model a header hierarchy and giving it a tool to read the relevant sections
My notes here: simonwillison.net/2024/Dec/6/r...
December 6, 2024 at 3:04 AM
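A minimal sketch of the header-hierarchy idea, assuming Markdown input: the model is shown only the outline, then calls a tool to read the section it picks. `build_outline` and `read_section` are hypothetical names, and the parsing is deliberately naive (it does not skip `#` lines inside code fences).

```python
import re
from typing import Dict, List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def build_outline(markdown: str) -> Tuple[List[str], Dict[str, str]]:
    """Return an indented heading outline plus a mapping from heading title to body text."""
    outline: List[str] = []
    bodies: Dict[str, List[str]] = {}
    current = None
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            outline.append("  " * (level - 1) + title)
            current = title
            bodies[current] = []
        elif current is not None:
            bodies[current].append(line)
    return outline, {t: "\n".join(b).strip() for t, b in bodies.items()}


def read_section(sections: Dict[str, str], title: str) -> str:
    """The 'tool' a model would call after choosing a heading from the outline."""
    return sections.get(title, f"No section titled {title!r}")
```

In use, the outline would go into the prompt and `read_section` would be exposed as a tool, so no vectors or search index are involved.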
Reposted by Mat Miller
As R&D staff @ answer.ai, I work a lot on boosting productivity with AI. A common theme that always comes up is the combination of human+AI. This combination proved to be powerful in our new project ShellSage, which is an AI terminal buddy that learns and teaches with you. A 🧵
December 5, 2024 at 8:27 PM
Reposted by Mat Miller
I can't begin to describe how life-changing this new project, ShellSage, has been for me over the last few weeks.
ShellSage is an LLM that lives in your terminal. It can see what directory you're in, what commands you've typed, what output you got, & your previous AI Q&A's.🧵
December 5, 2024 at 8:30 PM
Reposted by Mat Miller
For anyone interested in fine-tuning or aligning LLMs, I’m running this free and open course called smol course. It’s not a big deal, it’s just smol.
🧵>>
December 3, 2024 at 9:21 AM
Reposted by Mat Miller
Amazing demo by Vaibhav Srivastav of structured data extraction running on an LLM that executes entirely in the browser (Chrome only for the moment since it uses WebGPU). https://simonwillison.net/2024/Nov/29/structured-generation-smollm2-webgpu/
Structured Generation w/ SmolLM2 running in browser & WebGPU
Extraordinary demo by Vaibhav Srivastav. Here's Hugging Face's [SmolLM2-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) running directly in a web browser (using WebGPU, so requires Chrome [for the moment](https://github.com/gpuweb/gpuweb/wiki/Implementation-Status)) demonstrating structured text extraction, converting a text …
simonwillison.net
November 29, 2024 at 9:14 PM
Reposted by Mat Miller
After seeing all those toxic replies to @danielvanstrien.bsky.social , I created a small script to auto-block people. Let's use AI to moderate the replies.
gist.github.com/tcapelle/00a...
cc. @eugeneyan.bsky.social
November 29, 2024 at 11:51 AM
Reposted by Mat Miller
A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models.
The primary use case for the datasets that people are losing their shit over isn't ChatGPT, it's social science research and developing systems that improve Bluesky.
November 28, 2024 at 6:57 PM
Reposted by Mat Miller
I find it strange that engineering the gradient radically to learn better is not a whole field.
November 29, 2024 at 3:59 AM
Reposted by Mat Miller
TIL how to implement GitHub OAuth for an otherwise completely static website (HTML+JavaScript on GitHub Pages) using Cloudflare Workers: https://til.simonwillison.net/cloudflare/workers-github-oauth
November 29, 2024 at 1:27 AM
Reposted by Mat Miller
✨ More Bluesky tools for power-users, curious people, and nerds. ✨
One thing that makes this place a better option for anyone wanting a more conscious social experience is its "opensource-ness."
It gives us a level of freedom we haven't had since the MySpace era. I missed that.
🧵 1/10
November 28, 2024 at 6:19 AM
Reposted by Mat Miller
Did you know that 99% of email today is spam? Your inbox isn’t 99% spam because AI is used to filter it.
The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.
November 28, 2024 at 6:12 PM