Mathias Nielsen
mathiasesn1.bsky.social
Mathias Nielsen
@mathiasesn1.bsky.social
Reposted by Mathias Nielsen
we heard you hate writing boilerplate code

so we built something...

> open gradio sketch
> select and add components
> configure visually
> get perfect python code 🤯

Building AI apps will never be the same 🔥
Coming very soon 👀
February 19, 2025 at 10:12 AM
Reposted by Mathias Nielsen
[dk] Ok, nu begynder det at blive dumt, det her..
Scientists at the CDC were ordered late on Friday to withdraw any pending publications, at any scientific journal, that mention terms such as “transgender,” “immigrant,” “LGBT” or “pregnant people.”
CDC Scientists Ordered to Withdraw Studies That Say ‘LGBT’ or ‘Pregnant People’
www.nytimes.com
February 2, 2025 at 4:58 PM
Reposted by Mathias Nielsen
We’re building a new static type checker for Python, from scratch, in Rust.

From a technical perspective, it’s probably our most ambitious project yet. We’re about 800 PRs deep!
January 29, 2025 at 5:18 PM
Reposted by Mathias Nielsen
🤯 Vector search on top of millions of docs in seconds. no pre-indexing!

Model2Vec is an embedding powerhouse that distils good models and makes them up by 500x faster and 15x smaller.

Vector Search on Hub Datasets demo: https://buff.ly/4gYhVlY
Library: https://buff.ly/42miwte
Vectorsearch Hub Datasets - a Hugging Face Space by davidberenstein1957
Add vectors to Hub datasets and do in memory vector search.
huggingface.co
January 24, 2025 at 1:00 PM
Reposted by Mathias Nielsen
Introducing Scandi-fine-web-cleaner, a decoder model trained to remove low-quality web from FineWeb 2 for Danish and Swedish

- Uses FineWeb-c community annotations
- 90%+ precision + minimal compute required
- Enables efficient filtering of 43M+ documents

huggingface.co/davanstrien/...
January 13, 2025 at 3:48 PM
Reposted by Mathias Nielsen
This is a particularly bad case-study in how badly AI summarization can go when its exposed to the wilds of the internet - posted some notes on my blog: simonwillison.net/2024/Dec/29/...
December 29, 2024 at 1:32 AM
Reposted by Mathias Nielsen
💥 Ending 2024: A full data annotation journey on the Hugging Face Hub—from raw data to training-ready datasets!

With Argilla 2.6.0, push your data to the Hub from the UI

Let’s make 2025 the year anyone can build more transparent and accountable AI—no coding or model skills needed.
December 20, 2024 at 11:14 AM
Reposted by Mathias Nielsen
The Phi-4 Technical Report briefly mentions the importance of the sequence in which the training data is fed to the model. Actually I think that determining the ideal sequence should be the next big research topic.

huggingface.co/papers/2412....
Paper page - Phi-4 Technical Report
Join the discussion on this paper page
huggingface.co
December 14, 2024 at 6:32 PM
Reposted by Mathias Nielsen
Angreb på Ultralytics via GitHub Actions og PyPI: blog.pypi.org/posts/2024-1... #dkdev
Supply-chain attack analysis: Ultralytics - The Python Package Index Blog
Analysis of a package targeted by a supply-chain attack to the build and release process
blog.pypi.org
December 14, 2024 at 4:25 PM
Reposted by Mathias Nielsen
December 9, 2024 at 4:28 AM
Reposted by Mathias Nielsen
Remember Molmo? The full recipe is finally out!

Training code, data, and everything you need to reproduce our models. Oh, and we have updated our tech report too!

Links in thread 👇
December 9, 2024 at 6:34 PM
Reposted by Mathias Nielsen
🏮 New YouTube video! 🏮

We experimented with a log monitoring system. Spin up the agent and it'll monitor your logs for any potential issues -- it works with webserver logs like nginx or Apache, Linux system logs, etc.

youtu.be/csw6TVfzBcw
Creating a Structured AI Log Analysis System with Python & LLMs
YouTube video by dottxt
youtu.be
December 5, 2024 at 5:08 PM
Reposted by Mathias Nielsen
Look at this! 🤩

@pydantic.bsky.social for AI Agents 🤖🚀
PydanticAI is here!

An Agent Framework designed for production, from the team who created and maintain @pydantic.bsky.social.

As some of you will know, I've been working on this for some time, can't wait to see what people build with it.

ai.pydantic.dev
Introduction
Agent Framework / shim to use Pydantic with LLMs
ai.pydantic.dev
December 2, 2024 at 11:33 AM
Reposted by Mathias Nielsen
FYI, here's the entire code to create a dataset of every single bsky message in real time:

```
from atproto import *
def f(m): print(m.header, parse_subscribe_repos_message())
FirehoseSubscribeReposClient().start(f)
```
November 28, 2024 at 9:56 AM
Reposted by Mathias Nielsen
The thing is, there's already a dataset of 235 MILLION posts from 4 MILLION users available for months. Not sure why @hf.co is a target of abuse

zenodo.org/records/1108...
November 28, 2024 at 1:32 AM
Reposted by Mathias Nielsen
*** New Model on ScandEval ***

New Danish LLM from the NLP North Lab, SnakModel, based on Llama-2-7b.

Danish results (lower is better):
- NLPnorth/snakmodel-7b-base: 3.60
- NLPnorth/snakmodel-7b-instruct: 2.59

For reference, Llama-2-7b achieves 3.08.

Leaderboards: scandeval.com

#dkai #nlp
NLPnorth/snakmodel-7b-instruct · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
November 26, 2024 at 12:21 PM
📱 Jeg har lavet et feed, der samler dansk tech-indhold via hashtaggene #dkai, #dkdev og #dktech! Følg med for at holde dig opdateret med det danske tech-community 🇩🇰

Prøv det her: bsky.app/profile/did:...
November 23, 2024 at 2:53 PM