michiel van der ree
mhvdr.nl
doing applied (ai ∪ ml ∪ nlp ∪ llms) to accelerate research @ university of groningen (nl)
With Dutch parliamentary elections next week, explore this interactive AI-powered analysis of every vote since December 2023. See parties’ stances by topic, impact and beneficiaries (NB in Dutch).
datascience.web.rug.nl/parliamentar...
October 24, 2025 at 9:49 AM
Reposted by michiel van der ree
The model is the product.

New blog post on what the latest research trends mean for the next commercial cycle: specialized models behaving like integrated systems, model providers moving up to the application layer, training or being trained on. vintagedata.org/blog/posts/m...
March 2, 2025 at 1:57 PM
Reposted by michiel van der ree
I'll get straight to the point.

We trained 2 new models. Like BERT, but modern. ModernBERT.

Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff.

It's much faster, more accurate, longer context, and more useful. 🧵
December 19, 2024 at 4:45 PM
Reposted by michiel van der ree
“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two-trillion-token set from Common Corpus.
December 5, 2024 at 4:39 PM
Reposted by michiel van der ree
🪼 Welcome JUST-OS!

JUST-OS is an exciting initiative by researchers at the University of Groningen and FORRT (@forrt.bsky.social; forrt.org). We’re developing an AI-based chatbot to simplify navigating Open Science resources.
November 21, 2024 at 3:40 PM
Reposted by michiel van der ree
Releasing two trillion tokens in the open. huggingface.co/blog/Pclangl...
November 13, 2024 at 5:59 PM
Reposted by michiel van der ree
Introducing Early American HistoriChat, a chatbot trained on the EvansTCP corpus, ~5,000 American printed texts from 1640 to 1800: eahc.mhvdr.nl

Inspired by the work of @dorialexander.bsky.social; designed & built by @michielree.bsky.social in collaboration with MLT & the H-GEAR project
October 30, 2024 at 1:36 PM
liking 10 @vickiboykis.com posts to bootstrap this bsky algorithm
October 26, 2024 at 3:48 PM